There's the game program's normal delay, resulting from the code as its programmers wrote it running on specific hardware; this is what the driver emulates (supposedly most MAME drivers, with few exceptions, are accurate in that respect).
Then, specific to the emulator (and its settings), there are input polling and frame buffering for vsync'd video output; together these can add several frames on top of the game being emulated, so the total lag chain is A+B+C (game + polling + video sync; not sure whether audio plays a role here or whether its delay is a separate issue).
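As a toy illustration of that A+B+C sum (the frame counts below are made-up example values, not measurements of any real game or setup):

```python
# Toy illustration of the total lag chain A+B+C (made-up example values).
FRAME_MS = 1000 / 60  # one frame at 60 Hz, ~16.67 ms

game_lag = 2     # A: frames of lag inherent to the game's own code
polling_lag = 1  # B: frames added by the emulator's input polling
vsync_lag = 2    # C: frames buffered for vsync'd video output

total_frames = game_lag + polling_lag + vsync_lag
total_ms = total_frames * FRAME_MS
print(f"total lag: {total_frames} frames (~{total_ms:.1f} ms)")
```

With these example numbers the chain comes to 5 frames, roughly 83 ms at 60 Hz; the point is simply that B and C stack on top of A frame by frame.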
GroovyMAME takes care of B and C, not A, which remains untouched for obvious accuracy reasons. If you want to reduce the game's own inherent lag, you have to use a feature like run-ahead in RetroArch, or hacked drivers in ShmupMAME.
(Though with those, depending on the settings related to B and C, you may or may not end up with an actually lower total lag chain; most of the time users have to sacrifice vsync to effectively achieve that. Groovy can't touch A, but it can preserve vsync while eliminating B+C.)
Anyway, in GM, frame delay can almost completely eliminate B and C, so ideally the only delay left is that of the original game.
In straightforward cases, level 1 leaves 1 frame on top of A, and the higher you push frame delay, the less often that last frame actually gets in your way. In other words, increasing frame delay makes it more and more likely that the last undesirable frame simply isn't there when you input: no lag left, it's like playing the PCB.
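A rough way to picture that probabilistic effect (this is my own simplified model, not GroovyMAME's actual implementation): at frame delay level d, emulation of the frame is postponed to about d/10 of the frame period, so an input landing before that point gets picked up one frame earlier than it would at level 0.

```python
# Toy model of frame delay: an input arrives at a uniformly random point
# in the frame; if it lands after emulation has already run (at d/10 of
# the frame period), it still eats the extra frame of lag.
import random

random.seed(1)

def extra_frame_probability(level, trials=100_000):
    """Estimate the fraction of inputs that still see the extra frame."""
    cutoff = level / 10  # point in the frame where emulation starts
    missed = sum(1 for _ in range(trials) if random.random() > cutoff)
    return missed / trials

for level in (1, 5, 8):
    p = extra_frame_probability(level)
    print(f"frame delay {level}: extra frame on ~{p:.0%} of inputs")
```

Under this simplification, level 1 still shows the extra frame on roughly 90% of inputs, while level 8 drops that to roughly 20%, which matches the intuition that higher levels make the leftover frame rarer rather than removing it outright.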
In practice though, the highest frame delay level is not always the ideal one to achieve this, because we don't know when the emulated game's own program polls inputs; a frame delay that is too high can even be counter-productive.
In my experience the best results are often found somewhere around 6~8, while 9 is rarely achievable anyway, particularly with resource-intensive games.
As for variance in lag readings, I think it can depend on the reading method, Groovy's general settings, and of course the frame delay level. But I also believe it can vary between games even when they run on the same driver, either because the code is slightly different, or because it varies within the game at different moments/places (though I don't imagine by more than 1 frame; by comparison, some modern games today can show variations of several frames during gameplay, ew). You could also factor in the hardware side, like controller PCBs, or OS/software-related things, etc.
EDIT: oops didn't see Calamity posted before me. Well.
PS: and yes, it's worth reminding that the pause + frame count method only shows the lag from A (or A+B? In any case it's no indication of how much lag there actually is in total).