Things are simple. For now, forget about the built-in PCB latency most games have (2 frames).
When you enable vsync in your emulator, AMD/Nvidia drivers create the infamous frame queue of several frames. This happens even without triple buffering, which obviously makes things even worse (note: GM's triple buffering doesn't add latency).
Now take stock MAME or GM (they're mostly the same) and run it fullscreen with -waitvsync and WITHOUT -frame_delay. You'll experience 2-3 frames of lag due to the frame queue (buffering), on top of the PCB latency. GM might do slightly better (by 1 frame) because of the way it polls inputs, but that's no big deal: the bulk of the latency is in the driver's hidden frame queue.
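To make the queue effect concrete, here is a toy Python model. It is not how any driver works internally, just a sketch under these assumptions: time is counted in integer ticks (100 per refresh); at every vblank the oldest queued frame is scanned out; the emulator keeps running ahead and only blocks once the queue already holds `queue_depth` frames; input is polled at the start of each emulated frame. `queue_latency`, `queue_depth` and `emu_ticks` are made-up names for illustration, and real queue depths depend on the driver and its settings.

```python
from collections import deque

TICKS_PER_FRAME = 100  # one refresh period, in integer ticks to avoid float drift


def queue_latency(queue_depth: int, emu_ticks: int = 20, vblanks: int = 60) -> float:
    """Toy model (hypothetical): frames of delay between input poll and
    scanout once a driver frame queue of `queue_depth` is in steady state."""
    if queue_depth < 1 or not 0 < emu_ticks < TICKS_PER_FRAME:
        raise ValueError("need queue_depth >= 1 and 0 < emu_ticks < TICKS_PER_FRAME")
    pending = deque()  # input-poll times of frames queued but not yet shown
    t = 0              # time at which the emulator polls input for its next frame
    latency = 0
    for v in range(1, vblanks + 1):
        vblank = v * TICKS_PER_FRAME
        # The emulator runs ahead; in this model vsync only blocks on a full queue.
        while len(pending) < queue_depth and t + emu_ticks <= vblank:
            pending.append(t)
            t += emu_ticks
        # At each vblank the oldest queued frame is scanned out.
        if pending:
            latency = vblank - pending.popleft()
        # The loop only stops early on a full queue, so the emulator resumes now.
        if t + emu_ticks <= vblank:
            t = vblank
    return latency / TICKS_PER_FRAME


for depth in (1, 2, 3):
    print(depth, queue_latency(depth))
```

Under these assumptions the steady-state delay from input poll to scanout comes out as exactly `queue_depth` frames (1.0, 2.0 and 3.0 for depths 1 to 3), on top of whatever the PCB itself adds.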
Then run GM adding -frame_delay 5. As a side effect of the current -frame_delay implementation, the frame queue is bypassed, and you end up with 0-1 frames of latency. Note that the purpose of -frame_delay is NOT to bypass the frame queue; that is a (desired) side effect. I'll probably split -frame_delay into two options, one that just bypasses the frame queue and another that actually does the frame delay, to avoid uninformed use of the frame_delay feature.
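A similarly hedged sketch of the frame-delay case, again a toy model rather than GM's actual code: assume a value d makes the emulator wait for vblank itself, then sleep d/10 of a refresh before polling input and emulating, and present at the next vblank, so nothing ever piles up in the driver queue. The 0..9 range, `frame_delay_latency` and `emu_ticks` (emulation time per frame, 20% of a refresh by default) are assumptions for illustration.

```python
TICKS_PER_FRAME = 100  # one refresh period, in integer ticks


def frame_delay_latency(frame_delay: int, emu_ticks: int = 20) -> float:
    """Toy model (hypothetical): poll->scanout delay in frames when the
    emulator waits for vblank itself, then sleeps frame_delay/10 of a
    refresh before polling input and emulating the next frame."""
    if not 0 <= frame_delay <= 9:
        raise ValueError("frame_delay is assumed to be 0..9")
    if not 0 < emu_ticks < TICKS_PER_FRAME:
        raise ValueError("need 0 < emu_ticks < TICKS_PER_FRAME")
    poll = frame_delay * TICKS_PER_FRAME // 10  # ticks after the vblank
    done = poll + emu_ticks                     # frame ready to present
    # Shown at the first vblank at or after `done` (integer ceiling).
    shown = ((done + TICKS_PER_FRAME - 1) // TICKS_PER_FRAME) * TICKS_PER_FRAME
    return (shown - poll) / TICKS_PER_FRAME


for d in (0, 5, 9):
    print(d, frame_delay_latency(d))
```

With these numbers a delay of 0 gives 1.0 frame, 5 gives 0.5 frames, and 9 gives 1.1 frames: once the delay plus the emulation time no longer fits in one refresh, the frame misses its vblank and latency gets worse, which is why the value has to suit the game's emulation time.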