The logic that led to the -frame_delay option was discussed here:
http://forum.arcadecontrols.com/index.php/topic,128993.msg1313217.html

Initially, -frame_delay was implemented to (theoretically) remove one frame of lag by delaying the emulation of the next frame as much as possible, ideally until right before or during the vblank period. This allows the most up-to-date input state to be read before the emulation itself runs. The technical problem involved is that it requires a CPU fast enough to emulate each frame in a fraction of the time the original hardware took. A crude estimation was that the CPU should be able to keep its speed steadily above 1250% when running unthrottled for the intended effect to be achieved. It was then suggested to implement this option with gradual steps from 1 to 9, where 1 stands for 10% of a frame period and 9 stands for 90%, so one could tune it to get the longest possible delay the current hardware can sustain.
So theoretically, using -frame_delay 1 would be almost the same as not using -frame_delay at all, just a 10% better chance of catching the input in time for the next frame.
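A minimal sketch of the timing idea may make this clearer. This is not GroovyMAME's actual code; poll_inputs(), emulate_frame() and present_frame() are hypothetical stand-ins for the emulator's real input, emulation and video calls.

// Illustrative sketch of the -frame_delay timing idea, assuming hypothetical
// poll_inputs()/emulate_frame()/present_frame() helpers.
#include <chrono>
#include <thread>

void poll_inputs()   {}  // hypothetical: read the host input devices
void emulate_frame() {}  // hypothetical: run the emulated machine for one frame
void present_frame() {}  // hypothetical: flip/present, synced to vblank

void run_frame(int frame_delay /* 1..9 */, double refresh_hz)
{
    const auto frame_period = std::chrono::duration<double>(1.0 / refresh_hz);

    // Wait for frame_delay tenths of the frame period, measured from the
    // point right after the previous vblank/flip, so emulation of the next
    // frame starts as late as possible.
    std::this_thread::sleep_for(frame_period * (frame_delay / 10.0));

    // Inputs are sampled only now, so the state fed into this frame is as
    // fresh as possible.
    poll_inputs();

    // Only the remaining (10 - frame_delay)/10 of the period is left to
    // emulate the whole frame, which is why the large unthrottled speed
    // headroom mentioned above is needed for high settings.
    emulate_frame();

    // Present on the next vblank (vsync).
    present_frame();
}

For example, on a 60 Hz display with -frame_delay 7, the loop would sleep for roughly 11.7 ms of the 16.7 ms frame period, leaving about 5 ms to emulate and present the frame.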
BUT, surprisingly, the way -frame_delay was implemented has an unexpected side effect when used with ATI cards and d3d: it bypasses a frame queue that the drivers silently set up when vsync is used. It turns out this frame queue adds 2-3 frames of lag by itself, so bypassing it has a massive effect on input responsiveness, far more important than the one originally intended by the -frame_delay implementation.
That's why simply enabling -frame_delay with a value of 1 makes most of the perceived input lag disappear. The extra gain from raising it from 1 to 9 only addresses the last remaining frame of lag, and only statistically (e.g. it may help with 33% of the total frames); it can probably only be detected by recording a high speed video, and only on already highly optimized systems.