Hi joeblade,
I'm glad it finally worked. I think I know what's happening with the -frame_delay option. You probably need to set a higher value for it, like 4 or 5, instead of 1 (e.g. -frame_delay 5). With a low value like 1, you do get rid of the lag associated with Direct3D's flip queue, but you get no further benefit, and what's worse, you risk entering the same vertical retrace twice, especially with fast computers and undemanding games. I'd say this is the cause of the wrong speed you're seeing in Galaga, etc.
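To make the "same retrace twice" idea a bit more concrete, here's a toy timing model. It is NOT GroovyMAME's code, just a sketch under some loudly stated assumptions: -frame_delay is read as "wait N tenths of a frame after the retrace starts", the raster is a 15 kHz-style 262-line frame with the first 38 lines blanked, and the loop proceeds as soon as it sees it is inside vblank (polling the state, not waiting for the start of a new retrace). All names here are hypothetical.

```python
# Toy model only: assumptions, not GroovyMAME internals.
LINES_PER_FRAME = 262  # assumed 15 kHz arcade-style raster
VBLANK_LINES = 38      # assumed: lines 0..37 of each frame are the vertical retrace


def wait_for_vblank(t):
    """Return the time (in scanlines) at which a loop polling 'in vblank?' proceeds."""
    line = t % LINES_PER_FRAME
    if line < VBLANK_LINES:
        return t  # still inside a retrace: the poll succeeds immediately
    return t - line + LINES_PER_FRAME  # otherwise wait for the next retrace


def max_frames_per_retrace(frame_delay, emu_lines, iterations=200):
    """Largest number of emulated frames started within a single retrace.

    frame_delay: hypothetical delay in tenths of a frame after vblank is seen.
    emu_lines: time spent emulating one frame, in scanlines.
    """
    t = 0
    counts = {}
    for _ in range(iterations):
        t = wait_for_vblank(t)
        retrace = t // LINES_PER_FRAME
        counts[retrace] = counts.get(retrace, 0) + 1
        # sleep frame_delay tenths of a frame, then emulate one frame
        t += frame_delay * LINES_PER_FRAME // 10 + emu_lines
    return max(counts.values())


print(max_frames_per_retrace(1, 5))   # fast PC, light game, low delay
print(max_frames_per_retrace(5, 5))   # same PC and game, higher delay
print(max_frames_per_retrace(1, 50))  # slower PC or heavier game, low delay
```

In this model, frame_delay 1 plus a very fast emulated frame (26 + 5 lines) finishes while the raster is still inside the 38-line retrace, so the loop "sees" vblank again and runs two frames in one refresh, which would make the game run too fast. With frame_delay 5, or with a slower emulated frame, the loop always lands outside the retrace and waits for the next one. Time is measured in scanlines so all the arithmetic stays integer.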
BTW, the lag with v-synced Direct3D (without -frame_delay) has been confirmed to be 3 frames (roughly 50 ms at 60 Hz), so it's a serious issue indeed.