Also, keep in mind that those benchmarks were done with the "-video none" parameter which means that it was only CPUs and NO VIDEO EMULATION AT ALL. Therefore, those are NOT the numbers you would see in game while playing.
Right conclusion, wrong reasons.
Run the games yourself and see the "differences". On a system very like John's "test5", I get very little difference between no video, d3d, and ddraw in blitz:
test#1   test#2   test#3   test#4   video option
28.95%   28.49%   27.92%   28.45%   none
28.63%   27.60%   27.12%   27.78%   d3d
28.34%   27.25%   26.61%   27.52%   ddraw
21.21%   20.37%   20.05%   20.53%   gdi
tests 1 & 2: mame -str 90 -nosound -nothrottle -video option blitz
tests 3 & 4: mame -str 90 -sound -nothrottle -video option blitz
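For anyone who wants to repeat the runs, here is a minimal sketch of how the two command lines above could be scripted. It assumes a `mame` executable on the PATH and uses only the flags quoted in this post; the helper names (`build_command`, `run_round`) are made up for illustration, and I have not checked whether a different MAME build accepts the same options.

```python
import subprocess

# Video options from the table above.
VIDEO_OPTIONS = ["none", "d3d", "ddraw", "gdi"]


def build_command(video, sound, game="blitz", seconds=90):
    """Build one benchmark command line, mirroring the commands in the post."""
    sound_flag = "-sound" if sound else "-nosound"
    return ["mame", "-str", str(seconds), sound_flag,
            "-nothrottle", "-video", video, game]


def run_round(sound):
    """Run one round of all four video options (hypothetical helper).

    Assumes `mame` is installed; each run takes several minutes.
    """
    for video in VIDEO_OPTIONS:
        subprocess.run(build_command(video, sound), check=True)


# Print the commands for a no-sound round without running anything.
for video in VIDEO_OPTIONS:
    print(" ".join(build_command(video, sound=False)))
```

Calling `run_round(False)` and then `run_round(True)` would correspond to one nosound round and one sound round.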
My test system: AMD Athlon 64 3500+ (2.21 GHz), Radeon 9550, 1 GB memory
The sample noise (±2%) is about the same size as the differences among d3d, ddraw, and no video, and between sound and no sound. The only conclusion the data supports is that software video (aka "gdi") is a lot slower. D3d and no video differ by about 2%, which is within the sample noise.
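The comparison can be checked with a few lines of arithmetic. This is only a rough sketch: it lumps the sound and nosound runs together, and uses the max-minus-min spread of each row as a crude stand-in for sample noise, which is my simplification rather than a proper statistical test.

```python
from statistics import mean

# Speed percentages copied from the table (tests 1-4 per video option).
results = {
    "none":  [28.95, 28.49, 27.92, 28.45],
    "d3d":   [28.63, 27.60, 27.12, 27.78],
    "ddraw": [28.34, 27.25, 26.61, 27.52],
    "gdi":   [21.21, 20.37, 20.05, 20.53],
}


def summarize(results, baseline="none"):
    """Return mean, run-to-run spread, and relative drop vs the baseline."""
    base = mean(results[baseline])
    summary = {}
    for option, runs in results.items():
        avg = mean(runs)
        summary[option] = {
            "mean": avg,
            "spread": max(runs) - min(runs),          # percentage points
            "drop_pct": 100.0 * (base - avg) / base,  # relative to baseline
        }
    return summary


for option, row in summarize(results).items():
    print(f"{option:6s} mean={row['mean']:.2f}%  "
          f"spread={row['spread']:.2f} pts  "
          f"drop vs none={row['drop_pct']:.1f}%")
```

The gap between the none and d3d means comes out smaller than the spread within either row, while the gdi gap is several times larger than any row's spread.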
(And yes, I'd like to run the tests at least one more time per sound/nosound setting, but one round of the 4 video options takes ~30 minutes, so that's an hour more; I shoulda picked a faster game.)
OTOH, as you said, the numbers do not reflect what I'd get while playing the game. That's (mostly) because the 90 seconds tested include bootup and other easier-to-emulate moments and only some (demo) gameplay, while actual play would consist entirely of that harder-to-emulate gameplay.
Example: on my system, the first 10 emulated seconds (aka bootup) run in the mid-60% range with the three fast video options, and at 45% with gdi.
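To get a feel for how much the bootup inflates the 90-second average, here is a back-of-the-envelope sketch. It assumes the reported percentage is emulated time divided by wall-clock time over the whole run, and it takes 65% as a stand-in for "mid 60%s"; both are my assumptions, and the function name is made up.

```python
def speed_after_boot(overall_pct, boot_pct, total_s=90.0, boot_s=10.0):
    """Estimate the average speed (%) of the run once bootup is excluded.

    Assumes speed % = emulated seconds / wall-clock seconds * 100.
    """
    total_wall = total_s / (overall_pct / 100.0)  # wall time for the whole run
    boot_wall = boot_s / (boot_pct / 100.0)       # wall time spent in bootup
    return 100.0 * (total_s - boot_s) / (total_wall - boot_wall)


# Overall figures are the test#4 values from the table above.
print(f"none: {speed_after_boot(28.45, 65.0):.1f}% after bootup")
print(f"gdi:  {speed_after_boot(20.53, 45.0):.1f}% after bootup")
```

Under those assumptions the remaining 80 emulated seconds run a point or two below the 90-second average, and sustained gameplay would presumably be lower still.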