Ok, interesting results... when I switched to d3d v8 I got a message saying "Unable to initialize Direct3d" - so I set it back to 9. When I ran the benchmark, I got about 115% consistently. Then I realized I had forgotten to turn on multithreading. Turned that on, re-ran the benchmarks, and got an average of about 450%. Should there be THAT much of a difference?
Multithreading puts the DirectX stuff (Direct3D, DirectDraw, DirectInput, and maybe DirectSound) on a second thread. If the bottleneck is in one of those, and it has been sharing one core with the rest of MAME (getting, say, 25% of the time), then moving it to a second, unused core/hyperthread lets it use ~100% of that core. So yes, a jump that big is possible.
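To put rough numbers on that, here's a back-of-the-envelope sketch in Python. It assumes the DirectX work is the only thing limiting speed and that speed scales linearly with the core time it gets. The function name and the linear model are my own simplification, not anything MAME does internally, and the 25% share is just the guess from above.

```python
def estimated_speed(speed_before_pct, share_before, share_after):
    """Scale a benchmark speed by how much more core time the bottleneck gets.

    Hypothetical model: assumes the bottleneck is the only limit and that
    speed is proportional to the fraction of a core it can use.
    """
    if not (0 < share_before <= 1 and 0 < share_after <= 1):
        raise ValueError("shares must be fractions in (0, 1]")
    return speed_before_pct * (share_after / share_before)


if __name__ == "__main__":
    # 115% benchmark, bottleneck going from ~25% of a core to ~100% of one
    print(estimated_speed(115, 0.25, 1.0))
```

With your 115% benchmark that works out to about 460%, which is in the same ballpark as the ~450% you saw, so the jump isn't unreasonable.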
Also, if the bottleneck is in the video, and your video card is doing it in software rather than hardware (very possible with a GeForce2 MX and D3D 9), then this could be about the same speed increase as getting a new card that does it in hardware. Maybe. I'm surprised that D3D 8 caused that error, though.

115-130% is borderline, since that's only the average and you'll notice any dips below 100%. Whether a new video card will help enough in your case, even after -multithreading is enabled, I can't be sure. It's possible, but not guaranteed.