The big culprit in the case of Blitz/Gauntlet Legends is the video hardware. Most of the old games that you can emulate in real time use simple sprite/tile-based video architectures. These are pretty straightforward to implement on a general-purpose CPU, and modern PCs are so fast that they can keep up.
Blitz/Gauntlet Legends (the hardware is very similar) use a 3dfx Voodoo (yes, the same thing you could have bought in ~1996 for your PC). 3D hardware is a different beast altogether, and there's a reason modern PCs now include 3D acceleration capabilities. In a nutshell, within the narrow set of operations it is designed to perform, a 3D device can complete many more operations per clock than your CPU can. Your CPU can perform about 2-4 operations per clock, but a graphics processor can perform on the order of 8-256, depending on the device. Emulating this on a CPU is entirely possible, but even if you completely discount any "translation" that has to be performed, it'll run 2-128 times slower than a GPU clocked at the same speed. GPU operations are also specifically tailored to graphics usage and are in most cases SIMD-type operations (a single instruction operates on more than one set of completely independent data).
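To show where the 2-128 figure comes from, here is a minimal sketch of the arithmetic in Python. The function name is made up for illustration, and the inputs are just the rough per-clock numbers quoted above, not measurements of any real chip.

```python
def slowdown_range(cpu_ops_per_clock, gpu_ops_per_clock):
    """Return (best, worst) slowdown of a CPU doing a GPU's work at equal clocks.

    Both arguments are (low, high) tuples of operations per clock.
    Translation overhead is deliberately ignored, as in the text.
    """
    cpu_lo, cpu_hi = cpu_ops_per_clock
    gpu_lo, gpu_hi = gpu_ops_per_clock
    best = gpu_lo / cpu_hi    # narrowest GPU against the widest CPU
    worst = gpu_hi / cpu_lo   # widest GPU against the narrowest CPU
    return best, worst


# Rough figures from the post: CPU 2-4 ops/clock, GPU 8-256 ops/clock.
best, worst = slowdown_range((2, 4), (8, 256))
print(f"{best:g}x to {worst:g}x slower at the same clock speed")
```

The best case pairs the narrowest GPU with the widest CPU, and the worst case does the opposite, which is why the range is so wide.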
The graphics device in Blitz is clocked at (IIRC) 200MHz and for some reason I'm thinking the Voodoo pipeline is 8 wide (i.e. it can complete 8 ops every clock), so you'd need a "1.6GHz" CPU (200MHz x 8, or so, using very rough examples here) just to complete operations as fast as the Voodoo on Blitz does, plus you need to translate Voodoo instructions (which are very graphics specific) into CPU instructions (which are not), and this inevitably requires more than one CPU instruction for every GPU instruction. THEN, you also need to emulate the CPU (again, often more than one host instruction for every emulated instruction, though the MIPS arch used by Blitz and Gauntlet Legends is pretty straightforward), AND you need to emulate the sound DSP (which again can complete about 4-8 instructions per clock, usually), AND you need to handle mapping the controls (which are very easy to talk to on the arcade hardware) to PC controls (which aren't nearly as easy to talk to, in comparison). PLUS you have a behemoth of an OS running in the background, while the arcade hardware does not.
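The back-of-the-envelope "1.6GHz" estimate can be written out as a small Python sketch. Everything here is an assumption for illustration: the function and parameter names are hypothetical, the 200MHz and 8-wide figures are the from-memory numbers above, and the translation overhead of 3 host instructions per GPU operation is an arbitrary example value, not a measured one. Note that the 1.6GHz figure treats the host as completing one operation per clock.

```python
def required_host_clock_hz(device_clock_hz, device_ops_per_clock,
                           host_ops_per_clock=1.0,
                           host_instr_per_device_op=1.0):
    """Estimate the host clock needed to match an emulated device's throughput."""
    # Work the device gets through every second.
    device_ops_per_sec = device_clock_hz * device_ops_per_clock
    # Each device operation may cost several host instructions to translate.
    host_instr_per_sec = device_ops_per_sec * host_instr_per_device_op
    # A host that retires more than one instruction per clock needs a lower clock.
    return host_instr_per_sec / host_ops_per_clock


# The rough figure from the post: 200MHz x 8 ops per clock.
baseline = required_host_clock_hz(200e6, 8)
print(f"no translation overhead: {baseline / 1e9:.1f} GHz")

# Same device, assuming (arbitrarily) 3 host instructions per GPU operation.
with_overhead = required_host_clock_hz(200e6, 8, host_instr_per_device_op=3)
print(f"with 3x translation overhead: {with_overhead / 1e9:.1f} GHz")
```

That only covers the video hardware; the emulated CPU, the sound DSP, and input handling all add to the same budget.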
Now, all this is exacerbated by the fact that MAME is not multi-threaded! Modern CPUs are getting faster largely because they pack more than one CPU into a package. MAME is emulating 2-4 pieces of discrete hardware (or more, in some cases) that all ran on their own clocks, completing instructions in parallel, but MAME only uses one of your processors (if you have more than one), so your ONE processor has to do the job of ALL the hardware that was on the original board, and your ONE processor isn't even particularly good at most of it.
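To picture what "one processor doing the job of all the hardware" means, here is a toy Python sketch of a single-threaded emulator taking turns between devices. This is NOT MAME's actual scheduler; the class, the function, and the CPU and sound clock speeds are all made up for illustration (only the 200MHz video figure comes from the rough number above).

```python
from dataclasses import dataclass


@dataclass
class Device:
    """A stand-in for one emulated chip (hypothetical, not a MAME class)."""
    name: str
    clock_hz: int
    cycles_run: int = 0

    def run(self, cycles):
        # A real emulator would execute emulated instructions here.
        self.cycles_run += cycles


def run_frame(devices, slices_per_frame=100, frames_per_second=60):
    """Emulate one video frame on a single thread, one device at a time."""
    order = []
    for _ in range(slices_per_frame):
        for dev in devices:
            # Each device gets its share of cycles for this time slice,
            # but only after the previous device has finished its turn.
            cycles = dev.clock_hz // (frames_per_second * slices_per_frame)
            dev.run(cycles)
            order.append(dev.name)
    return order


# Made-up clock speeds, except the 200MHz video figure quoted above.
board = [
    Device("cpu", 120_000_000),
    Device("video", 200_000_000),
    Device("sound", 12_000_000),
]
order = run_frame(board, slices_per_frame=10)
print(order[:6])
for dev in board:
    print(dev.name, dev.cycles_run)
```

On the real board these three chips would all be working at the same instant; here the host has to finish each one's slice before starting the next, so the costs add up instead of overlapping.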
I guess I do feel it necessary to point out that comparing wildly different architectures using clock speed as the chief measure of comparison is a horrible thing to do, but with appropriate context it can at least give you an idea of what's going on. Take all numbers with an ocean's worth of salt; the general principle is what matters.