I thought the bottleneck for many of the CHDs was spindle speed. Am I wrong?
Not sure I quite follow. Spindle speed is a physical property of a hard drive, and the spindle speed of the drive in the PC you run MAME on is irrelevant. I don't know how MAME reads CHDs, but I suspect that loading the image directly, as it does, is almost always faster than the original game could pull data off its own hard drive.
The speed issue is the emulation of the system itself. CHD games are often talked about as slow because they tend to be newer games (they had hard drives, after all), so they tend to be based on newer, faster hardware and take more power to emulate.
In addition, I wasn't aware that MAME is single-threaded. It looks like it's developed in some C variant, so I wonder whether it's a strict limitation of the ROMs that forces the mamedevs to spawn only a single thread.
The language a program is written in has almost no bearing on whether it is single-threaded or multithreaded. The original system's design and ROMs are completely irrelevant.
Writing multithreaded applications is very hard. There are also exceptionally few programmers with a lot of experience writing them (multiprocessor machines never threatened to become mainstream until the recent multicore processors). People tend to imagine that complex emulated games could be sped up by running one emulated CPU on one host core and the other emulated CPU on a second core. The problem is that these emulated CPUs have to communicate on an extremely exact timing schedule, and achieving that across separate host cores takes an *unbelievable* amount of tough programming. So much so that it has been suggested that, even if all that effort were made, the overhead of the extra code needed to keep the emulated CPUs in sync would probably cancel out the speed advantage. That takes you back to square one, except with a program that's probably twice as hard to debug, and probably buggier.
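To put a rough number on the sync problem, here is a minimal sketch of two emulated CPUs interleaved on a single host thread. It is not how MAME actually schedules anything; the EmulatedCpu class, the run_frame function, the clock speeds and the slice count are all made up for illustration. The point is only that every slice boundary is a moment where both CPUs must agree on the emulated time. On one thread that costs nothing, but across two host cores each boundary would become a real synchronization.

```python
# Hypothetical sketch, not MAME code: two emulated CPUs sharing one host thread.

class EmulatedCpu:
    def __init__(self, name, clock_hz):
        self.name = name
        self.clock_hz = clock_hz
        self.cycles_run = 0

    def run(self, cycles):
        # Stand-in for executing real instructions for this many cycles.
        self.cycles_run += cycles


def run_frame(cpus, frames_per_second, slices_per_frame):
    """Run every CPU for one video frame in small slices.

    Returns the number of points at which all CPUs had to line up
    on the same emulated time.
    """
    sync_points = 0
    for _ in range(slices_per_frame):
        for cpu in cpus:
            cpu.run(cpu.clock_hz // (frames_per_second * slices_per_frame))
        # Both CPUs have now reached the same emulated time.
        # With one host thread this is free; with one host core per
        # emulated CPU it would need a barrier or similar handshake.
        sync_points += 1
    return sync_points


main_cpu = EmulatedCpu("main", 12_000_000)   # assumed 12 MHz
sound_cpu = EmulatedCpu("sound", 6_000_000)  # assumed 6 MHz
syncs = run_frame([main_cpu, sound_cpu], frames_per_second=60, slices_per_frame=100)
print(syncs, main_cpu.cycles_run, sound_cpu.cycles_run)
```

With these made-up numbers that is 100 sync points per frame, or 6,000 per second at 60 frames per second, each covering only a couple of thousand emulated cycles. A tighter timing requirement means more slices and more handshakes, which is where the overhead argument comes from.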
At most you can offload some tasks unrelated to emulation, like drawing the bitmaps to the screen, to a second core, but you would only gain a tiny amount. The best advantage is that all OS-related tasks could be kept on one core, giving MAME exclusive access to the second. I'd guess this could help by around 5% (can anyone confirm?). So sadly that's the best to hope for, at least until programmers spend a few years really getting to grips with multicore, or someone releases a highly optimised automatic multithreading compiler. (I don't see that happening.)
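For what the "offload the drawing" idea looks like structurally, here is a rough sketch using a second thread and a small queue. Again this is not MAME code: emulate_frames, present_frames and the fake 16-byte frames are invented for the example, and Python threads will not show a real speedup here. It only shows the shape: the emulation loop hands finished frames to another thread and carries on, and the two sides only meet at the queue.

```python
# Hypothetical sketch, not MAME code: emulation on one thread, display on another.
import queue
import threading


def emulate_frames(frame_count, out_queue):
    """Produce frames on the calling thread and hand them off."""
    for n in range(frame_count):
        frame = bytes([n % 256]) * 16  # stand-in for a rendered bitmap
        out_queue.put(frame)           # blocks only if the display falls behind
    out_queue.put(None)                # sentinel: no more frames


def present_frames(in_queue, presented):
    """Consume frames on the display thread until the sentinel arrives."""
    while True:
        frame = in_queue.get()
        if frame is None:
            break
        presented.append(len(frame))   # stand-in for drawing to the screen


frames = queue.Queue(maxsize=2)        # small buffer so emulation cannot run far ahead
presented = []
display = threading.Thread(target=present_frames, args=(frames, presented))
display.start()
emulate_frames(5, frames)
display.join()
print(len(presented))
```

This kind of split is easy because the display never has to talk back to the emulated machine, which is exactly what is not true of two emulated CPUs.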