According to the documentation on the MAME site, MAME's multithreading runs the emulation engine in one thread and the video rendering in another.
On a dual-core system that means one core can be dedicated to emulating the logic of the game, and the other to shunting the resulting video data onto the screen. (Though in most cases on modern systems it's unlikely that both cores would be 100% saturated by these tasks.) Beyond this there is still other stuff going on: Windows is handling the input from your controller, Windows is managing the routing of sound via DirectX, there might be network traffic, possibly a virus scanner, etc. In a purely dual-core setup, all of this still has to be scheduled around the emulator's two threads. When you move past two cores you increase the chances that there will be free processing capacity at the instant MAME requires it. The additional cores don't go "unused", they just aren't dedicated to MAME, and MAME isn't the only thing running.
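The emulation/video split described above can be sketched as a simple producer/consumer pair. This is only an illustration of the structure, not MAME's actual code: the names `emulate`, `render`, and `run` are made up for the example, and the "frames" are just strings. (In standard CPython the two threads won't truly run on separate cores because of the interpreter lock, so treat it as a picture of the design rather than a benchmark.)

```python
import queue
import threading


def emulate(frames: queue.Queue, n_frames: int) -> None:
    """Hypothetical emulation thread: produces one frame per loop iteration."""
    for i in range(n_frames):
        frames.put(f"frame {i}")  # blocks if the renderer falls behind
    frames.put(None)  # sentinel: tells the renderer there are no more frames


def render(frames: queue.Queue, shown: list) -> None:
    """Hypothetical video thread: consumes frames and 'displays' them."""
    while True:
        frame = frames.get()
        if frame is None:
            break
        shown.append(frame)


def run(n_frames: int = 5) -> list:
    frames = queue.Queue(maxsize=2)  # small buffer between the two threads
    shown = []
    emu = threading.Thread(target=emulate, args=(frames, n_frames))
    vid = threading.Thread(target=render, args=(frames, shown))
    emu.start()
    vid.start()
    emu.join()
    vid.join()
    return shown
```

Everything else on the machine (input, sound, virus scanner) is a third, fourth, fifth party competing for the same cores, which is where extra cores help.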
On the subject of AMD vs Intel: at the moment, current Intel processors seem to do more work per clock cycle when running MAME, as was mentioned by another poster. What that means is that of two otherwise comparable systems, one AMD and one Intel, running at identical clock speeds, the Intel system is likely to be faster. If some CPU instruction takes 7 clock cycles to complete on some AMD CPU, and the same instruction takes 5 cycles on an Intel, the Intel chip would be getting 40% more work done in the same period of time (7/5 = 1.4), or put the other way, each instruction takes about 29% less time. (These numbers are purely for illustration.)
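To make the cycle arithmetic concrete, here is a small sketch using the same illustrative 7-cycle and 5-cycle figures. The function names are made up for the example. At equal clock speed, the work done per second scales with the ratio of cycle counts (7/5), while the time saved per instruction is the difference over the slower count (2/7), so the two percentages are not the same number.

```python
def relative_throughput(cycles_slow: int, cycles_fast: int) -> float:
    """How many times more work the faster chip does per second at the same clock."""
    return cycles_slow / cycles_fast


def time_saved(cycles_slow: int, cycles_fast: int) -> float:
    """Fraction of time the faster chip saves on each instruction."""
    return 1 - cycles_fast / cycles_slow


ratio = relative_throughput(7, 5)
print(f"{(ratio - 1) * 100:.0f}% more work in the same time")
print(f"{time_saved(7, 5) * 100:.1f}% less time per instruction")
```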