It also helps to understand how an emulator works from a programming perspective. The following is a gross oversimplification and not a perfect analogy.
Let's think of it this way: I am an American who speaks a particular dialect of English. To communicate with someone who speaks another language, I need to translate both ways. An emulator works kind of like this. I want to emulate the ability to speak a foreign language, let's call it Japanese. So a question is asked of me. First I have to understand the syntax of the sentence, then I have to look up every single word in a dictionary and figure out what it means in English. In a CPU, we can think of the words as 'opcodes', the lowest-level instructions a processor can execute. If I'm issued an opcode, or 'word', that means 'add A and B', first I have to look up what that means (which takes time), then I have to actually do the operation (which takes time), then I have to translate the result back into whatever the original processor expected (which also takes time). So where adding A and B took a single clock cycle on the original system, I'm suddenly taking 30 or 40 cycles to complete the same task in my emulator because of all the conversion overhead. With that sort of speed trade-off, an original CPU that ran at 30MHz suddenly needs 900MHz - 1.2GHz just to keep up, and that's assuming I've got 100% of the processor's power to myself.
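To make the "look it up, do it, translate it back" loop concrete, here is a toy interpreter for a made-up three-instruction CPU. The opcode names, registers, and encoding are all invented for illustration; no real chip works exactly like this, but the shape of the loop is the same one an interpreting emulator runs for every single guest instruction.

```python
def run(program):
    regs = {"A": 0, "B": 0}   # the guest CPU's registers
    pc = 0                    # program counter
    while pc < len(program):
        op = program[pc]      # 1. fetch the opcode (costs host cycles)
        if op == "LOAD_A":    # 2. decode: "look the word up in the dictionary"
            regs["A"] = program[pc + 1]
            pc += 2
        elif op == "LOAD_B":
            regs["B"] = program[pc + 1]
            pc += 2
        elif op == "ADD":     # 3. execute: actually do the operation
            # 4. translate the result back into what the guest expects,
            #    here by emulating an 8-bit register's wraparound
            regs["A"] = (regs["A"] + regs["B"]) & 0xFF
            pc += 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs

# The guest's single-cycle ADD costs a fetch, a chain of comparisons,
# dict lookups, an add, and a mask -- dozens of host instructions.
print(run(["LOAD_A", 200, "LOAD_B", 100, "ADD"]))  # {'A': 44, 'B': 100}
```

The point isn't the arithmetic; it's that every guest instruction pays the fetch-and-decode toll before any real work happens.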
So now what about variations in dialect? What if "add a to b, then c to a, then a to b" in my foreign language carries an unwritten rule: whenever I add a sequence of three like that, I'm also supposed to copy A to D? These are the kinds of little oddities that make emulation break. Things the emulator's author didn't know about the original system, or that were never clearly documented. Suddenly when I'm asked to add D and C, I have to stop, because I didn't even know there was a D. There's been a mistranslation and a misunderstanding.
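A tiny sketch of that failure mode, with everything invented for illustration: imagine the real hardware had an undocumented quirk where ADD also copied A into a shadow register D. An emulator written from the official docs gets every documented behavior right and still breaks any game that relies on the quirk.

```python
def add_naive(regs):
    # what the emulator author implemented from the official docs
    regs["A"] = regs["A"] + regs["B"]
    return regs

def add_real(regs):
    # what the silicon actually did
    regs["A"] = regs["A"] + regs["B"]
    regs["D"] = regs["A"]   # the undocumented side effect
    return regs

state = {"A": 2, "B": 3, "D": 0}
naive = add_naive(dict(state))
real = add_real(dict(state))

# A game that later reads D sees 0 on the emulator but 5 on real
# hardware, and misbehaves even though every documented opcode is correct.
print(naive["D"], real["D"])  # 0 5
```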
couldn't have explained it better myself.

the processors that run in arcade machines are very specialized. the code of the game is optimized for that specific processor and its instruction set. the cpu in your desktop computer speaks a completely different "language" with its own instruction set. therefore, a single simple command issued by the game program has to be translated into a very complex code sequence that your desktop CPU can understand YET still accomplishes the same thing.
just as an example, highly specialized game code can tell the processor in about 4 commands to draw a square on the screen and fill it in.
ready for a command mr. processor?
send "draw 1,1-8,14-1-13,45,C3" to processor (draw in the top left corner a square that starts at 1,1 and goes to 8,14 then fill it with color 13,45,C3)
done
highly simplified but you get it...
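Here's one way that made-up draw command could be unpacked on the emulator side. The field layout (corner 1, corner 2, fill flag, color) is guessed from the example above purely for illustration; a real arcade board would do this in dedicated video hardware, which is exactly why it only needs a handful of commands.

```python
def draw(command, screen):
    # split "1,1-8,14-1-13,45,C3" into corner 1, corner 2, fill flag, color
    p1, p2, fill_flag, color = command.split("-")
    x1, y1 = (int(n) for n in p1.split(","))
    x2, y2 = (int(n) for n in p2.split(","))
    if fill_flag == "1":
        # the emulator has to touch every pixel in software;
        # the original hardware did this as a single operation
        for y in range(y1, y2 + 1):
            for x in range(x1, x2 + 1):
                screen[(x, y)] = color
    return screen

screen = draw("1,1-8,14-1-13,45,C3", {})
print(len(screen))  # 112 pixels filled (an 8 x 14 rectangle)
```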
to accomplish the same thing on a desktop may take hundreds of commands, including, but not limited to:
1 accessing the videocard bus
2 waiting until the bus is available
3 finding an available address for the information (repeat 2 and 3 as many times as needed)
4 waiting until the bus is available AGAIN
5 finding the correct memory address for the information
6 waiting until the bus is available AGAIN
7 writing the information for the position of the first corner of the square to the address
8 waiting until the bus is available
9 finding an available address for the information (repeat 8 and 9 as many times as needed)
10 waiting until the bus is available AGAIN
11 finding the correct memory address for the information
12 waiting until the bus is available AGAIN
13 writing the information for the position of the second corner of the square to the address
as you see, we have now mapped 2 corners of the square, and haven't even connected the points or filled it in yet. we are already looking at 10x the commands, even in my simplified explanation.
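The numbered steps above can be sketched as a host-side routine. The bus object and its contention pattern are stand-ins I invented for this sketch; real bus arbitration and driver work are far messier, which only strengthens the point.

```python
import itertools

class Bus:
    def __init__(self):
        # pretend the bus is busy two checks out of every three
        self._busy = itertools.cycle([True, True, False])
        self.memory = {}
        self.waits = 0

    def wait_until_free(self):
        while next(self._busy):   # steps 2, 4, 6, ...: spin until free
            self.waits += 1

    def write(self, address, value):
        self.memory[address] = value   # steps 7 and 13: the actual write

def write_corner(bus, corner):
    bus.wait_until_free()         # wait until the bus is available
    address = len(bus.memory)     # "find an available address"
    bus.wait_until_free()         # wait AGAIN before committing
    bus.write(address, corner)

bus = Bus()
write_corner(bus, (1, 1))    # first corner of the square
write_corner(bus, (8, 14))   # second corner -- still no lines, no fill
print(len(bus.memory), bus.waits)  # 2 8
```

Two corners written, and the host has already burned eight wait cycles on bookkeeping before a single line of the square exists.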
as you see, it's not that desktop computers aren't fast enough, or that the game companies employed super coders. the two machines just speak different languages, and it's the translation that takes forever.