Has there ever been a documented reverse engineering of an arcade game?

MonMotha:
Reverse engineering even relatively simple software (say, a page of C) to the degree you speak of is mind-numbingly boring, and it generally has to be done by somebody knowledgeable enough to have much more interesting things to work on. I've done this for key sections of programs, for example to emulate an I/O board, and it can take weeks just to fully reverse engineer 5-10 functions.

It's not necessary, anyway. You can fully emulate the system without ever even looking at the program (source or compiled binary). You can make small patches to the program with a disassembler, hex editor, and maybe a debugger (this is how all the "infinite lives" stuff is done).
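As a minimal sketch of that kind of patch, assume a hypothetical Z80 game whose lives counter is decremented by a single `DEC (HL)` instruction. The offset and the five-byte ROM image below are made up for illustration. The function checks the original bytes before overwriting them, so it refuses to patch the wrong ROM revision.

```python
# Z80 opcodes used by this hypothetical patch.
NOP = 0x00         # no operation
DEC_HL_IND = 0x35  # DEC (HL): decrement the byte in RAM that HL points at


def apply_patch(rom: bytes, offset: int, expected: bytes, replacement: bytes) -> bytes:
    """Return a patched copy of `rom`, refusing to patch if the original bytes differ."""
    if len(expected) != len(replacement):
        raise ValueError("patch must not change the length of the ROM")
    actual = rom[offset:offset + len(expected)]
    if actual != expected:
        raise ValueError(f"unexpected bytes at {offset:#06x}: {actual.hex()}")
    patched = bytearray(rom)
    patched[offset:offset + len(replacement)] = replacement
    return bytes(patched)


# Hypothetical "lose a life" routine: LD HL,4E00h / DEC (HL) / RET
rom = bytes([0x21, 0x00, 0x4E, DEC_HL_IND, 0xC9])
patched = apply_patch(rom, 3, bytes([DEC_HL_IND]), bytes([NOP]))
print(patched.hex())  # 21004e00c9
```

Some games test their ROM checksums at boot, so a real patch may also need to fix up a checksum or skip that test.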

That said, a lot of the source code is still out there. I know there are people who still have the source to some mid-90s Midway games, for example. I suspect most companies that are still alive have the source lying around somewhere, and for companies that have been liquidated, somebody probably ended up with it. Whether they bothered to keep it is up to them, of course. Many of those buyers are "IP warehouses": they only care about enforcing the copyrights they've purchased, not the artistic value (ya know, the stuff that "promote(s) the Progress of Science and useful Arts"). I'm sure plenty of source has been lost to the ages, but nobody would ever have been under any obligation to release it anyway.

ChadTower:

Didn't they decompile Pacman back in the day in order to document the ghost behaviors? I'm pretty sure I remember reading that was done via a real decompile and not just basic observations.

jimmy2x2x:

--- Quote from: Howard_Casto on March 02, 2012, 04:54:01 pm ---
I'm no expert on the history of arcade games or anything, but I can give you the short answer.... no.

When source is compiled, especially on arcade machines, it gets compiled down to machine code. All comments are removed. Heck, even the names of the variables are removed! It saves space that way. Once you decompile, you can reverse engineer and add new comments based on how you think things are working (see MAME), but once it's gone, it's gone. MAME rarely does even that much, actually. Think of the game's program chips as the hard drive of a computer. MAME doesn't emulate the hard drive... it emulates the entire PC, which can play the contents of the hard drive. So it isn't necessary to understand completely what is going on in the program ROMs, as long as the emulated hardware is set up correctly.

If any source code still exists for some arcade games, it would be rare for the public to ever get to see it. The games are still copyrighted, after all.

That isn't to say that there might be a few cases out there, but in general no.... you aren't going to find any source code.

--- End quote ---
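To make the "emulate the PC, not the hard drive" point concrete, here is a minimal fetch-decode-execute sketch. It implements only a handful of real Z80 opcodes (no flags, timing, interrupts, or video), which is a huge simplification of what a MAME CPU core does; the class name and the tiny program are made up for illustration. The emulator never needs to know what the program is for; it just executes whatever bytes are in the ROM.

```python
class ToyZ80:
    """Executes a tiny subset of Z80 opcodes. Flags, most registers, timing and
    I/O are deliberately left out; this is an illustration, not a usable core."""

    def __init__(self, rom: bytes):
        self.mem = bytearray(0x10000)   # flat 64 KiB address space
        self.mem[:len(rom)] = rom       # program ROM mapped at address 0
        self.pc = 0
        self.a = 0
        self.halted = False

    def fetch(self) -> int:
        value = self.mem[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        return value

    def fetch16(self) -> int:
        low = self.fetch()
        high = self.fetch()
        return low | (high << 8)        # Z80 operands are little-endian

    def step(self) -> None:
        opcode = self.fetch()
        if opcode == 0x00:              # NOP
            pass
        elif opcode == 0x3E:            # LD A,n
            self.a = self.fetch()
        elif opcode == 0x3C:            # INC A (flags ignored)
            self.a = (self.a + 1) & 0xFF
        elif opcode == 0x32:            # LD (nn),A
            self.mem[self.fetch16()] = self.a
        elif opcode == 0xC3:            # JP nn
            self.pc = self.fetch16()
        elif opcode == 0x76:            # HALT (here: simply stop running)
            self.halted = True
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} not in this toy subset")

    def run(self, max_steps: int = 10_000) -> None:
        for _ in range(max_steps):
            if self.halted:
                break
            self.step()


# LD A,41h / INC A / LD (4000h),A / HALT
cpu = ToyZ80(bytes([0x3E, 0x41, 0x3C, 0x32, 0x00, 0x40, 0x76]))
cpu.run()
print(hex(cpu.mem[0x4000]))  # 0x42
```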

I really don't think most arcade games would have been written in a language that needed to be compiled; assembly seems a lot more likely to me.

The contents of the ROMs would mainly be data in most cases, leaving a relatively small code area. I just thought it might be interesting to look at how some of the classics were coded and how they actually did things internally. Due to the nature of PROMs, variable tracking should be a bit easier than on some other systems.

I am surprised there isn't any leaked source code for anything after all these years.

Does anyone else remember a story about Taito losing the source for Bubble Bobble?

jimmy2x2x:

--- Quote from: ChadTower on March 02, 2012, 05:06:01 pm ---
Didn't they decompile Pacman back in the day in order to document the ghost behaviors? I'm pretty sure I remember reading that was done via a real decompile and not just basic observations.

--- End quote ---

That's more like it! I'm pretty sure it would have been done at some point.

MonMotha:
The early stuff was probably written in assembly. However, when you run it through the assembler, you still lose all the variable names, label names, function names, etc. Imagine trying to read a program where everything was just named "x", "y", "z", etc. and all the jump points were just labeled "addr0", "addr1", "addr2", etc. It's surprisingly difficult to figure out wtf is going on. Add in the fact that the programmers are free to pick wild calling conventions and even vary them between functions, and it can be a mess to read. Usually the only saving grace is that there's just not much code, and the assembly output will (due to the way assembly works) be identical in structure and flow to the input, which tends to leave intact some human-friendly idioms that compilers often destroy.
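A toy two-pass assembler shows where the names go. This is a sketch assuming a made-up source syntax and only six real Z80 instructions; the `lives` address (4E00h) is hypothetical. Pass one turns every label into a plain number, pass two substitutes those numbers, and the comment is thrown away, so nothing human-readable survives in the output bytes.

```python
import re

# (pattern, opcode, operand size in bytes) for a tiny subset of Z80 instructions
PATTERNS = [
    (re.compile(r"NOP"), 0x00, 0),
    (re.compile(r"LD A,(\w+)"), 0x3E, 1),
    (re.compile(r"INC A"), 0x3C, 0),
    (re.compile(r"LD \((\w+)\),A"), 0x32, 2),
    (re.compile(r"JP (\w+)"), 0xC3, 2),
    (re.compile(r"HALT"), 0x76, 0),
]


def assemble(source, equates, origin=0):
    symbols = dict(equates)   # names for RAM variables, e.g. {"lives": 0x4E00}
    parsed = []
    address = origin
    # Pass 1: parse each line and record the address of every label.
    for raw in source:
        line = raw.split(";")[0].strip()          # comments are discarded here
        line = re.sub(r"\s*,\s*", ",", line)      # "LD A, 3" -> "LD A,3"
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = address          # a label is just an address
            continue
        for regex, opcode, size in PATTERNS:
            match = regex.fullmatch(line)
            if match:
                operand = match.group(1) if size else None
                parsed.append((opcode, size, operand))
                address += 1 + size
                break
        else:
            raise ValueError(f"cannot assemble {raw!r}")
    # Pass 2: emit bytes, replacing every name with its number.
    code = bytearray()
    for opcode, size, operand in parsed:
        code.append(opcode)
        if size:
            value = symbols[operand] if operand in symbols else int(operand, 0)
            code += value.to_bytes(size, "little")
    return bytes(code), symbols


SOURCE = [
    "start:",
    "    LD A, 3         ; starting number of lives",
    "    LD (lives), A",
    "loop:",
    "    INC A",
    "    JP loop",
]
code, symbols = assemble(SOURCE, {"lives": 0x4E00})
print(code.hex())  # 3e0332004e3cc30500
```

Only the `symbols` dictionary (which a real assembler might write out to a listing or map file) still knows that 5 means `loop`; ship just the bytes and that knowledge is gone.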

Some REALLY early stuff was probably written directly in "machine code". This is basically the same as assembly, but rather than being run through an assembler, that translation is done by hand and entered using a hex editor, card punch, etc. Even then, most programs would have had lots of documentation that essentially comprises the "source". It's essentially impossible to maintain programs written directly in "machine code", even if you're the person who wrote it just a few hours ago!

I'd guess that by the mid-to-late 80s, most stuff was being written in C. Compilers were readily available for all the common CPUs (68k, Z80, etc.), and they were reasonably good. You could run them on a PC and dump the output to an EPROM programmer connected to your parallel port for testing. The longest part of the dev cycle at that point was waiting for a batch of ROMs to erase (about 20 minutes). Most companies would have about 10-20 full sets of ROMs for each developer so they could do rapid development with roughly 2-5 minute cycle times, then "batch erase" 10-15 sets while continuing development on the remaining ones.
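A back-of-envelope check of those numbers, as a sketch: it assumes one ROM set is used up per edit/burn/test cycle, and that while one batch sits in the eraser the developer burns through fresh sets, which then become the next batch. The function name is made up; the 20-minute erase time and 2-5 minute cycle come from the post.

```python
import math


def rom_sets_needed(erase_minutes: float, cycle_minutes: float) -> tuple[int, int]:
    """Return (batch size, total sets) so a developer never waits on the eraser."""
    # Sets burned while one batch is erasing; these become the next batch.
    batch = math.ceil(erase_minutes / cycle_minutes)
    # One batch in the eraser plus one batch on the bench.
    return batch, 2 * batch


for cycle in (2, 3, 5):
    batch, total = rom_sets_needed(20, cycle)
    print(f"{cycle}-minute cycle: erase {batch} sets at a time, {total} sets in total")
```

With 2-3 minute cycles this gives batches of 7-10 and 14-20 sets in total, in the same ballpark as the figures above; at 5-minute cycles far fewer sets would do.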

Even before then, I'd have expected to see at least some FORTRAN. You might also see some Pascal starting in the very late 80s.

If you've ever looked at the output of even a non-optimizing C compiler, it can be a bit tough to follow. In addition to having no variable names, label names, function names, etc., the compiler also has to translate the common flow control structures (for loops, while loops, do-while loops, if statements, switch/case statements, etc.) into compare and branch instructions. Essentially, all loops become do-while loops (one of the least-used kinds when writing C), and if statements become a hodgepodge of gotos (which programmers avoid because they're hard to follow and maintain). The only real saving graces are that the calling convention is consistent and the basic program flow stays roughly in order. Optimizing compilers remove that last nicety (for the reverse engineer) and can produce output that's actually quite difficult to follow, as it differs drastically from how one would normally write the input C code, or even how one would write it by hand in assembly.
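Here is a sketch of that transformation, written in Python, which has no `goto`: the compiled form is modeled as a list of pseudo-instructions run by a tiny interpreter with an explicit program counter. The instruction names (`set`, `add`, `bge`, `blt`, ...) are invented for the example. The shape, one entry guard followed by a bottom-tested do-while, is a common way compilers lower `for (i = 0; i < n; i++) total += i;`, though the exact output depends on the compiler and its flags.

```python
def structured_sum(n: int) -> int:
    """What the programmer wrote: for (i = 0; i < n; i++) total += i;"""
    total = 0
    for i in range(n):
        total += i
    return total


# Roughly what comes out of the compiler: only compares and branches remain.
LOWERED = [
    ("set", "i", 0),           # 0: i = 0
    ("set", "total", 0),       # 1: total = 0
    ("bge", "i", "n", 6),      # 2: if (i >= n) goto 6      entry guard
    ("add", "total", "i"),     # 3: total += i              loop body starts here
    ("addi", "i", 1),          # 4: i += 1
    ("blt", "i", "n", 3),      # 5: if (i < n) goto 3       the do-while test
    ("ret", "total"),          # 6: return total
]


def run_lowered(program, n: int) -> int:
    regs = {"n": n}
    pc = 0                     # index of the next pseudo-instruction
    while True:
        op, *args = program[pc]
        pc += 1
        if op == "set":
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] += regs[args[1]]
        elif op == "addi":
            regs[args[0]] += args[1]
        elif op == "bge":
            if regs[args[0]] >= regs[args[1]]:
                pc = args[2]
        elif op == "blt":
            if regs[args[0]] < regs[args[1]]:
                pc = args[2]
        elif op == "ret":
            return regs[args[0]]
        else:
            raise ValueError(f"unknown op {op!r}")


print(run_lowered(LOWERED, 5))  # 10
```

If the compiler can prove `n > 0`, it may drop the guard at index 2 as well, leaving a pure do-while.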

Reverse engineering just one section of the program is quite reasonable, though. Someone who's good at it can probably do it in a couple of days, but it'll depend on the complexity and the size of the "section". "Decompilers" produce very rudimentary output. It's often better to just pore over the disassembly and start scratching down pseudocode in a text editor.
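Much of that work is naming things. A minimal sketch of the idea, assuming a handful of real Z80 opcodes and a made-up nine-byte program: disassemble once with generic `addrN` labels, then again with a hand-written table of names the reverse engineer has worked out (`start`, `loop`, and `lives` here are hypothetical). Real disassemblers also have to separate code from data, which this ignores.

```python
# opcode -> (format string, operand size in bytes) for a small Z80 subset
OPS = {
    0x00: ("NOP", 0),
    0x3E: ("LD A,{}", 1),
    0x3C: ("INC A", 0),
    0x32: ("LD ({}),A", 2),
    0xC3: ("JP {}", 2),
    0x76: ("HALT", 0),
}


def decode(code: bytes, pc: int):
    fmt, size = OPS[code[pc]]
    operand = int.from_bytes(code[pc + 1:pc + 1 + size], "little") if size else None
    return fmt, size, operand


def disassemble(code: bytes, names=None) -> list:
    names = dict(names or {})          # address -> human-chosen name
    # Pass 1: find jump targets so they get labels.
    targets, pc = set(), 0
    while pc < len(code):
        fmt, size, operand = decode(code, pc)
        if code[pc] == 0xC3:           # JP nn
            targets.add(operand)
        pc += 1 + size
    for i, target in enumerate(sorted(targets)):
        names.setdefault(target, f"addr{i}")   # generic label unless already named
    # Pass 2: emit text, showing names instead of raw 16-bit addresses.
    lines, pc = [], 0
    while pc < len(code):
        fmt, size, operand = decode(code, pc)
        if pc in names:
            lines.append(f"{names[pc]}:")
        if size == 2:
            text = fmt.format(names.get(operand, f"{operand:04X}h"))
        elif size == 1:
            text = fmt.format(f"{operand:02X}h")
        else:
            text = fmt
        lines.append(f"  {pc:04X}  {text}")
        pc += 1 + size
    return lines


code = bytes.fromhex("3e0332004e3cc30500")
print("\n".join(disassemble(code)))
print("\n".join(disassemble(code, {0x0000: "start", 0x0005: "loop", 0x4E00: "lives"})))
```

Keeping the names in a separate table, rather than hand-editing the listing, means the disassembly can be regenerated as you learn more.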

And yeah, people lose source all the time. A lot of old mega-corporate and bank applications either have no known source available, despite at one point being written in COBOL or similar, or the binary running on the system has been patched directly so many times that the source is out of date and really only useful as a vague reference (knowing that the program that's actually running and that you need to potentially work on may have been altered from it). You might be surprised how much source is still floating around on some old floppy, though, and how often somebody knows where and what it is.
