I've done a fair amount of importing XML into a program called FileMaker (think MS Access). I spent a couple of hours working on a XML tranform and was able to parse the list of games and their roms. I got far enough to prove that it could work. Does anyone know if the XML schema is documented somewhere? The general structure is listed at the top but not the number of times each element must or can appear.
Well that's the thing right there....
For most of the stuff in the xml, the minimum amount is 0 and the maximum amount is infinate (of course there is a practical limit, but you get my point). The only two real exceptions being the rom name and description. A couple of things are fixed and required, like the driver status and the controls info, but for the most part everything is optional and there can be multiple entries.
Myself I've always ignored the schema. It isn't particularly helpful due to that fact.
If I were going to make a generic database and wanted to be able to search ANYTHING, here is how I would handle it......
A line like: <game name="xfilesp" sourcefile="whitestar.c" ismechanical="yes">
Would get divided into columns like so:
game:name|game:sourcefile|game:ismechanical
-------------------------------------------------------
xfilesp | whitestar.c | yes
See? You generically combine the node name (game) with the variable setting (name, sourcefile...ect) and get the unique column name.
For node values that don't have variables you simply use the node name so adding to our entry:
<description>X-Files (3.03)</description>
<year>1997</year>
<manufacturer>Sega</manufacturer>
would translate into the columns:
description | year | manufacturer
X-Files (3.03) | 1997 | Sega
The only tricky bit would be duplicate node names.... you would have to keep a list of the node names you've already printed for each entry and rename as needed.
Like the rom entries:
<rom name="xfcpu.303" size="131072" crc="c7ab5efe" sha1="dcb4b89adfb5ba39e59c1118a00b29941d3ea4e9" region="user1" offset="0"/>
<rom name="xfildspa.300" size="524288" crc="03c96af8" sha1="06a26116f863bb9b2d127e18c5ba500537923d62" region="dmdcpu" offset="0"/>
The first set of columns you add would be rom:name, rom:size ect... BUT upon finding a new rom node you would probably want to rename the new one to rom2:name, rom2:size ect....
Or at least that's how I would do it. Basically parse the xml as it's loaded, one node at a time.... deal with the results generically if at all possible.