I'm no expert programmer, generally learning as I go...
The DOM approach lets you essentially treat the XML as a database (it loads the entire tree into memory, and you can search for any value or node), but it is essentially unusable for a really large XML file. (Don't even think about it for MAME.)
What I did was make a single pass over the XML file with the SAX parser, flagging only the info I was interested in for every game: romname, full game name, romof, cloneof, and every joystick and button. All of this info goes into simple arrays in memory during the parse; everything else in the XML is ignored. The entire process takes well under 2 seconds for me. This gives me parallel arrays, so that if rom(10) is 1942, then romname(10), full game name(10), romof(10), cloneof(10), joystick(10)... give all the info associated with 1942.
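To make the single-pass idea concrete, here is a rough sketch in Python using the standard xml.sax module (not the MSXML parser I actually used). The element and attribute names (game/machine, name, romof, cloneof, description, input, control) are assumptions about the mame.xml layout and vary between MAME versions, and the sample XML is made up for illustration.

```python
import xml.sax


class RomInfoHandler(xml.sax.ContentHandler):
    """Collect a few fields per game into parallel lists; ignore the rest."""

    def __init__(self):
        super().__init__()
        # Parallel lists: index i in every list describes the same game.
        self.rom, self.fullname, self.romof, self.cloneof = [], [], [], []
        self.buttons, self.controls = [], []
        self._in_description = False
        self._text = []

    def startElement(self, name, attrs):
        if name in ("game", "machine"):  # assumed element names
            self.rom.append(attrs.get("name", ""))
            self.romof.append(attrs.get("romof", ""))
            self.cloneof.append(attrs.get("cloneof", ""))
            self.fullname.append("")
            self.buttons.append(0)
            self.controls.append([])
        elif name == "description" and self.rom:
            self._in_description = True
            self._text = []
        elif name == "input" and self.rom:  # assumed attribute location
            self.buttons[-1] = int(attrs.get("buttons", "0"))
        elif name == "control" and self.rom:
            self.controls[-1].append(attrs.get("type", ""))

    def characters(self, content):
        # Text can arrive in several chunks, so accumulate it.
        if self._in_description:
            self._text.append(content)

    def endElement(self, name):
        if name == "description" and self._in_description:
            self.fullname[-1] = "".join(self._text).strip()
            self._in_description = False


# Made-up sample standing in for the real mame.xml.
SAMPLE = """<mame>
  <game name="1942">
    <description>1942 (set 1)</description>
    <year>1984</year>
    <input players="2" buttons="1"><control type="joy"/></input>
  </game>
  <game name="1942a" cloneof="1942" romof="1942">
    <description>1942 (set 2)</description>
    <input players="2" buttons="1"><control type="joy"/></input>
  </game>
</mame>"""

handler = RomInfoHandler()
xml.sax.parseString(SAMPLE.encode("utf-8"), handler)
```

For a real file you would call xml.sax.parse with the path to the XML instead of parseString, and the handler stays the same.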
I then go through the user's actual roms on disk and pull the info from the arrays as needed. So if I find 1942.zip in the rom folder, I search the rom() array for '1942' to find its position in the array, get back '10', and I have instant access to all the info I need for 1942. (You can build a hash table of rom() for searching, which makes lookups enormously quicker.)
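The hash table lookup might look something like this in Python, where a dict plays the role of the hash table. The function names and the sample lists are hypothetical, and it assumes every rom is a .zip file named after the rom.

```python
from pathlib import Path


def build_index(rom):
    """Map each rom name to its position in the parallel lists."""
    return {name: i for i, name in enumerate(rom)}


def roms_on_disk(rom_dir):
    """Yield the base name of every .zip in rom_dir (1942.zip -> '1942')."""
    for path in sorted(Path(rom_dir).glob("*.zip")):
        yield path.stem.lower()


# Made-up parallel lists, as they might look after the SAX pass.
rom = ["1942", "1942a"]
fullname = ["1942 (set 1)", "1942 (set 2)"]

index = build_index(rom)


def describe(name):
    """Return the full game name for a rom name, or None if it is unknown."""
    pos = index.get(name)  # one dict lookup instead of scanning the list
    return None if pos is None else fullname[pos]
```

You would then loop over roms_on_disk(folder) and call describe() (or read any of the other arrays at the same position) for each name found.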
This is a bit simplistic, and it's not that far removed from scanning the XML as text line by line and pulling out the info you need, but the SAX parser is basically doing the same thing and handing you the data for the nodes you decide on in advance. (I used the one from MSXML - yes, a dependency - but I think there are lots of other SAX parsers out there.)
(In my program I SAX parse mame.xml and then controlsdat.xml to pull all the data into memory arrays, then search the controlsdat data first to get accurate info, and fall back to the mame.xml data if controlsdat has nothing for that game. Remember to check for clones in controlsdat too, of course.)
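A rough Python sketch of that lookup order, under one reading of the clone check: if a clone has no controlsdat entry of its own, try its parent's entry before falling back to mame.xml. It uses dicts keyed by rom name rather than parallel arrays to keep it short, and the function name, dict layout and sample data are all made up for illustration.

```python
def find_controls(name, controls_dat, mame):
    """Return (source, info) for a rom name, or (None, None) if unknown."""
    # 1. Prefer the game's own controlsdat entry.
    if name in controls_dat:
        return "controls.dat", controls_dat[name]
    # 2. If the game is a clone, try its parent's controlsdat entry.
    parent = mame.get(name, {}).get("cloneof", "")
    if parent and parent in controls_dat:
        return "controls.dat (parent)", controls_dat[parent]
    # 3. Fall back to whatever mame.xml says.
    if name in mame:
        return "mame.xml", mame[name]["controls"]
    return None, None


# Hypothetical data standing in for the two parsed files.
controls_dat = {"1942": ["8-way joystick", "fire", "loop"]}
mame = {
    "1942": {"cloneof": "", "controls": ["joy"]},
    "1942a": {"cloneof": "1942", "controls": ["joy"]},
    "othergame": {"cloneof": "", "controls": ["trackball"]},
}
```

With this data, "1942a" picks up the parent's controlsdat entry, while "othergame" falls through to the mame.xml data.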