For the 3rd method I'm assuming you're doing an "API hijack" of the DirectX DLLs by attaching a DLL to the MAME process and passing calls through your own code?
I believe MAME uses LoadLibrary for this, so I assume it would be the first call you hijack? If so, that's quite a task to do for all the different types of emulators and different versions of DirectX. I'm guessing Libretro is an attempt to standardize an interface so this type of thing is easier to achieve?
That's the easiest way, along with the CreateRemoteThread class of functions -- run as a "process parasite" (read the latest issue of Phrack for that) and you can even memcpy over parts of the target's rendering code with your own position-independent code. I never ported the hijack stuff to Windows, just fooled around to see which tricks actually worked against WoW Warden and friends. Heck, even writing the hijack part as a buffer overflow exploit worked surprisingly well ;p
The libretro approach is cleaner and already covers a decent enough set of emulators, which is why I won't put in the hours to port and test the hijack lib for Windows; I never use that thing outside VMware or Wine. What the libretro guys have done is define a decent enough API for graphics/audio output, input devices and ROM loading. Then they've added patches for a whole bunch of emulators (I think they're up to 20 or so by now) so that they use this API instead of whatever system-specific stuff was in place, and made sure that these compile on a whole bunch of platforms (Wii, Xbox, Xbox 360, PS3, Linux, OS X, ...).
I've done a similar thing for hijacking RawInput (which is what MAME uses for keyboard input), but I believe it would be a lot more work to do this for Direct3D, as you have to create dummy functions for all the API calls and pass them on to the real DLL.
Nine times out of ten, the emulator's interactions with the system are so simple that there's only a handful of functions you'll ever need to redirect and know or care about; the rest you can leave in place. There are, however, open projects that have already done this for quickly reversing game engines, grabbing / replacing textures / shaders etc.
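To make the "redirect a handful, leave the rest in place" idea concrete, here is a minimal sketch in Python rather than native code. It only illustrates the pass-through pattern; it is not how a real DLL proxy is built. `FakeRenderer`, `PassThroughProxy` and `grab_frame` are hypothetical names invented for this example and do not correspond to any DirectX or MAME API.

```python
class PassThroughProxy:
    """Wraps a target object, hooks a few calls and forwards everything else."""

    def __init__(self, real, overrides):
        self._real = real
        self._overrides = dict(overrides)
        self.calls = []  # names of the calls that were intercepted

    def __getattr__(self, name):
        # Only reached for names that are not defined on the proxy itself.
        if name in self._overrides:
            hook = self._overrides[name]
            original = getattr(self._real, name)

            def wrapper(*args, **kwargs):
                self.calls.append(name)
                # The hook receives the original so it can pass the call on.
                return hook(original, *args, **kwargs)

            return wrapper
        # Everything we do not care about goes straight to the real object.
        return getattr(self._real, name)


class FakeRenderer:
    """Hypothetical stand-in for the real library (assumption, not a real API)."""

    def create_device(self):
        return "device"

    def clear(self):
        return "cleared"

    def present(self, frame):
        return f"presented {frame}"


captured = []


def grab_frame(original, frame):
    # Keep a copy of the frame, then let the real call proceed unchanged.
    captured.append(frame)
    return original(frame)


proxy = PassThroughProxy(FakeRenderer(), {"present": grab_frame})
```

Only `present` is redirected here; `create_device` and `clear` reach the wrapped object untouched, which is the point being made about needing to care about just a few functions.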
It also requires that you attach a 32- or 64-bit DLL based on the bitness of the target process. I'm guessing this is what you're referring to when you say "lots of caveats and details to account for".
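As a rough illustration of the bitness point, the sketch below reads the Machine field from a PE image header to decide which DLL build would match. It assumes you inspect the executable's bytes on disk; for a process that is already running you would more likely ask the OS instead (IsWow64Process on Windows). The DLL file names and the `fake_pe` helper are made up for the example.

```python
import struct

# Machine values from the PE/COFF header:
# IMAGE_FILE_MACHINE_I386 and IMAGE_FILE_MACHINE_AMD64.
MACHINE_BITS = {0x014C: 32, 0x8664: 64}


def pe_bitness(data: bytes) -> int:
    """Return 32 or 64 for an x86 / x64 PE image given its leading bytes."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    # e_lfanew at offset 0x3C points at the 'PE\0\0' signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The 16-bit Machine field follows the signature directly.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    if machine not in MACHINE_BITS:
        raise ValueError(f"unhandled machine type 0x{machine:04x}")
    return MACHINE_BITS[machine]


def pick_hook_dll(data: bytes, dll32="hook32.dll", dll64="hook64.dll") -> str:
    """Choose the (hypothetically named) hook DLL that matches the target."""
    return dll64 if pe_bitness(data) == 64 else dll32


def fake_pe(machine: int) -> bytes:
    """Build the smallest header that pe_bitness() accepts, for testing only."""
    header = bytearray(0x40)
    header[:2] = b"MZ"
    struct.pack_into("<I", header, 0x3C, 0x40)
    return bytes(header) + b"PE\0\0" + struct.pack("<H", machine)
```

With a real target you would pass in the first few hundred bytes of the emulator's executable instead of the output of `fake_pe`.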
Patching the threading library to stop others from interfering while you do your dirty work might be necessary, and, if you're poking around in commercial stuff, so might telling whatever DRM junk there is to look the other way for a few seconds. Also, getting a hundred or so I/O events into a process is one thing; grabbing 70-100 MB/s within strict deadlines (50 or 60 Hz, evenly distributed, with as low a mean deviation from the ideal as possible) takes you on a deeper tour into performance-tuning land :-)
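To put numbers on those deadlines: at 60 Hz each frame has roughly 16.7 ms, and 100 MB/s works out to about 1.67 MB that has to be moved inside every one of those windows. The sketch below does that arithmetic and measures how far a series of frame timestamps strays from an ideal, evenly spaced schedule. The function names are made up for the example, and treating the first timestamp as the schedule's origin is an assumption.

```python
from statistics import mean


def frame_budget(rate_hz, bytes_per_second):
    """Return (milliseconds per frame, bytes to move per frame)."""
    period_ms = 1000.0 / rate_hz
    bytes_per_frame = bytes_per_second / rate_hz
    return period_ms, bytes_per_frame


def mean_deviation_ms(timestamps_ms, rate_hz):
    """Mean absolute deviation of frame arrival times from the ideal schedule.

    The ideal schedule starts at the first timestamp and ticks once per
    period (an assumption made for this sketch).
    """
    period_ms = 1000.0 / rate_hz
    start = timestamps_ms[0]
    deviations = [
        abs(t - (start + i * period_ms)) for i, t in enumerate(timestamps_ms)
    ]
    return mean(deviations)
```

For example, `frame_budget(50, 100e6)` gives a 20 ms window and 2,000,000 bytes per frame, and frames arriving at 0, 21, 40 and 59 ms against a 50 Hz schedule have a mean deviation of 0.5 ms.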
I'm also wondering if the benefits are worth it? How often do you need to change input or rendering settings for an emulator? Most people set them once and that's it.
Depends on what you want to do. Look at the roadmap:
http://sourceforge.net/p/arcanfe/wiki/Roadmap%2C%20Future%20Changes%2C%20Nifty%20Ideas/
I'm not particularly interested in the "omg pirate romz" sort of thing, there's HyperSpin for that. I'm way more curious about weird, creative and non-obvious ways of using emulators (or well, any full virtualization, but that's a different story), but to get there I need a good and stable baseline.
Things like global keyboard hooks work fine for blocking and injecting keyboard input for most emulators, without the need to attach to their process, but what is the use of that unless you know the default keys?
Write a static translation table between namespaces (heck, there's a ---smurfy--- 5-minute script hidden in there somewhere that does this for arcan<->mame, far from accurate for every use case, but enough to cover the basics), query the user, write an odd little USB device driver -- there are quite a few options ;-) IDT hooking and friends tend to raise certain alarms, but that's what you get in CreateProcessEx hell ;-p
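A rough sketch of what such a translation table with a "query the user" fallback could look like. This is not the script mentioned above; the labels on both sides of the table are illustrative assumptions, not the actual arcan or MAME namespaces.

```python
# Hypothetical label pairs, invented for this example.
ARCAN_TO_MAME = {
    "PLAYER1_UP": "KEYCODE_UP",
    "PLAYER1_DOWN": "KEYCODE_DOWN",
    "PLAYER1_BUTTON1": "KEYCODE_LCONTROL",
    "PLAYER1_START": "KEYCODE_1",
}


def translate(label, table, ask_user=None):
    """Look a label up in the static table, falling back to asking the user.

    ask_user is an optional callable taking the unknown label and returning
    the target name (or a falsy value to give up). Answers are remembered.
    """
    if label in table:
        return table[label]
    if ask_user is not None:
        answer = ask_user(label)
        if answer:
            table[label] = answer
            return answer
    return None


def invert(table):
    """Build the reverse mapping for translating in the other direction."""
    return {target: source for source, target in table.items()}
```

The static table covers the basics; anything it misses falls through to `ask_user`, and `invert` gives the other direction for free as long as the mapping is one-to-one.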