Since GroovyMAME uses two threads when multithreading is turned on, does it make sense to favor a quad-core CPU so that there's an available core for OS and I/O system level threads without interrupting GroovyMAME?
This is not a simple question to answer, I'm afraid, and without a strong empirical base we're a bit lost. Due to the different way things are implemented, the results from Linux (Doozer) might not apply 100% to Windows.
When using SDL builds (Linux or Windows), multithreading is implemented through the osd_work functions. I have no idea how the different threads are arranged when using these functions.
When using the GroovyMAME for Windows builds, the threads are managed directly from my code (for the most part). First of all, GroovyMAME in multithreading mode uses three threads, not two:
- Thread 1: core emulation
- Thread 2: window proc
- Thread 3: renderer (wait for vsync happens here)
The idea here is that the window proc is always free to process input messages, no matter what's going on in the other threads.
The fact that DirectX has traditionally been a thread-unsafe API has encouraged (well, actually forced) using the window thread for all calls to this API in order to avoid deadlocks. The problem with this approach starts when waiting for vsync is required, which keeps the window thread sitting idle for most of its time slice. Because Windows prioritizes the messages sent to a window by their importance, if the time left for the window to pump these messages gets reduced, it may happen that input messages simply arrive too late. This is especially obvious with input devices such as mice, which literally flood the message pump loop.
That's why putting the renderer code in a separate thread and leaving the window proc alone seemed like a good idea to reduce input latency (and the lag tests seem to prove it).
Separating the core emulation and the renderer into two threads also made asynchronous rendering possible for an API (DirectX 9) that didn't support it natively. Basically, when syncrefresh is enabled, threads 1 and 3 are synchronized (thread 1 waits for thread 3), but when triplebuffer is enabled both run asynchronously.
The problem with this implementation is that it makes the program very prone to deadlocks when focus is taken from us (alt-tab, accidental minimizing, uncivilized frontends messing with our process, etc.).
Nowadays, APIs do support asynchronous rendering natively. This means that the renderer thread is implemented internally by the API. The funny thing is that OpenGL, at least on Linux, which is the system I've been able to test so far, *only* seems to support asynchronous rendering, making proper vsync impossible.
So, based on this, ideally GroovyMAME for Windows would need three cores. But I don't mean that's necessarily the case, because the way the system arranges the hardware resources is not that simple. And the core emulation itself can also make use of multiple cores if available, according to the devs.