I'm seeing the same issue with nothrottle + multithreading in the Windows version: crazy CPU usage when nothrottle is included, so it definitely seems to be a problem with MAME's design. I've been looking through the code and have some idea of how to solve it rather easily, but I'm not sure whether that would break something else. On the other hand, patching MAME is discouraging, as the patch may turn out to be useless with the next version, etc.
From what I'm seeing now, with multithreading they run two threads:
- Window thread: deals with window management and ddraw/direct3d (where vsync happens)
- Main thread: the emulator itself, that creates a frame at a time
By the way, this is the design that should be the default for MAME, as single-threaded emulators are crap: the sound gets stuck in a loop when they are minimized.
So the main thread creates a frame and, if throttle is enabled, runs update_throttle(current_time) in \emu\video.c, which calls throttle_until_ticks, which at the end calls osd_sleep(delta), where the 'sleep' is actually done. So if you disable throttle, the thread never sleeps, and that's why it eats all the CPU cycles.
// if we're throttling, synchronize before rendering
attotime current_time = timer_get_time(&m_machine);
if (!debug && !skipped_it && effective_throttle())
    update_throttle(current_time);
// ask the OSD to update
g_profiler.start(PROFILER_BLIT);
m_machine.osd().update(!debug && skipped_it);
g_profiler.stop();
The wait for vsync, however, is performed in the d3d/ddraw part. So if we have a single thread, the processor will wait there and MAME will stay synchronized even with throttle disabled, although it wastes all the CPU time.
But when we start both threads, the main thread is not aware that the window thread is waiting (as they run in parallel) and keeps sending frames to it all the time regardless of vsync, unless we enable throttle.
So what should be done is to create an event object in order to synchronize both threads. This is done (in Windows) with the CreateEvent API. After creating a frame, the main thread would call WaitForSingleObject to release the CPU until we tell it to go on with the next one. In the window thread, we would wait for vsync and, when that is done, call SetEvent to instruct the other thread to go on. In theory this would make better use of the CPU.
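To make the idea concrete, here is a rough sketch of the handshake, not actual MAME code. So that it builds anywhere, it uses a small portable stand-in for a Win32 auto-reset event (set() plays the role of SetEvent, wait() the role of WaitForSingleObject). The names AutoResetEvent, run_frames, frame_ready and vsync_done are all made up for illustration, the 1 ms sleep only stands in for the real vsync wait, and the second event (frame_ready) is my assumption for however the main thread hands a frame over to the window thread.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical portable stand-in for a Win32 auto-reset event.
class AutoResetEvent {
public:
    // Counterpart of SetEvent: mark the event signaled and wake one waiter.
    void set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_one();
    }

    // Counterpart of WaitForSingleObject(..., INFINITE): block without
    // burning CPU until the event is signaled, then reset it.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;  // auto-reset
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Hypothetical driver: emulates `frames` frames and returns how many were
// produced, or -1 if producer and presenter got out of step.
int run_frames(int frames) {
    AutoResetEvent frame_ready;  // main thread -> window thread (assumed)
    AutoResetEvent vsync_done;   // window thread -> main thread
    int produced = 0;
    int presented = 0;

    std::thread window_thread([&] {
        for (int i = 0; i < frames; ++i) {
            frame_ready.wait();
            // Placeholder for the real wait for vsync in the d3d/ddraw code.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            ++presented;
            vsync_done.set();  // tell the main thread to go on
        }
    });

    for (int i = 0; i < frames; ++i) {
        ++produced;          // placeholder for emulating one frame
        frame_ready.set();   // hand the frame to the window thread
        vsync_done.wait();   // sleep here instead of spinning
    }

    window_thread.join();
    return produced == presented ? produced : -1;
}
```

With this shape the main thread can never get more than one frame ahead of the window thread, and it sleeps in wait() rather than looping. Whether the real code can tolerate blocking at that point is exactly the part I'm not sure about.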
So when vsync is in use, the throttle logic should be replaced: instead of the current sleep for some ticks (which is causing scroll hiccups for me), use the event-driven method above.