Since GroovyMAME uses two threads when multithreading is turned on, does it make sense to favor a quad-core CPU so that there's an available core for OS and I/O system level threads without interrupting GroovyMAME?
This is not a simple question to answer, I'm afraid, and without a strong empirical base we're a bit lost. Due to the different way things are implemented, the results from Linux (Doozer) might not apply 100% to Windows.
When using SDL builds (Linux or Windows), multithreading is implemented through the osd_work functions. I have no idea how the different threads are arranged when using these functions.
When using the GroovyMAME for Windows builds, the threads are managed directly from my code (for the most part). First of all, GroovyMAME in multithreading mode uses three threads, not two:
- Thread 1: core emulation
- Thread 2: window proc
- Thread 3: renderer (wait for vsync happens here)
The idea here is that the window proc is always free to process input messages, no matter what's going on in the other threads.
The fact that DirectX has traditionally been a thread-unsafe API has encouraged (well, actually forced) using the window thread for all calls to this API in order to avoid deadlocks. The problem with this approach starts when waiting for vsync is required, which keeps the window thread sitting idle for most of its time slice. Because Windows prioritizes the messages sent to a window by their importance, if the time left for the window to pump these messages gets reduced, it may happen that input messages simply arrive too late. This is especially obvious with input devices such as mice, which literally flood the message pump loop.
That's why putting the renderer code in a separate thread and leaving the window proc alone seemed like a good idea to reduce input latency (and the lag tests seem to prove it).
Separating the core emulation and the renderer into two threads also made asynchronous rendering possible for an API (DirectX 9) that didn't support it natively. Basically, when syncrefresh is enabled, threads 1 and 3 are synchronized (thread 1 waits for thread 3), but when triplebuffer is enabled both run asynchronously.
The problem with this implementation is that it makes the program very prone to deadlocks when focus is taken from us (alt-tab, accidental minimizing, uncivilized frontends messing with our process, etc.).
Nowadays, APIs do support asynchronous rendering natively. This means that the renderer thread is implemented internally by the API. The funny thing is that OpenGL, at least on Linux, which is the system I've been able to test so far, *only* seems to support asynchronous rendering, making proper vsync impossible.
So, based on this, ideally GroovyMAME for Windows would need three cores. But I don't mean that's necessarily the case, because the way the system arranges the hardware resources is not that simple. And the core emulation itself can also make use of multiple cores if available, according to the devs.