
Author Topic: dual core vs quad core?  (Read 4148 times)


krick

dual core vs quad core?
« on: March 05, 2015, 02:41:48 pm »
Since GroovyMAME uses two threads when multithreading is turned on, does it make sense to favor a quad-core CPU so that there's an available core for OS and I/O system level threads without interrupting GroovyMAME?
Hantarex Polo 15KHz
Sapphire Radeon HD 7750 2GB (GCN)
GroovyMAME 0.197.017h_d3d9ex
CRT Emudriver & CRT Tools 2.0 beta 13 (Crimson 16.2.1 for GCN cards)
Windows 7 Home Premium 64-bit
Intel Core i7-4790K @ 4.8GHz
ASUS Z87M-PLUS Motherboard

adder

Re: dual core vs quad core?
« Reply #1 on: March 05, 2015, 03:31:17 pm »
Something similar I've been wondering: does the CPU cache size (e.g. 2 MB vs. 6 MB) make much of a difference to MAME performance?

Doozer

Re: dual core vs quad core?
« Reply #2 on: March 06, 2015, 02:15:32 am »

I have experimented a lot with this: enabling/disabling HT, cores and interrupts, and routing processes into an exclusive CPU execution context. The clock speed (GHz) you can achieve is the key factor here. A single core is sufficient even though the GroovyMAME executable runs several threads. The most interesting result shows up on a multi-core CPU with the following test: Tekken (almost every version) runs fine on a single core (1 CPU) but drops to about 50% performance when balanced between 2 cores. Discussions with @Calamity confirm that the switchres portion runs in the spawned thread. I have the feeling that the audio portion is also impacted by this.

To conclude: at the moment I am running all my configurations in mono-CPU mode even if the system is multi-CPU. I gain a few extra Hz from turbo mode (on Intel CPUs) when running with HT/cores reduced. It is true that nowadays CPUs have enough computing power that this extra speed is not worth focusing on; they have enough power to run all pre-2000 games perfectly.
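For reference, this mono-CPU setup on Linux amounts to restricting the process to a single core, e.g. by launching the emulator under taskset -c 0. A minimal sketch of doing the same thing programmatically (an illustration only, not anything GroovyMAME ships):

Code:
#include <sched.h>    // sched_setaffinity and the CPU_* macros (need _GNU_SOURCE; g++ defines it by default)
#include <cstdio>

int main()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);                                   // allow core 0 only
    if (sched_setaffinity(0, sizeof(set), &set) != 0) { // pid 0 = the calling process
        std::perror("sched_setaffinity");
        return 1;
    }
    std::printf("process restricted to core 0\n");
    // ...the emulator would be launched/run from here...
    return 0;
}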

Calamity

Re: dual core vs quad core?
« Reply #3 on: March 10, 2015, 06:48:23 am »
Quote
Since GroovyMAME uses two threads when multithreading is turned on, does it make sense to favor a quad-core CPU so that there's an available core for OS and I/O system level threads without interrupting GroovyMAME?

This is not a simple matter to answer, I'm afraid, and without a strong empirical base we're a bit lost. Due to the different ways things are implemented, the results from Linux (Doozer) might not apply 100% to Windows.

When using SDL builds (Linux or Windows), multithreading is implemented through the osd_work functions. I have no idea how the different threads are arranged when using these functions.

When using the GroovyMAME for Windows builds, the threads are managed directly from my code (for the most part). First of all, GroovyMAME in multithreading mode uses three threads, not two:

- Thread 1: core emulation
- Thread 2: window proc
- Thread 3: renderer (wait for vsync happens here)

The idea here is that the window proc is always free to process input messages, no matter what's going on in the other threads.

The fact that DirectX has traditionally been a thread-unsafe API has encouraged (well, actually forced) using the window thread for all calls to this API in order to avoid deadlocks. The problem with this approach starts when waiting for vsync is required, which keeps the window thread sitting idle for most of its time slice. Because Windows prioritizes the messages sent to a window by their importance, if the time left for the window to pump these messages gets reduced, input messages may simply arrive too late. This is especially obvious with input devices such as mice, which literally flood the message pump loop.
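To picture the problem, here is a bare-bones Win32 sketch (a simplified illustration, not GroovyMAME's actual window code) of a single thread that pumps messages and then blocks waiting for "vsync"; any mouse or keyboard messages arriving during the wait just sit in the queue until the next pass:

Code:
#include <windows.h>

// Minimal window procedure: let the system handle everything except closing.
static LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
{
    if (msg == WM_DESTROY) { PostQuitMessage(0); return 0; }
    return DefWindowProcA(hwnd, msg, wp, lp);
}

int main()
{
    WNDCLASSA wc = {};
    wc.lpfnWndProc   = WndProc;
    wc.hInstance     = GetModuleHandleA(nullptr);
    wc.lpszClassName = "pump_demo";
    RegisterClassA(&wc);

    HWND hwnd = CreateWindowA("pump_demo", "pump demo", WS_OVERLAPPEDWINDOW,
                              CW_USEDEFAULT, CW_USEDEFAULT, 640, 480,
                              nullptr, nullptr, wc.hInstance, nullptr);
    ShowWindow(hwnd, SW_SHOW);

    bool running = true;
    while (running) {
        MSG msg;
        // Drain whatever is queued right now...
        while (PeekMessageA(&msg, nullptr, 0, 0, PM_REMOVE)) {
            if (msg.message == WM_QUIT) running = false;
            TranslateMessage(&msg);
            DispatchMessageA(&msg);
        }
        // ...then block for the rest of the refresh period (stand-in for a vsynced Present()).
        // Input messages that arrive while this thread sleeps are delayed until the next
        // iteration, which is the starvation effect described above.
        Sleep(16);
    }
    return 0;
}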

That's why putting the renderer code in a separate thread and leaving the window proc alone seemed like a good idea to reduce input latency (and the lag tests seem to prove it).

Separating the core emulation and the renderer into two threads also made asynchronous rendering possible for an API (DirectX 9) that didn't support it natively. Basically, when syncrefresh is enabled, threads 1 and 3 are synchronized (thread 1 waits for thread 3), but when triplebuffer is enabled both run asynchronously.
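A minimal sketch of that arrangement (an illustration with std::thread and a condition variable standing in for the real mechanism, not GroovyMAME's actual code): the emulation thread hands a frame to the renderer thread, and a syncrefresh flag decides whether it blocks until the frame has been presented or runs on ahead, triplebuffer-style.

Code:
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool frame_pending = false;           // set by the emulation thread, cleared by the renderer
std::atomic<bool> running{true};

void renderer()                       // "thread 3": waits for vsync and presents
{
    while (running) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return frame_pending || !running; });
        if (!running) break;
        lock.unlock();
        std::this_thread::sleep_for(std::chrono::milliseconds(16)); // stand-in for the vsync wait
        lock.lock();
        frame_pending = false;
        cv.notify_all();              // release the emulation thread if syncrefresh made it wait
    }
}

void emulate(bool syncrefresh)        // "thread 1": core emulation
{
    for (int frame = 0; frame < 300 && running; ++frame) {
        // ...emulate one frame...
        std::unique_lock<std::mutex> lock(m);
        frame_pending = true;
        cv.notify_all();
        if (syncrefresh)              // synchronized: wait until the renderer has presented
            cv.wait(lock, [] { return !frame_pending; });
        // with triplebuffer the emulation simply continues (asynchronous rendering)
    }
    running = false;
    cv.notify_all();
}

int main()
{
    std::thread t(renderer);
    emulate(true);                    // pass false to get the triplebuffer-style behaviour
    t.join();
}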

The problem with this implementation is that it makes the program very prone to deadlocks when focus is taken from us (alt-tab, accidental minimizing, uncivilized frontends messing with our process, etc.)

Nowadays APIs do support asynchronous rendering natively. This means that the renderer thread is implemented internally by the API. The funny thing is that OpenGL, at least on Linux, which is the system I've been able to test so far, *only* seems to support asynchronous rendering, making proper vsync impossible.

So, based on this, ideally GroovyMAME for Windows would need three cores. But I don't mean that this is necessarily the case, because the way the system arranges the hardware resources is not that simple. And the core emulation itself can also make use of multiple cores if available, according to the devs.
Important note: posts reporting GM issues without a log will be IGNORED.
Steps to create a log:
 - From command line, run: groovymame.exe -v romname >romname.txt
 - Attach resulting romname.txt file to your post, instead of pasting it.

CRT Emudriver, VMMaker & Arcade OSD downloads, documentation and discussion:  Eiusdemmodi

Doozer

Re: dual core vs quad core?
« Reply #4 on: March 10, 2015, 10:09:12 am »

You are right, Calamity, using multiple threads has given (and is still giving) headaches to many developers. It is a false assumption that multiple CPUs/cores automatically mean better performance, especially when the synchronization and data transfer mechanisms slow things down and add complexity compared to a linear approach. The emulation process has to respect some basic principles to ensure proper video, audio and input handling alongside the emulation job itself. As you mentioned, this is highly OS driven. You have a good view of the Windows context; I will try to apply it to the Linux side to see if something can materialize.

OK, let's first try to identify the threads on Linux with the mt option enabled.

Quote
groovymame(1)-+-{SDLAudioDev1}(2)
                 |-{SDLTimer}(3)
                 |-{groovymame}(4)
                 `-{groovymame}(5) 

Thread 5 is the result of the mt option. My guesses (sorry, I have not confirmed this by reading the code) are the following:

1. core emulation
2. sdl audio thread
3. sdl watchdog
4. window proc
5. renderer

Threads 2/3/4/5 are created within the SDL OSD. I have already identified that threads 2 and 5 are linked to emulation speed issues. I have not looked at how they yield to each other, but I suspect synchronization/timing hiccups in that area. I will try to give each of them a dedicated CPU core and see how the execution behaves.
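A minimal sketch of what that per-thread pinning could look like on Linux (an assumed approach, not existing GroovyMAME code), using pthread_setaffinity_np on a std::thread's native handle:

Code:
#include <pthread.h>   // pthread_setaffinity_np (needs _GNU_SOURCE; g++ defines it by default)
#include <sched.h>
#include <chrono>
#include <cstdio>
#include <thread>

// Pin an already-running thread to a single core; returns true on success.
static bool pin_thread_to_core(std::thread& t, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(t.native_handle(), sizeof(set), &set) == 0;
}

int main()
{
    auto worker = [] { std::this_thread::sleep_for(std::chrono::seconds(1)); }; // stand-in workload

    std::thread audio(worker);   // e.g. the audio thread's work
    std::thread render(worker);  // e.g. the renderer thread's work

    std::printf("audio  pinned to core 1: %d\n", pin_thread_to_core(audio, 1));
    std::printf("render pinned to core 2: %d\n", pin_thread_to_core(render, 2));

    audio.join();
    render.join();
    return 0;
}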

My Linux SDL lag test on a single-core CPU and on a multi-core CPU shows the input reflected directly on the next frame with the thunderx ROM (0 frame delay). Comparison with MVS hardware shows that no lag occurs. Enabling/disabling mt is transparent to the final rendering. Does it make sense to state that the mt option does not bring any enhancement under SDL/Linux (but is necessary for DirectX)?

With respect to OpenGL sync/async, here is an extract from OpenGL Insights:

Quote
The specification defines two event reporting modes: synchronous and asynchronous. The former will report the event before the function that caused the event terminates. The latter option allows the driver to report the event at its convenience. The reporting mode can be set by enabling or disabling GL_DEBUG_OUTPUT_SYNCHRONOUS_ARB. The default is asynchronous mode.

In synchronous mode, the callback function is issued while the offending OpenGL call is still in the call stack of the application. Hence, the simplest solution to find the code location that caused the event to be generated is to run the application in a debug runtime environment. This allows us to inspect the [...]

In windows.c, ASYNC_BLIT is set as an extra flag. Do you know if FLAG_NEEDS_ASYNCBLIT (0x200) forces OpenGL to be asynchronous under Linux? It might be possible to use synchronous mode under Linux. What is your opinion?

Quote
SDL_ASYNCBLIT
   
Enables the use of asynchronous updates of the display surface. This will usually slow down blitting on single CPU machines, but may provide a speed increase on SMP systems.
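For context, in classic SDL 1.2 this is a surface flag requested when the video mode is set. A minimal generic sketch (plain SDL 1.2 usage, not MAME's SDL OSD code) of where the flag is applied:

Code:
#include <SDL/SDL.h>

int main(int argc, char* argv[])
{
    if (SDL_Init(SDL_INIT_VIDEO) != 0)
        return 1;

    // Request asynchronous blits; drop SDL_ASYNCBLIT for the default synchronous behaviour.
    Uint32 flags = SDL_SWSURFACE | SDL_ASYNCBLIT;
    SDL_Surface* screen = SDL_SetVideoMode(640, 480, 32, flags);
    if (!screen) {
        SDL_Quit();
        return 1;
    }

    SDL_Delay(2000);   // keep the window up briefly
    SDL_Quit();
    return 0;
}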

Calamity

Re: dual core vs quad core?
« Reply #5 on: March 12, 2015, 11:38:18 am »
Hi Doozer, thanks for the insight.

Quote
My Linux SDL lag test on a single-core CPU and on a multi-core CPU shows the input reflected directly on the next frame with the thunderx ROM (0 frame delay).

I'm very interested in this. In my own tests I've never managed to get that in SDL (Linux); the best I get is 2-3 frames of lag, iirc. Other users confirm this. Did you get any footage of this?

Quote
Enabling/disabling mt is transparent to the final rendering. Does it make sense to state that the mt option does not bring any enhancement under SDL/Linux (but is necessary for DirectX)?

Well, probably that's the case. It doesn't bring any enhancement for Windows either with baseline, apart from a minor performance gain when running unthrottled, which is quite irrelevant. It does mean an enhancement the way it's implemented in the Groovy patch. The devs that did the SDL OSD just ported the multithreading option there for completeness, but the implementation is different from the one in Windows. Most devs are in favor of deprecating the option altogether anyway.

Regarding the OpenGL issue, I'll try to find some links I read at the time I ported the patch to SDL2.

Doozer

Re: dual core vs quad core?
« Reply #6 on: March 13, 2015, 04:06:59 am »
Quote
My Linux SDL lag test on a single-core CPU and on a multi-core CPU shows the input reflected directly on the next frame with the thunderx ROM (0 frame delay).

I'm very interested in this. In my own tests I've never managed to get that in SDL (Linux); the best I get is 2-3 frames of lag, iirc. Other users confirm this. Did you get any footage of this?

I used the pause and single-frame-advance method to check how the system reacts to input. I know that a video (60 fps+) is more suitable, but I did not manage to find time to wire an LED to the control panel and do the test. I did look around for a test procedure but never found a description to stick to.

I am assuming that the delay observed by people could come from the input device (which is purely hypothetical). I built my HID controller on an AVR and it manages to acquire, debounce and process the keys between each poll.

If you know of a procedure I can follow, I can focus on and report on this `lag` issue.
« Last Edit: March 13, 2015, 06:59:38 am by Doozer »

cools

Re: dual core vs quad core?
« Reply #7 on: March 13, 2015, 07:37:24 am »
That test only checks how the game itself handles input; I get the same result with thunderx and numerous other games.

The video method, which checks live input delay (with a zero-frame-delay game), is the important check.

Calamity

Re: dual core vs quad core?
« Reply #8 on: March 13, 2015, 08:15:09 am »
Yeah, with the frame-by-frame stepping test you're actually in the macroscopic world, where input lag doesn't exist.

Quote
If you know of a procedure I can follow, I can focus on and report on this `lag` issue.

Well, it's quite a simple concept but a bit cumbersome to put into practice. You need to wire a 5V LED to the leg of one of the microswitches of your control panel, in such a way that it lights up when the button is pressed. Then you record the gameplay, preferably with a 120 fps camera, so that the LED is captured in the video along with the monitor and you can count the number of frames it takes for the character to react, as explained in this thread:

http://forum.arcadecontrols.com/index.php/topic,133194.0.html
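For example, assuming a 120 fps recording of a 60 Hz game: each camera frame covers about 8.3 ms, so two camera frames correspond to one game frame, and a character reacting 8 camera frames after the LED lights up would mean roughly 4 game frames (about 67 ms) of total lag.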


Doozer

Re: dual core vs quad core?
« Reply #9 on: March 13, 2015, 08:23:30 am »
I will do a recording with an LED to see what the lag is here. I will post the result in the "Input Lag - MAME iteration comparisons vs. the real thing?" thread.

[EDIT] Test done, 2 frames delay
« Last Edit: March 13, 2015, 02:01:08 pm by Doozer »