The NEW Build Your Own Arcade Controls

Software Support => GroovyMAME => Topic started by: jdubs on July 01, 2013, 12:28:58 pm

Title: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 01, 2013, 12:28:58 pm
Guys,

Minimizing input lag is HUGE (as many already know), especially when it comes to very time-sensitive game types like shmups and fighters.  The "tricks" used by ShmupMAME have garnered it the "unofficial" non-hardware method of choice for playing Super Street Fighter II Turbo.  In fact, some work has been done to draw direct comparisons against actual hardware:

http://forums.shoryuken.com/discussion/178437/official-shmupmame-super-turbo-thread/p1 (http://forums.shoryuken.com/discussion/178437/official-shmupmame-super-turbo-thread/p1)

Has anything like this ever been done with GroovyMAME?  From an overall library availability perspective, GroovyMAME is hugely preferable.  I'm just VERY curious how close to actual hardware GroovyMAME gets.  I'm already playing it on a CRT with:


hlsl_enable      0   (I run 480p into my CRT but use a miniSLG to create scanlines - looks terrific this way)
triplebuffer     0
waitvsync        0
frame_delay      1

All of this is to try to do everything I can to minimize input lag... but I don't have the necessary measurement equipment to tell me how close I'm getting to the "real thing". 

Would love to hear any input on this topic.

Thanks,
Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: cools on July 01, 2013, 04:46:18 pm
If you're using Groovy properly then triplebuffer and waitvsync are being controlled internally ;)

That thread seems to be more about emulation speed accuracy than input lag. I'm not sure they're quite the same thing, but I'm not about to download 100MB videos to check.

Frame_delay 1 makes a massive difference in my setup, effectively making everything I test indistinguishable from the PCB in a blind test.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 01, 2013, 10:09:35 pm
Actually, scroll down... he first tests the overall speed of the emulation, then looks specifically at lag.  It's an all-inclusive measurement, meaning that it accounts for display lag, emulation lag, and USB lag.  It is REALLY close to the hardware.

I would LOVE to hear that GroovyMAME can get this close (or closer)...I just don't have the measurement hardware to test it out myself.

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 02, 2013, 11:46:07 am
Also, any experimentation done with changing the USB polling rate?

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on July 06, 2013, 12:50:47 pm
Hi jdubs,

I'd be very interested too in seeing some real latency tests done for GroovyMAME. I wrote in the shmups forum long ago to invite people to test it, with no success.

Bear in mind GroovyMAME doesn't implement any trickery (i.e. removing sprite buffers); it just tries to reduce or eliminate the lag associated with v-sync in normal MAME, so in the best case it would have the same lag that non-vsynced normal MAME has. In this regard, it's pointless to disable v-sync and report lag values based on that, as GroovyMAME is *supposed* to run v-synced, so that it's visually indistinguishable from the real thing.

As soon as I have the time, I'm going to run the tests myself. I have a new camera that can record at 240 fps. I'll add a frame number prompt on screen, and use the method of mapping a button to Caps Lock so the flashing keyboard LEDs are recorded as well (as proposed by DaRayu at MAME World's forum).

The problem is, I don't have the real hardware to compare with, so I'll need to test those games for which we have reliable measurements based on real hardware.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 08, 2013, 03:38:41 pm
Doing the measurements as you describe would be awesome, Calamity!!  Hopefully we can pull some real-hardware data together, as a community, for valid comparisons.

This will be huge for everyone!

Thanks!!
Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 08, 2013, 06:28:56 pm
Calamity, you might also want to take a look at the link I posted in the first post in this thread.  papasi at the Shoryuken forums did some actual-hardware lag tests / measurements for a CPS2 board (Super Street Fighter II Turbo).

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on July 09, 2013, 06:25:39 am
Calamity, you might also want to take a look at the link I posted in the first post in this thread.  papasi at the Shoryuken forums did some actual-hardware lag tests / measurements for a CPS2 board (Super Street Fighter II Turbo).

-Jim


Yeah, I had read those links. I meant to test the same game (Super Street Fighter II Turbo) in GroovyMAME, as the lag on the actual board is a known value. However, I still don't get how he got those values; from the videos he just seems to make the character jump, but I don't see how that relates to the moment the button is pressed.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 09, 2013, 02:21:57 pm
I think the concept is using a universal gaming stick (one that works with a PC as well as the actual game hardware) with an LED "tied" to the applicable button(s).  Using a camera with both the LED and the screen in view, you can time how long it takes for the button press to translate into character movement on-screen.

Pretty good way of doing it assuming you've got a fast enough camera. 

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Ex_mosquito on July 17, 2013, 07:05:20 am
I am MEGA curious about the GroovyMAME input lag. To me it's indistinguishable from the actual PCB, on Shinobi, R-Type and Daimakaimura at least (I have these PCBs). I'd love to try and help. I have an Astro City with GroovyMAME + CRT Emudriver through a J-PAC, and I also have a few PCBs. The only problem is I'm not sure how I would wire up a light to a button using the PCB; I could use the Caps Lock light like Calamity suggested for the emulation side. Another problem is I only have a camera that does 30fps. Would this be at least acceptable for a rough analysis? I'm guessing you'd need 60fps minimum?

Also one more thing that I'm confused about: what is the 'frame_delay' option? Is this turned on by default in GroovyMAME or would I need to generate an .ini and enable it?

Cheers Guys.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on July 18, 2013, 01:23:00 pm
Also one more thing that I'm confused about: what is the 'frame_delay' option? Is this turned on by default in GroovyMAME or would I need to generate an .ini and enable it?

It's the latter. You may want to read up a bit in the other thread, "Re: Successfully reducing MAME input lag via 120Hz monitor (applies to LCD and CRT)". Specifically with regard to the practical use of -frame_delay, I've posted some info halfway down this post: http://forum.arcadecontrols.com/index.php/topic,133327.msg1374143.html#msg1374143 (http://forum.arcadecontrols.com/index.php/topic,133327.msg1374143.html#msg1374143)
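
For reference, enabling it is just a matter of generating an ini (e.g. "mame -createconfig", or the equivalent for however your GroovyMAME executable is named) and setting the option there, something like:

frame_delay               7

The value above is only an illustration; the linked post discusses how to pick a value your system can actually sustain.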
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on July 21, 2013, 12:01:37 pm
So yesterday I ran some tests to see how GroovyMAME performs compared to the results on the real hardware, here: http://forums.shoryuken.com/discussion/178437/official-shmupmame-super-turbo-thread/p1 (http://forums.shoryuken.com/discussion/178437/official-shmupmame-super-turbo-thread/p1)

First of all, watching the videos done by papasi, I notice he is counting one frame less than I count (I'm referring to his videos). He says that SSFII Turbo, when played on the supergun, lags 4 frames natively, but when I count them on his video I get 5 frames. I'm guessing it must be due to the counting method: I start to count when the red LED turns off, that frame is #1, then 2, 3, 4, and on #5 the character starts moving. Please correct me if you think I'm wrong.

The number of frames in my test videos has been counted following this rule, using Media Player Classic with CTRL+Right to advance frame by frame.

One difference is that I'm recording at 120 fps while papasi recorded his videos at 60 fps, so to get the number of frames in my videos you need to divide by two. I take 5 values and get the average.

The other difference is that the LED in papasi's system is directly wired to the button, so in theory it's lag-less, but in my tests I'm using the keyboard LEDs, mapping the button to the Caps Lock key in MAME (as suggested by DaRayu at MAME World). The problem with my setup is that, according to old-school knowledge, it's the BIOS that turns these LEDs on/off, so the keypress signal must travel from the keyboard to the computer, be processed by the BIOS, then travel back to the keyboard where the LED is finally lit. Obviously this takes time, and judging by the slow-motion video I took, it adds at least 1 full frame of extra lag (5 frames at 240 fps). This means the LED turns on/off one frame late, so you need to add 1 frame to the results from my videos.

So, due to this, my setup is suboptimal for this kind of test. I really need to figure out how to wire an LED to the buttons without short-circuiting my JPAC, then I'll be able to get more accurate results.
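
In code form, the arithmetic behind each row further down is simply this (a throwaway sketch using the d3d_framedelay counts as the example; the helper program is mine, not part of any tool):

Code: [Select]
#include <cstdio>

int main()
{
    // frames counted in the 120 fps video for one test case (d3d_framedelay below)
    const double counts[] = {10, 11, 10, 9, 10};
    double sum = 0;
    for (double c : counts) sum += c;
    const double avg_120fps = sum / 5.0;        // 10.0 camera frames
    const double avg_60hz   = avg_120fps / 2.0; // camera runs at twice the 60 Hz game rate
    const double corrected  = avg_60hz + 1.0;   // +1 frame for the keyboard LED lag
    std::printf("%.1f game frames from press to on-screen reaction\n", corrected); // 6.0
    return 0;
}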

The system that I used:
Pentium 4 3.0 GHz
Windows XP-64
ATI HD 4350
Catalyst 9.3 (CRT Emudriver)

Here are the videos:
http://www.2shared.com/file/jVoGNfgz/d3d_120fps.html (http://www.2shared.com/file/jVoGNfgz/d3d_120fps.html)
http://www.2shared.com/file/bgDinjGs/ddraw_120fps.html (http://www.2shared.com/file/bgDinjGs/ddraw_120fps.html)
http://www.2shared.com/video/FdRDSCvk/keyboard_led_lag_240fps.html (http://www.2shared.com/video/FdRDSCvk/keyboard_led_lag_240fps.html)

And here are the results and how it would compare to the real hardware:

d3d_novsync: 9, 6, 9, 7, 8 -> 7.8 -> 3.9 + 1(led-lag) = 4.9 ≈ 5 (no lag)
d3d_vsync: 16, 16, 15, 17, 17 -> 16.2 -> 8.1 + 1(led-lag) = 9.1 ≈ 9 (4 frames of lag)
d3d_framedelay: 10, 11, 10, 9, 10 -> 10 -> 5.0 + 1(led-lag) = 6.0 ≈ 6 (1 frame of lag)

ddraw_novsync: 8, 9, 9, 5, 8 -> 7.8 -> 3.9 + 1(led-lag) = 4.9 ≈ 5 (no lag)
ddraw_vsync: 11, 10, 12, 9, 8 -> 10 -> 5.0 + 1(led-lag) = 6.0 ≈ 6 (1 frame of lag)
ddraw_framedelay: 11, 9, 9, 10, 10 -> 9.8 -> 4.9 + 1(led-lag) = 5.9 ≈ 6 (1 frame of lag)

The huge lag of d3d_vsync (4 frames) confirms DaRayu's results and what we've been discussing here about the flip queue. This will need to be tested on W7 too, with newer versions of the drivers. So the only reasonable way of using d3d is by enabling the -frame_delay option: it removes 3 frames of lag, although, ironically, not for the reason it was conceived for.

So, if these tests are right, we would have 1 frame of lag (as compared to the real hardware) for the properly working vsync cases (d3d_framedelay, ddraw_vsync, ddraw_framedelay). The frame_delay option doesn't seem to be making any fundamental difference here, although probably more accurate tests will be required to confirm this.

I believe the non-vsynced modes just show less lag because the changes are shown immediately without waiting for a full frame to be displayed. This is a little difficult to judge because the videos are not vsynced with the screen, so you get tearing either way.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: machyavel on July 22, 2013, 01:05:32 pm
Hello All,

This is a very interesting and informative discussion, but what about GM running on Linux?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 22, 2013, 03:01:48 pm
Calamity, thanks much for taking the time to do the testing!!  Very interesting results, indeed....

I think your counting methodology makes sense.

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on July 28, 2013, 05:40:50 pm
Yesterday I figured out how to connect an LED to my arcade controls; it's been much easier than I had imagined. Just connect a 5V LED between the button cable and the microswitch leg where the cable normally goes, and the LED will flash instantly as soon as you press the button.

This has confirmed my previous measurements: when I added an extra frame to the raw results from the video to account for the keyboard LED lag, it seems I was doing the right thing. In particular, the d3d + frame_delay case, which is my standard test case, now results in 6 frames counted (1.25 frames of lag as compared to the supergun).

The interesting part is that I have found why the frame_delay feature was not working properly for the purpose it was designed for. It seems the input poll was being done before it was supposed to, so the potential benefit was not happening. I have made an experimental build fixing this, and the good news is that these new videos seem to confirm a real input lag reduction of 0.6 frames (on average). With the new implementation, MAME would be on average only 0.65 frames laggier than the supergun, and that's probably the best it can be. I have added a frame counter on the screen, which makes it easier to identify each frame.
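
In rough pseudocode, the corrected ordering is something like this (illustrative only; the helper names are invented and this is not the literal source):

Code: [Select]
void wait_for_flip();                 // blocks until the previous frame is on screen (vblank)
void sleep_frame_fraction(double f);  // sleeps for f * (1/60 s)
void poll_inputs();                   // read the controls from the OS
void emulate_one_frame();             // run the machine for one video frame

void run_one_frame(int frame_delay)   // frame_delay = 0..9
{
    wait_for_flip();                            // previous frame goes out to the CRT
    sleep_frame_fraction(frame_delay / 10.0);   // hold off inside the current frame...
    poll_inputs();                              // ...then sample input as late as possible
    emulate_one_frame();                        // emulate with the freshest input;
}                                               // the result is shown at the next flip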

Regarding Linux, I have done a similar test with GroovyArcade, although using the regular GroovyMAME Linux build (the frame_delay option was not enabled). Unfortunately this test seems to confirm that SDL page flipping adds 3 frames of lag, just like Direct3D. However, we still don't have a workaround for this as we have on Windows, so until this is figured out it may turn out that Windows with all these hacks applied is the better platform for now.

Here are the videos:  http://www.2shared.com/file/l2AWR4jk/input_lag_test_d3d_sdl_120fps.html (http://www.2shared.com/file/l2AWR4jk/input_lag_test_d3d_sdl_120fps.html)

supergun (measured from this video: http://www.2shared.com/video/DXXoI0di/supergun_us_turbo2_CRT.html (http://www.2shared.com/video/DXXoI0di/supergun_us_turbo2_CRT.html))
5, 4, 5, 5, 5, 4, 5, 6, 4, 4, 5, 5 -> 4.75

d3d_frame_delay_old
12, 12, 12, 13, 13, 12, 11, 11, 11, 13 -> 12 -> 6 (1.25 frames of lag)

d3d_frame_delay_new
10, 8, 11, 11, 11, 11, 10, 12, 11, 10, 12, 10, 12, 10, 13 -> 10.8 -> 5.4 (0.65 frames of lag)

sdl_linux
14, 15, 16, 17, 17, 15, 17, 16, 15, 18, 15, 17, 15, 16, 16 -> 15.93 -> 7.97 (3.22 frames of lag)


This time the system used for testing was:

- Core2Duo
- Windows XP 64
- CRT Emudriver 6.5
- ATI 9250
- JPAC
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: cools on July 29, 2013, 04:58:08 am
Excellent work! Looking forward to testing the next release to see if I can notice any difference.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on July 29, 2013, 05:47:40 am
Hi Calamity,

The interesting part is that I have found why the frame_delay feature was not working properly for the purpose it was designed for. It seems the input poll was being done before it was supposed to, so the potential benefit was not happening. I have made an experimental build fixing this, and the good news is that these new videos seem to confirm a real input lag reduction of 0.6 frames (on average).

This is awesome news! Big thanks for doing these very interesting tests and persevering in getting the frame_delay feature right. Hope you don't keep us waiting too long for the update!

Quote
With the new implementation, MAME would be on average only 0.65 frames laggier than the supergun, and that's probably the best it can be.

We're not there yet ;) I see two possibilities that might provide even greater reduction of the input delay:

1. Increase the USB sample rate from the default 125Hz to 1000Hz.
This will change the average delay from 8ms to 1ms, shaving off ~0.5 frame of delay.
It's possible in both Windows XP and Windows 7; see this link for the downloads: http://www.ngohq.com/news/15043-how-to-increase-usb-sample-rate-in-windows-vista-7-a.html (http://www.ngohq.com/news/15043-how-to-increase-usb-sample-rate-in-windows-vista-7-a.html)

Please note that the link explains it for Windows 7, but the patch for Windows XP is also included. For XP you apparently only need hidusbf.zip, the third download on that page, or directly from here: http://www.ngohq.com/attachments/news/1954d1243462515-how-to-increase-usb-sample-rate-in-windows-vista-7-hidusbf.zip (http://www.ngohq.com/attachments/news/1954d1243462515-how-to-increase-usb-sample-rate-in-windows-vista-7-hidusbf.zip) There's a "README.ENG.TXT" inside the package explaining the install for XP.

2. Replace DirectInput with the RAWINPUT api for joysticks.
To quote from the Microsoft site, http://msdn.microsoft.com/en-us/library/ee418864.aspx#WM_INPUT (http://msdn.microsoft.com/en-us/library/ee418864.aspx#WM_INPUT) (see directly below the "DirectInput" header, about halfway down the page):
Quote
Internally, DirectInput creates a second thread to read WM_INPUT data, and using the DirectInput APIs will add more overhead than simply reading WM_INPUT directly.

There are no hard facts on how large that overhead reduction is, but based on experience it's substantial enough to notice. Note that this isn't easy to implement, but it has been done successfully by Toni Wilen in WinUAE, completely replacing DirectInput for joysticks and gamepads: https://github.com/tonioni/WinUAE/blob/master/od-win32/dinput.cpp (https://github.com/tonioni/WinUAE/blob/master/od-win32/dinput.cpp)
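
To give an idea of the scale: registering a joystick/gamepad for Raw Input is only a handful of lines; the hard part is decoding the HID reports that then arrive via WM_INPUT. A minimal sketch, assuming a standard Win32 window and message loop (not taken from WinUAE or MAME):

Code: [Select]
#include <windows.h>

// Ask Windows to deliver WM_INPUT messages for joysticks and gamepads to hwnd.
bool register_raw_joysticks(HWND hwnd)
{
    RAWINPUTDEVICE rid[2] = {};
    rid[0].usUsagePage = 0x01;   // HID "generic desktop" usage page
    rid[0].usUsage     = 0x04;   // joystick
    rid[0].hwndTarget  = hwnd;
    rid[1] = rid[0];
    rid[1].usUsage     = 0x05;   // gamepad
    return RegisterRawInputDevices(rid, 2, sizeof(RAWINPUTDEVICE)) != FALSE;
}

// In the window procedure, each WM_INPUT is then read with GetRawInputData()
// and the raw HID report has to be decoded by hand - that's the non-trivial part.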

Given your earlier testing results and the improved frame_delay feature, applying the two options above could possibly mean that GM's input delay can be reduced to *zero*. That is quite an exciting prospect! :P


Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 29, 2013, 02:40:48 pm
Calamity, wow, thank you!!!  This is huge!

Now I have ZERO reason to even think twice about a different MAME!

Dr. Venom, you raise a couple of interesting ideas.  I did the sample rate change to 1000Hz a couple weeks back and it seems to have improved things.  Calamity, any desire to further your testing with the USB sample rate and RAWINPUT tweaks?  I'm very curious how close we can really get to zero lag!!

Actually, Dr. Venom, how does one implement the RAWINPUT tweak?  I went to the link but just see a bunch of code.

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on July 29, 2013, 03:27:51 pm
Actually, Dr. Venom, how does one implement the RAWINPUT tweak?  I went to the link but just see a bunch of code.

Jim,

RAWINPUT is not a "tweak" but an API for user input devices, so it needs to be incorporated by the programmer (that would be Calamity in this case ;) ) before end-users can make use of it. The link shows the code for the WinUAE implementation, purely as an example of how it's been done in another emulator, as the implementation is not without quirks.

Hopefully Calamity is interested in exploring this further for GM at some time.  Official documentation from Microsoft is here: http://msdn.microsoft.com/en-us/library/windows/desktop/ms645536%28v=vs.85%29.aspx (http://msdn.microsoft.com/en-us/library/windows/desktop/ms645536%28v=vs.85%29.aspx)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 29, 2013, 04:11:43 pm
Got it, thank you, sir!

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on July 29, 2013, 04:18:19 pm
Actually MAME already uses raw input since version v0.117u1: http://mamedev.org/updates/whatsnew_0117u1.txt (http://mamedev.org/updates/whatsnew_0117u1.txt)

Quote
  * Changed the Windows implementation of input handling to fully support the raw input interfaces for keyboard and mouse. DirectInput is still used for all joystick inputs, as well as for keyboard and mouse inputs on pre-Windows XP systems. This allows for multiple keyboards and mice to be supported. Also changed keyboard and mouse behavior to use non-exclusive mode in DirectInput, and to keep the devices alive during pause for more consistent input handling.

I'm using a JPAC which is recognized as a keyboard, so it must be using raw input already.

Regarding hidusbf, I did install it in the system I used for the previous tests (with the keyboard leds). It didn't seem to make any difference then, that's why I didn't install it in the Core2Duo. But that was before I had the new frame_delay option and the proper led wiring, so I need to install it now and measure it more accurately to see if there's any extra lag that can be removed.

Definitely, any possible improvement now will come from minimizing the time it takes the system to pass input messages to MAME, so this is probably influenced by overall system performance. Theoretically, more powerful and more multi-tasking-efficient systems should react faster, but you never know.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on July 29, 2013, 05:14:21 pm
Actually MAME already uses raw input since version v0.117u1: http://mamedev.org/updates/whatsnew_0117u1.txt (http://mamedev.org/updates/whatsnew_0117u1.txt)

Well, that's only partly true. As your quote shows, it will only use Raw Input for the keyboard and mouse, but not for any attached (non-JPAC) joysticks or gamepads. This can also be seen from the log:

Code: [Select]
RawInput: APIs detected
Input: Adding Mouse #0: HID-muis
Input: Adding Gun #0: HID-muis
Input: Adding Mouse #1: HID-muis
Input: Adding Gun #1: HID-muis
Input: Adding Kbd #0: HID-toetsenbordapparaat
Input: Adding Kbd #1: HID-toetsenbordapparaat
DirectInput: Using DirectInput 7
Input: Adding Joy #0: XBOX 360 For Windows (Controller)
Input: Adding Joy #1: 2600-daptor

Quote
I'm using a JPAC which is recognized as a keyboard, so it must be using raw input already.

Ah, so then for the purpose of this test it has already been using RAWInput, that's good.

It still leaves the many people using gamepads / joysticks etc. for their MAME and/or their SNES, Genesis, PSX, etc. emulation (using GroovyUME) out in the cold though :( In that respect, adding Raw Input for gamepads/joysticks could still bring a general benefit to GM...

I can imagine, though, that if the JPAC is the sole input device you're using for MAME/MESS (SNES etc.), there's little benefit for you in adding Raw Input as the general input API. Well, hopefully for us poor souls not using a JPAC, you could still put it somewhere on your list... somewhere... deep deep down your priority list I guess... ;)

Quote
Regarding hidusbf, I did install it in the system I used for the previous tests (with the keyboard leds). It didn't seem to make any difference then, that's why I didn't install it in the Core2Duo. But that was before I had the new frame_delay option and the proper led wiring, so I need to install it now and measure it more accurately to see if there's any extra lag that can be removed.

OK, it will be interesting to see if it makes a difference in the test. From personal experience (in WinUAE) it can make a noticeable difference with some shoot 'em ups (like Hybris on the Amiga).

Given your interesting tests, I was thinking about some possible drawbacks that we may need to keep in the back of our minds. Could it be that, given that the recordings are done at 120fps, i.e. ~8ms per frame, this recording resolution is too low to reliably capture the input reduction given by the USB overclock? 

I guess another (related) thing could be that, because of the 120fps resolution and the camera not being synchronized to the vblank of the game (as you mentioned earlier), every individual test runs the risk of measuring an improvement (or, vice versa, a deterioration) where there actually is none, simply because the camera started recording at another point in the frame?

Quote
Definitely, any possible improvement now will come from minimizing the time it takes the system to pass input messages to MAME, so this is probably influenced by overall system performance. Theoretically, more powerful and more multi-tasking-efficient systems should react faster, but you never know.

Indeed, theoretically the more powerful the better. That said, a less powerful system running fewer background processes may well respond faster to input than a more powerful system running more background processes. Ah, the complexity of modern PCs/OSes. We could really use a realtime OS for this emulation stuff...
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: papasi on July 29, 2013, 05:42:24 pm
Hi all,

sorry i haven't followed groovymame's development.

so it has .65 frames at best? that's not better than shmupmame, is it?

i didn't go into the details of how shmupmame reduce the input lag (as well as groovymame).
can groovy or shmup take advantage of each other's trick and make it even better?

btw, does groovymame use github or other public repo?

i tried to contribute to shmupmame but they don't have a repo and the maintainer is not responding to issues compiling the binary, so it discourages people from contributing or having faith that it won't end up being abandonware.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on July 29, 2013, 06:30:25 pm
Given your interesting tests, I was thinking about some possible drawbacks that we may need to keep in the back of our minds. Could it be that, given that the recordings are done at 120fps, i.e. ~8ms per frame, this recording resolution is too low to reliably capture the input reduction given by the USB overclock?

Maybe. This camera can record at 240 fps too, although the quality is highly degraded (320x240). However it should still show the sprites moving to allow frame counting (the frame counter will be too blurry unless I increase the font size somehow).

Quote
I guess another (related) thing could be that, because of the 120fps resolution and the camera not being synchronized to the vblank of the game (as you mentioned earlier), every individual test runs the risk of measuring an improvement (or, vice versa, a deterioration) where there actually is none, simply because the camera started recording at another point in the frame?

Certainly, that's why I'm making longer videos now, so all possible deviations are averaged out. I didn't post results confirming any improvements in the frame_delay feature until I got consistent results from several videos. Anyway, please don't take these figures as definitive until more accurate tests are done (preferably by other people). I mean, I'm quite sure it's below 1 frame of lag, but 0.65 is just what I'm getting from this particular video and game.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on July 29, 2013, 06:59:03 pm
Hi papasi,

Welcome and thanks for your videos!

so it has .65 frames at best? that's not better than shmupmame, is it?

Bear in mind that all my tests are done with v-sync enabled, while I *believe* shmupmame has been tested without v-sync.

When I tested GroovyMAME without v-sync, it resulted in no lag (always compared with your supergun video). However, I'm only interested in v-synced emulation.

Quote
i didn't go into the details of how shmupmame reduce the input lag (as well as groovymame).
can groovy or shmup take advantage of each other's trick and make it even better?

The approach in shmupmame is different: they remove the sprite queue so the sprites are shown sooner, although not in sync with the backgrounds. Recently they added a new method, forcing the emulated CPU to get the input faster by bypassing the emulated PCB's input processing. In either case, that involves breaking the emulation itself.

GroovyMAME patches leave the emulation alone and just try to optimize the way the external part of the emulator talks to the host system.

In theory, both methods combined might result in less lag than the actual hardware, but that's not the goal of GroovyMAME.

Quote
btw, does groovymame use github or other public repo?

The source is available as .diff files here:

https://code.google.com/p/groovyarcade/downloads/list

There's also a git repo but it's not being maintained as such.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on July 30, 2013, 06:00:48 am
Maybe. This camera can record at 240 fps too, although the quality is highly degraded (320x240). However it should still show the sprites moving to allow frame counting (the frame counter will be too blurry unless I increase the font size somehow).

It would be nice if that works; with a 4 ms interval the chance of getting "wrong" readings is at least diminished by a lot.

Quote
Certainly, that's why I'm making longer videos now, so all possible deviations are averaged out. I didn't post results confirming any improvements in the frame_delay feature until I got consistent results from several videos. Anyway, please don't take these figures as definitive until more accurate tests are done (preferably by other people). I mean, I'm quite sure it's below 1 frame of lag, but 0.65 is just what I'm getting from this particular video and game.

No, I won't take the figures as definitive. But assuming the supergun Street Fighter doesn't have any delays built in that weren't present on the real arcade cabinet, I take your figures as a good estimate (the best objective estimate we currently have, even). Personally, I would be much more inclined to call the results definitive if you had the real hardware sitting at your desk and both real hardware and emulation were filmed with the same camera, with no comparing of different sources. Second, it would be highly preferable to do comparisons against real hardware that has zero lag, for example a real SNES (Super Turrican and the likes) or Megadrive compared to GroovyUME. But that said, I'm pretty happy with your tests and the fact that we at least have some objective facts now :)

A bit OT maybe, but I'm actually quite surprised that the real Street Fighter hardware has such a big lag. I would have expected such a quick-reflexes game to run everything within a frame; 4 frames of delay actually seems like a crazy big lag? Or would that delay actually be intended by the developers, given the type of "special move" that is initiated in the game?  I'm completely ignorant when it comes to superguns, so possibly this is a very dumb question, but how trustworthy are they when it comes to replicating the original hardware?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 30, 2013, 08:26:29 am
Dr. Venom, the supergun idea was / is to allow folks to play arcade hardware at home; in the case of the Shoryuken video, Capcom CPS2 hardware, specifically Super Street Fighter II Turbo.  There is no standard for the hardware, but they all basically just route signals to allow plugging in joysticks, a monitor, etc.  Some have converters to change the arcade RGB signal to something like component or s-video so that you can use it on a consumer-style television.  No lag should be introduced by the device itself, so the results posted by papasi should be 100% representative of the arcade game.

I have one which is about as bare bones as you get:

http://arcadeforge.net/Supergun-MAK-Strike/Supergun-MAK-Strike::74.html?MODsid=d0063e34b32656adde30d6599457ca8f (http://arcadeforge.net/Supergun-MAK-Strike/Supergun-MAK-Strike::74.html?MODsid=d0063e34b32656adde30d6599457ca8f)

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on July 30, 2013, 08:39:24 am
If the output of a supergun is RGB, then it should definitely be 100% faithful. However, if it's converting the signal to S-video or similar crap, then we can't rely on the results. Hopefully papasi can confirm this.

On the other hand, the test done by NKI shows 4 frames (1 less than Papasi's):

NKI Testing Input Lag street fighter 2 turbo - arcade, dreamcast, ps1, CCC2 (http://www.youtube.com/watch?v=JoJzobmdGzU#)

To clarify, Papasi's test shows 4 frames of lag = 5 frames counting since the LED lights up. NKI's apparently shows 1 frame less (4 frames counting since the LED lights up). But there is only one sample in the video, so I wouldn't trust it too much.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on July 30, 2013, 02:05:38 pm
Jim, thanks for the explanation on the supergun. Seems like a cool bit of kit to have.

It would be interesting to know how the joystick connectors work on these boards: basically how the signal routing is done and whether or not there's a conversion going on like, for example, on many of those Chinese "PS3->USB", "SNES->USB" etc. adapters, many of which unfortunately use chips that poll the controller at (a disappointing) 125Hz. Quite possibly the only reason being that they are cheap to use.

Given the price of some of these superguns I wouldn't be surprised if some have these cheap Chinese controller chips on them... But that is pure speculation on my side. I might be completely wrong, and possibly all these things are wired to work in realtime.

On the other hand, the test done by NKI shows 4 frames (1 less than Papasi's):

That's an interesting video. I think you're right that we shouldn't trust it too much, but the way the test has been set up, with additional "double tests" (like keeping the test the same but switching the TV and such) to verify whether the results are consistent, does give some comfort about the quality of the test. To be honest, given this new "evidence", I think there's quite a chance that the real hardware has only 3 frames of lag.

But indeed, maybe Papasi has some idea on where these differences may come from.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 30, 2013, 02:37:06 pm
Jim, thanks for the explanation on the supergun. Seems like a cool bit of kit to have.

It would be interesting to know how the joystick connectors work on these boards: basically how the signal routing is done and whether or not there's a conversion going on like, for example, on many of those Chinese "PS3->USB", "SNES->USB" etc. adapters, many of which unfortunately use chips that poll the controller at (a disappointing) 125Hz. Quite possibly the only reason being that they are cheap to use.

Definitely.  They are good to have around to the extent you've got room for arcade boards.  At least with the MAK (and I think *most* super guns), the controls are all real-time.  That is, the super gun just routes inputs from a connector (in the MAK's case a simple db-15) to the JAMMA interface.  No conversion going on at all, just signal routing. 

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on July 30, 2013, 03:14:13 pm
Definitely.  They are good to have around to the extent you've got room for arcade boards.  At least with the MAK (and I think *most* super guns), the controls are all real-time.  That is, the super gun just routes inputs from a connector (in the MAK's case a simple db-15) to the JAMMA interface.  No conversion going on at all, just signal routing.

That's the best you can get, good to know they work that way.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: papasi on July 30, 2013, 08:30:05 pm
The reason why the frame counts are not consistent in ST is because of this

http://combovid.com/?p=5002 (http://combovid.com/?p=5002)

At JP T3/ US T2, SSF2T is running at 143% compared to SSF2 (the previous version)

The frame skip pattern is like this 2,2,3,2,2,3,…   

Ideally it would be easier to test with SSF2 or Hyper SF AE with turbo set to 0. But no one is interested in those competitively, and the people who have the boards for those games are not interested in emulator lag either...

So for ST the only way is to do what Calamity and I did: run a good number of samples, like 20, and average them.

The supergun shouldn't have lag compared to the arcade cab.

Also, even though it takes ~4.5 frames from the time you press roundhouse to the time the animation changes on screen, the arcade game has no lag.

All the moves in the game have different frame data, and the roundhouse kick doesn't become active until a few frames later.

Otherwise that move will be overpowered as you can punish anything immediately.

And Calamity is right. It is unclear whether NKI just ran a bunch of samples and took the median value or not.
That's why I posted a video of me doing 20 in a row.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 30, 2013, 09:13:27 pm
papasi, some of your tests are with a CRT, but is the CRT fed an RGB signal from the supergun or is it being run through a video converter (to something like s-video, component, etc.)?

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on July 31, 2013, 04:45:47 pm
Guys, another discussion regarding this topic (tangentially) is happening on SRK.  Some more pretty interesting info:

http://forums.shoryuken.com/discussion/181076/super-turbo-offline-setup-guide-lag-rating (http://forums.shoryuken.com/discussion/181076/super-turbo-offline-setup-guide-lag-rating)

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on August 01, 2013, 02:58:27 pm
Papasi, thanks a lot for your detailed post; the frame skip pattern certainly explains the uneven results from the videos. Regarding the frames it takes for certain movements to show, it's interesting to use the Shift+P method that you surely know, and indeed you can see how some movements take longer than others. Thinking about it, it makes sense that a game that is supposed to accept combos can't show the action instantly: it must wait to check whether a certain sequence of inputs happens before it decides what to do (just an idea).

However, this information got me thinking that this game is possibly not the best test case if we want to find the lowest actual input lag achievable with the MAME emulator. Based on the list here: http://shmups.system11.org/viewtopic.php?t=26394 (http://shmups.system11.org/viewtopic.php?t=26394) there are games that have no lag, i.e. the action happens on the next frame. This means that if you use the Shift+P method while pressing the joystick in one direction, you'll see how the spacecraft or whatever moves right on the next frame. This is so because pausing the emulator and stepping frame by frame discards any possible input latency due to host system overhead. This way you make sure that when you press Shift+P, the input is already available for the next frame. But would this happen when running the game normally?

On the other hand, when recording a game that has a refresh rate different from 60 Hz (as is the case for CPS2), you'll see the raster's start and end roll through the screen during the video. This makes it difficult to get accurate lag results, because depending on where the raster is at the time the frame is captured, relative to the point when the LED lights up, you have to be very careful when judging how many whole frames to count.

Fortunately, if we film a game that has an exact refresh rate of 60 Hz, the raster position is going to be "static" between different captures. This makes the task much easier. I've chosen Terra Cresta, because it's 60 Hz and it's known to have the minimum possible lag (action happens on the next frame).

What I've found is that GroovyMAME can really get the action to happen on the next frame, but this is only true if the input happens somewhere inside the first 1/3 of the previous frame. I'm running GM with -frame_delay 7. This means that the period of time from 1/3 to 7/10 of the frame (green and red lines in the picture) is the estimated lag attributable to the host system. The USB polling rate has been set to 1000 Hz, and GM is using raw input already (JPAC), so this is the bare minimum lag that seems possible for my particular system (Core2Duo).
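
In numbers (nothing new, just restating the above with a 60 Hz frame time):

Code: [Select]
#include <cstdio>

int main()
{
    const double frame_ms   = 1000.0 / 60.0;  // ~16.7 ms per frame at 60 Hz
    const double poll_point = 7.0 / 10.0;     // -frame_delay 7: input sampled at 0.7 of the frame
    const double last_good  = 1.0 / 3.0;      // observed: later presses miss the next frame
    std::printf("host lag ~ %.1f ms\n", (poll_point - last_good) * frame_ms);  // ~6.1 ms
    return 0;
}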

This has interesting implications. In most games the action happens in the lower 2/3 of the screen. This means that by the time you get to see the sprite, it's normally too late to get your input in on time for the next frame. So for horizontal games you'll experience the minimum possible lag when the character sprite is in the top 1/3 of the screen, while for vertical games this means the left 1/3. So for horizontal games, the ultimate setup would be physically rotating the monitor by 180º (upside down) and then rotating the picture back with MAME  ;D

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: matrigs on August 01, 2013, 06:07:13 pm
If someone with a bit of electronics experience would chip in, we could achieve a way of recording the game screen with perfect accuracy, by syncing the camera to the vertical sync.

The Playstation Eye camera has an exposed frame sync input on its chip, which has been used to sync the recording speed with another Playstation Eye, as these also have an exposed vertical sync output.

http://www.instructables.com/id/The-EyeWriter-20/?ALLSTEPS (http://www.instructables.com/id/The-EyeWriter-20/?ALLSTEPS)

http://www.zhopper.narod.ru/mobile/ov7720_ov7221_full.pdf (http://www.zhopper.narod.ru/mobile/ov7720_ov7221_full.pdf)

It shouldn't be difficult to connect a vertical sync signal from a source computer, this way achieving perfect sync regardless of the refresh rate, which would be perfect for CPS2 games etc.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on August 04, 2013, 01:58:15 pm
Hi Calamity,

Fortunately, if we film a game that has an exact refresh rate of 60 Hz, the raster position is going to be "static" between different captures. This makes the task much easier. I've chosen Terra Cresta, because it's 60 Hz and it's known to have the minimum possible lag (action happens on the next frame).

It's great that you've been doing these additional tests, they are truly valuable.

Quote
What I've found is that GroovyMAME can really get the action to happen on the next frame, but this is only true if the input happens somewhere inside the first 1/3 of the previous frame. I'm running GM with -frame_delay 7. This means that the period of time from 1/3 to 7/10 of the frame (green and red lines in the picture) is the estimated lag attributable to the host system. The USB polling rate has been set to 1000 Hz, and GM is using raw input already (JPAC), so this is the bare minimum lag that seems possible for my particular system (Core2Duo).

It's especially nice that we can now attach a figure to "host system lag". Basically, what your test says is that the host system lag for your system, while using raw input and 1ms-clocked USB ports, is 6ms: that's how long it takes for the input to become available to the application (GM in this case). I had a quiet hope that this would be lower, but given my own tests and experience I do find a "base" host lag of 6ms plausible. It would be interesting to see how this compares to other systems, but I guess that will be difficult to test.

So with a frame_delay of 7 we are at 11ms (average) input delay for a 60Hz game. I guess the maximum possible reduction would be the ability to run at a frame_delay of 10, reducing the delay to just the host system delay, in other words 6ms. But I wonder if that will ever be feasible, given the -variation- in frame emulation time and the fact that the Windows wait command may sometimes result in less-than-exact 1ms wait times as well.
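
For clarity, that 11ms is just the host lag plus the remainder of the frame after the input poll (my own arithmetic, using Calamity's ~6ms host figure):

Code: [Select]
#include <cstdio>

int main()
{
    const double frame_ms = 1000.0 / 60.0;   // ~16.7 ms
    const double host_ms  = 6.0;             // USB + OS overhead from the tests above
    for (int fd = 7; fd <= 10; ++fd)         // -frame_delay 7 today; 10 would be the ideal
        std::printf("frame_delay %d -> ~%.0f ms\n", fd, host_ms + (1.0 - fd / 10.0) * frame_ms);
    return 0;                                // prints ~11 ms for 7 and ~6 ms for 10
}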

Ah well, in the end, as it is now, being able to reliably reduce average input delay to half a frame makes me a very happy gamer :) 

For discussion sake, I disagree on the rotating monitor part :D

1) In many shoot 'em ups your worst enemies are at, or coming from, the top of the screen (Gradius, R-Type, etc.); I wouldn't want that rotated to the "slow" displaying part of the screen.

2) Given that human reaction time, when measured from sensing (with eyes or ears) to muscle action, physiologically takes on average more than 200 (two hundred) ms, it's impossible for a human to react to something -unexpected- happening in the first 1/3 displayed on screen and move the joystick within that same frame.

I guess arcade games are more about recognizing sprite patterns and anticipating them. Through anticipation and adaptation a large part of the 200ms+ "reaction time" may be reduced. E.g. if you know a road with all its corners by heart, you can drive faster (knowing when to turn the wheel) than someone for whom the road is totally unfamiliar.

Given this adaptation mechanism, "reaction time" becomes quite a complicated thing. The bottom line is that we can still differentiate input delay down to a granularity of single-frame delays (on average at least), but for the rest... I guess that may be something for the X-Files :)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: adder on August 04, 2013, 08:29:02 pm
on a sidenote, i've found that using a gamepad instead of a real arcade joystick reduces 'human lag'

on a gamepad which uses a typical dpad, there isn't much travel/time lost between physically moving the dpad from eg. left to right.  with a real arcade joystick obviously the travel between eg. left and right is greater, and the longer the joystick shaft, the worse things get :o (no doubt that's why those sanwa short shaft/super light/high sensitivity ball-top joysticks are popular amongst streetfighter fans)

i really noticed the difference on my mame pc when i tried a real arcade joystick (with a zero delay encoder board) instead of my ps2 dualshock2* controller (via ps2 to usb adapter), and went back to my dualshock2 controller right away. my brother reported the same problem too with his ultimarc minipac + arcade joystick (although i suppose you could argue that we are simply more used to using dpads rather than real arcade joysticks.. who knows.  with a real arcade joystick maybe things get better once you master just moving your wrist instead of your entire arm (which i admit i tend to do :lol)


*modded, because as standard i find the diagonals are a bit too hard to hit!

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on August 05, 2013, 02:39:58 pm
on a sidenote, i've found that using a gamepad instead of a real arcade joystick reduces 'human lag'

on a gamepad which uses a typical dpad, there isn't much travel/time lost between physically moving the dpad from eg. left to right.  with a real arcade joystick obviously the travel between eg. left and right is greater, and the longer the joystick shaft, the worse things get :o (no doubt that's why those sanwa short shaft/super light/high sensitivity ball-top joysticks are popular amongst streetfighter fans)

Yes, the joystick hardware configuration can certainly make a difference. I'm not sure whether a good gamepad can mechanically be quicker than a -good- joystick, but alas. Personally I prefer a gamepad for console (MESS) emulation only, but for arcade gaming and some home computers I highly prefer a joystick. For the latter I'm a big fan of my Suzo Arcade joystick (with a 1ms USB adapter) for many shooters, as the joystick is -really- tight (mechanically) in its movement. (http://en.wikipedia.org/wiki/The_Arcade_%28joystick%29 (http://en.wikipedia.org/wiki/The_Arcade_%28joystick%29))

Unfortunately it only supports two fire buttons, so I've been looking for an alternative and purchased the X-Arcade joystick (http://www.xgaming.com/store/arcade-joysticks-and-game-controllers/product/x-arcade-solo-joystick/ (http://www.xgaming.com/store/arcade-joysticks-and-game-controllers/product/x-arcade-solo-joystick/)). But sadly (IMHO) that joystick very much suffers from the point you made: it takes quite large movements to get the microswitches to trigger :(.  There is a way to make them trigger more tightly (as per the manual on the X-Arcade site), but even then it doesn't come close to the Suzo Arcade joystick mentioned earlier.

I'm thinking about replacing only the joystick on the X-Arcade board with the "Suzo System 500 Joystick", mentioned as the "Euro-Stik" on this page on Ultimarc.com: http://www.ultimarc.com/controls.html (http://www.ultimarc.com/controls.html) :

Quote
This is the Suzo System 500 stick. This is one of the most popular sticks in European arcades. It's fair to say that compared to the traditional USA sticks it takes some getting used to, but it has many fans with it's short, well defined throw. It is fully adjustable 4-8 way by rotating the plate (the screws can be left slightly loose) and even has a 2-way mode!
Mounting this stick under a wood panel takes a little more work as it has a raised ridge around the shaft which needs a recess. It's also great for mounting into the top of a panel, covered by an overlay, or on a metal panel.

This seems to be the real arcade version of the earlier-mentioned Suzo Arcade joystick for home use. Hopefully it's as tight in its movement as I hope it to be...

Quote
with a real arcade joystick maybe things get better once you master just moving your wrist instead of your entire arm (which i admit i tend to do :lol)

LOL, I remember doing that too when I got my very first home computer: not only bending over with my arm, but moving my whole body. Especially fun when you saw family members / friends doing the exact same thing; it looked really silly ;D
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on August 08, 2013, 09:56:08 am
Hi Calamity,

What I've found is that GroovyMAME can really get the action to happen on the next frame, but this is only true if the input happens somewhere inside the first 1/3 of the previous frame. I'm running GM with -frame_delay 7. This means that the period of time from 1/3 to 7/10 of the frame (green and red lines in the picture) is the estimated lag attributable to the host system. The USB polling rate has been set to 1000 Hz, and GM is using raw input already (JPAC), so this is the bare minimum lag that seems possible for my particular system (Core2Duo).

Now that we've almost hit rock bottom on the possible input delay reductions, and are finally getting a sense of all the variables involved (many they are) in getting to the lowest possible latency, I was thinking of some very last straws to latch onto, to possibly lower that average input latency of 0.65 frames even further.

Basically where we are now:

I have two observations:

Regarding 1:  On my machine, a 4.6Ghz i7 3770k, using MESS drivers that run unthrottled in the range of 2600% (26 times faster than real hardware frametime), it seems as if frame_delay has a limit of 7 before it starts to skip frames occasionally (my personal conclusion), adding to the input latency. I find it odd that for this setup and PC hardware, frame_delay isn't able to reliably use a value of 8 or 9, or even the valhalla of 10, given how high the unthrottled speed is.

I can currently think of only one reason, which is the reliability of the Windows wait() function. Apparently this normally defaults to a smallest wait time of 10ms, regardless of whether you specify a smaller value. Only by setting specific parameters can the granularity be increased to the lowest possible, which I understand to be 1ms. Now, I did a small test some time ago, and from my findings it looked like Windows mostly provides wait() periods with granularity down to 1ms, but every now and then will throw in a 4ms wait time. I'm definitely not sure how this works for MAME, but "random" instances where the wait() time extends by 4ms would most definitely be a cause for the frame_delay feature not working to its fullest extent, because any setting larger than 7 will then occasionally push a frame beyond the vblank, causing a skipped frame and adding 16+ ms to the input delay.
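I'm not sure how this maps onto MAME's internals, but the Windows-level behaviour is easy to check with a tiny standalone test like the sketch below (my own throwaway code, not MAME's): request a 1ms sleep a thousand times and see what you actually get, with and without timeBeginPeriod(1).

Code: [Select]
/* sleeptest.c - measure the real granularity of Sleep(1)
   build (MinGW): gcc sleeptest.c -o sleeptest.exe -lwinmm */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, t0, t1;
    double ms, worst = 0.0;
    int i;

    timeBeginPeriod(1);      /* comment this out to see the default granularity */
    QueryPerformanceFrequency(&freq);

    for (i = 0; i < 1000; i++)
    {
        QueryPerformanceCounter(&t0);
        Sleep(1);            /* ask for a 1 ms wait */
        QueryPerformanceCounter(&t1);
        ms = (double)(t1.QuadPart - t0.QuadPart) * 1000.0 / (double)freq.QuadPart;
        if (ms > worst)
            worst = ms;
    }
    printf("worst observed Sleep(1): %.2f ms\n", worst);

    timeEndPeriod(1);
    return 0;
}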

Hopefully other knowledgeable people can chime in, as possibly the above is one of the causes that - if improved upon - could lower the input delay by as much as 4ms for many drivers when run on a fast PC.

Regarding 2: I'm wondering about the base host delay, i.e. the delay in the USB signal traveling through the Windows layers, being 6ms, and how this comes about.

In Calamity's update we will have the following loop:
Code: [Select]
a.display frame -> b.run the frame_delay (i.e. wait!) -> c.poll input -> d.emulate frame -> (loop to a.display frame)
Which to me raises the questions:
  • What are the chances that (part of) the "host system delay" comes in after the point at which "c.poll input" is done?
  • Does "c.poll input" return OK with 100% certainty before moving on to d.emulate frame in a multi-threading setting?

If there's any possible slight delay after "c.poll input", then (with multithreading enabled and frame emulation starting in parallel) the input event may not be available to the frame emulation in time! That would add a full frame of extra input delay in those situations. Even if that only occurs occasionally, possibly depending on the speed of the host PC, it would be very detrimental to the whole purpose of the frame_delay feature.

In case the above might be true for certain situations, what could be a possible solution that would not burden the emulation in other ways? Currently "frame_delay" is simply "waiting" until point X in the frame. Can't we make that wait time more productive, i.e. why not make the frame_delay wait() period a simple loop that does:

Code: [Select]
/* busy-wait the remainder of the frame_delay period, re-polling input each pass */
while (time_left_in_frame())
{
    poll_input();   /* hypothetical helper names, just to sketch the idea */
}
 
That would make it all but certain that the very last input event is available to the d.emulate frame part, even if there were some "host system delay" in a multithreading setting between c.poll input and d.emulate frame.  Possibly this method could wipe out a large part of the current "host system delay", further reducing the input latency?

I guess I may be very wrong about this, as I have no understanding of how and where the "host system delay" in a Windows PC adds up to the measured 6ms, and I'm also not knowledgeable about how input polling works out in a multi-threading setting.

Hopefully there's something to it (but possibly not :D), and we'll be able to squeeze out some of those remaining 11ms of input latency...
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: cools on August 09, 2013, 06:51:11 am
I don't know how to fix it, but you're definitely right that with Windows the "host system delay" varies, regardless of how fast your hardware is - I notice it.

It might be something that isn't fixable without changing to a realtime OS where delays are guaranteed.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on August 13, 2013, 01:38:29 pm
Great post, Dr. Venom!  I'm looking forward to Calamity's response.

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: machyavel on August 13, 2013, 02:14:12 pm
Hi,

Do you people think something like the "Fidelizer" freeware could be of any help in reducing the "host lag"?

http://www.windowsxlive.net/fidelizer/ (http://www.windowsxlive.net/fidelizer/)

However, it's only for Vista and above...
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on August 14, 2013, 02:48:13 pm
A follow-up to this as I'm right in the midst of working through this.

I'm actually having my Sega Saturn pad "hacked" with a DB15 cable.  The cable will run into a box where I have two PCBs, one a PS360+ for pretty much all consoles and the 2nd an I-PAC USB keyboard encoder.  There will be two outputs, one an RJ45 for the PS360+ and one a USB for the I-PAC.  The I-PAC will be for MAME use, specifically using its raw_input protocol. 

The box will allow me to use a bunch of different controllers that I decide to hack with a DB15 cable.

-Jim


Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on August 15, 2013, 10:41:40 pm
Dr. Venom, if we run games at 120hz, shouldn't we then be at 1/4 frame of lag?  That's pretty sweet.

-Jim



Quote
Hi Calamity,

Fortunately, if we film a game that has an exact refresh rate of 60 Hz, the raster position is going to be "static" between different captures. This makes the task much easier. I've chosen Terra Cresta, because it's 60 Hz and it's known to have the minimum possible lag (action happens on the next frame).

It's great that you've been doing these additional tests, they are truly valuable.

Quote
What I've found is that GroovyMAME can really get the action to happen on the next frame, but this is only true if the input happens somewhere inside the first 1/3 of the previous frame. I'm running GM with -frame_delay 7. This means that the period of time from 1/3 to 7/10 of the frame (green and red lines in the picture) is the estimated lag attributable to the host system. The USB polling rate has been set to 1000 Hz, and GM is using raw input already (JPAC), so this is the bare minimum lag that seems to be possible for my particular system (Core2Duo).

It's especially nice that we can now attach a figure to the "host system lag". Basically what your test says is that the host system lag for your system, while using rawinput and 1ms clocked USB ports, is 6ms before the input is available to the application (GM in this case). I had quietly hoped this would be lower, but given my own tests and experience I do find a "base" host lag of 6ms plausible. It would be interesting to see how this compares to other systems, but I guess that will be difficult to test.

So with a frame_delay of 7 we are at 11ms (average) input delay for a 60Hz game. I guess the maximum possible reduction would be the ability to run at a frame_delay of 10, reducing the delay to just the host system delay, or in other words 6ms. But I wonder if that will ever be feasible, given the -variation- in frame emulation time and the fact that the Windows wait command may sometimes result in less than exact 1ms wait times.

Ah well, in the end, as it is now, being able to reliably reduce average input delay to half a frame makes me a very happy gamer :) 

For discussion's sake, I disagree on the rotating monitor part :D

1) In many shoot 'em ups your worst enemies are at, or coming from, the top of the screen (Gradius, R-Type, etc.); you wouldn't want that rotated to the "slow" part of the display.

2) Given that human reaction time, measured from sensing (with eyes or ears) to muscle action, physiologically averages more than 200 (two hundred) ms, it's impossible for a human to react to something -unexpected- happening in the first 1/3rd of the screen and move the joystick within that same frame.

I guess arcade games are more about recognizing sprite patterns and anticipating them. Through anticipation and adaptation a large part of the 200ms+ "reaction time" may be reduced. E.g. if you know the road with all its corners by heart, you can drive faster (knowing when to turn the wheel) than someone for whom the road is totally unfamiliar.

Given this adaptation mechanism, "reaction time" becomes quite a complicated thing. The bottom line is that we can still perceive differences in input delay down to the granularity of a single frame (on average at least), but for the rest... I guess that may be something for the X-Files :)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on August 17, 2013, 10:47:17 am
Hi Dr.Venom,

Thanks a lot for your post and interest in this subject, thanks extended to all the people following this thread. Before going into more explanations, I do believe the tests I posted the other day show the actual limits of input responsiveness on the specific hardware (Core2Duo) and OS (XP 64) that were used. There might be a little room for improvement, but I don't think we can magically overcome the current limitations of the hardware/OS. Of course better hardware may perform better.

The good news is that the -frame_delay model has been proven correct. By saying this, I mean we have shown it is perfectly possible for average hardware to make input available for the very next frame when emulating common systems. Well, this is not exactly the discovery of cold fusion, but it's good to finally have some evidence of what we had suggested long ago: that 16.7 ms is plenty of time at the computer scale for the required input processing to be done in time to be theoretically lagless. In this regard, it is not so important that we still have some amount of sub-frame lag (host system lag), as this can be expected to shrink steadily as hardware keeps getting faster. IMHO, it's even more important to have defeated, as conceptually wrong, the myth that v-synced emulation necessarily adds at least a frame of lag. It is the common way of implementing v-sync that causes lag, with the extreme cases of hidden flip queues probably being largely responsible for the black legend of v-sync.

Regarding the reliability of the wait functions, we need to clarify that MAME allows you to enable or disable a Sleep API call inside the throttling loop. This can be done through the -sleep option. For my tests, -sleep was disabled. This means we're not asking Windows to perform a "wait" in such a way that control might only be given back to us some time after the requested period. When disabling -sleep, MAME just counts the ticks of the CPU clock in a tight loop until we reach the required point. So, the fact that we can't reliably apply a -frame_delay factor of 8 or 9, the CPU being perfectly capable, means that there's something else taking control from us. I'm almost sure this is due to the OS giving control to some other higher priority threads. In a way, disabling the -sleep option involves behaving in an uncivilized manner from the OS's point of view, and it's not strange that the OS stops us for a while when it judges that other threads need their time slice. For this very reason, my tests were done with the -priority option set to 1, which is the highest possible allowed by MAME, in an attempt to reduce the chances of being stopped by the OS. However, it's not enough. So we could analyze the source base to see if there's any way to increase the thread priority further (see THREAD_PRIORITY_TIME_CRITICAL, REALTIME_PRIORITY_CLASS) or whether we've already reached the maximum, being aware that stealing all the CPU time from the system may leave it in a sluggish condition that might lead to a sudden periodic hiccup (not sure of this).
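For anyone who wants to experiment, the calls in question are plain Win32 and could be tried from the OSD layer. This is only a rough sketch, not something that is in GroovyMAME, and note that REALTIME_PRIORITY_CLASS normally requires administrator rights (it silently falls back to HIGH_PRIORITY_CLASS without them):

Code: [Select]
#include <windows.h>

/* Experimental: push the process and the emulation thread as high as the OS
   allows.  Be careful: a tight busy-wait at this priority can starve the
   rest of the system, including the audio and input threads. */
static void raise_priority_experiment(void)
{
    /* falls back to HIGH_PRIORITY_CLASS if the privilege is missing */
    SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);

    /* call this from the thread that runs the throttle / frame_delay loop */
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
}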

Finally, regarding the suggestion of a continuous input poll while waiting, I think this wouldn't make any difference, as inputs are event driven rather than polled. So think of the inputs as messages that get stored in a mailbox. It doesn't matter if you check the mailbox 100 times a day or just once before you go to bed, the number of messages you pick up during the day is the same.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on August 18, 2013, 09:59:00 am
Hi Calamity,

Thanks for your answer. It's definitely an interesting topic. Given your reply I've been delving deeper into it and may have some exciting news.

For now I'll only focus on the issue below, and get back to the others at a later stage.

Quote
Regarding the reliability of the wait functions, we need to clarify that MAME allows you to enable or disable a Sleep API call inside the throttling loop. This can be done through the -sleep option. For my tests, -sleep was disabled. This means we're not asking Windows to perform a "wait" in such a way that control might only be given back to us some time after the requested period. When disabling -sleep, MAME just counts the ticks of the CPU clock in a tight loop until we reach the required point. So, the fact that we can't reliably apply a -frame_delay factor of 8 or 9, the CPU being perfectly capable, means that there's something else taking control from us. I'm almost sure this is due to the OS giving control to some other higher priority threads.

To confirm, I've also always done my tests with -sleep 0. Given my earlier tests about the (un)reliability of the wait function, I've been looking more closely at the timer function. MAME/GM use QueryPerformanceCounter (QPC) to count the ticks of the CPU clock in a tight loop. Although it's the highest resolution timer available and as such may seem the best, my previously reported personal tests led me to believe it is also somewhat unreliable, showing erratic spikes of 4ms in a simple 1ms waiting loop.
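For anyone who wants to reproduce that kind of measurement, a spin-wait loop like the following standalone sketch (again, not MAME code) is enough: wait 1ms on QPC over and over and log every iteration that overshoots badly.

Code: [Select]
/* qpcspikes.c - spin-wait 1 ms on QueryPerformanceCounter repeatedly and
   report iterations that overshoot, i.e. where the OS took the CPU away
   or the counter itself misbehaved.
   build (MinGW): gcc qpcspikes.c -o qpcspikes.exe */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    LARGE_INTEGER freq, start, now;
    double elapsed_ms;
    int i;

    QueryPerformanceFrequency(&freq);

    for (i = 0; i < 5000; i++)
    {
        QueryPerformanceCounter(&start);
        do
        {
            QueryPerformanceCounter(&now);
            elapsed_ms = (double)(now.QuadPart - start.QuadPart) * 1000.0 / (double)freq.QuadPart;
        } while (elapsed_ms < 1.0);     /* spin until 1 ms has "passed" */

        if (elapsed_ms > 2.0)           /* asked for 1 ms, got 2 ms or more */
            printf("iteration %d: %.2f ms\n", i, elapsed_ms);
    }
    return 0;
}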

My hunch that it's an unreliable timer was further confirmed when I read the following blog:

Beware of QueryPerformanceCounter() : http://www.virtualdub.org/blog/pivot/entry.php?id=106 (http://www.virtualdub.org/blog/pivot/entry.php?id=106)

Based on its findings it concludes: "So, realistically, using QPC() actually exposes you to all of the existing problems of the time stamp counter AND some other bugs." and suggests using timeGetTime() instead as a much more reliable method. The only caveat is that its resolution is limited to 1ms, but that's fine for our purpose. Possibly the fact that QPC has higher overhead is the cause of some of its issues, I'm not sure.

So the next step was to actually test timeGetTime in a MAME setting, and I'm somewhat excited to report that it has solved the issues with the high values for frame_delay, like 8 or 9. I can now reliably run GM with a frame_delay of 9, without issues!! This basically means that with these high values working properly, we're getting extremely close to realtime behaviour.

Getting MAME to work with the timeGetTime timer was actually surprisingly easy. There's already a timeGetTime routine available as a "backup" timer. The only thing you need to change is the following in src/osd/windows/wintime.c:

//============================================================
//  GLOBAL VARIABLES
//============================================================

static osd_ticks_t ticks_per_second = 0;
static osd_ticks_t suspend_ticks = 0;
//static BOOL using_qpc = TRUE;    // original line: prefer QueryPerformanceCounter
static BOOL using_qpc = FALSE;     // changed: force the timeGetTime() backup timer

This will make it use the timeGetTime timer only. Luckily the code for setting this timer to its highest resolution is also in place, but I suggest you add the printf line below to src/osd/windows/winmain.c. This logs the resolution in use, just so you can verify that it's getting the highest possible precision (1ms):

   // crank up the multimedia timer resolution to its max
   // this gives the system much finer timeslices
   timeresult = timeGetDevCaps(&caps, sizeof(caps));
   if (timeresult == TIMERR_NOERROR)
   {
      timeBeginPeriod(caps.wPeriodMin);
      // added line: log the resolution so you can verify that 1 ms is really in use
      printf("minimum device resolution is %d millisecond(s)\r\n", caps.wPeriodMin);
   }

Before cheering though, we need to make sure this really works for frame_delay on other setups too. So hopefully it'll be confirmed on your Core2Duo setup. If it is, I guess we may start raising the flag, being so close to realtime now :cheers:
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on August 20, 2013, 03:42:56 pm
Dr. Venom, if we run games at 120hz, shouldn't we then be at 1/4 frame of lag?  That's pretty sweet.

Hi Jim,

It may be a disappointment but I don't think it works that way.

What's important to understand is that the source material (i.e the emulated Arcade / console games) is at 60hz. This basically means that trickery is needed to make it run at the correct speed when the display is at 120hz.

That trickery has basically boiled down to frame duplication or black frame insertion. Both are the same in concept, only the latter makes every other frame black, to sort of overcome the limitations of LCD technology or prevent some artifacts when running on a 120Hz CRT screen.

Now when it comes to input latency, there are two things to keep in mind: we have a) "real" input latency, which is the delay between an input signal being given and the moment the frame that registers it starts being displayed (registering the input, emulating the frame, starting the display), and b) display latency, which is the time between the start and end of displaying the frame. Thus, for the sake of this explanation, total input latency consists of real input latency (a) plus display latency (b).

Now at 120Hz with either frame duplication or black frame insertion, in both cases by default the current frame is emulated at start of the frame. Ideally there is zero time between emulation of a frame (incorporating input changes) and start of displaying it. At 120hz there's still 8ms between start of that emulation and start of displaying that frame. So to get to the optimal situation one would still need to use GM's frame_delay feature to move the frame emulation closer to vblank. This is where there's no gain versus a 60hz display, in both cases you need frame_delay to move the frame emulation equally close to vblank. Which means the score for (a) real input latency on 120Hz and 60Hz is a tie: 1 - 1.

Then it becomes interesting, as some people claim 120Hz screens display a frame (start of display to end) in about 8.5ms whereas at 60Hz it takes about 17ms, so you would "gain" 8.5ms in the latency chain. This would seem the most logical conclusion, wouldn't it? At least that's what the math says if we treat human vision as a computer-controlled camera.

"Unfortunately" it doesn't seem to work that way. Human vision is very much analog in the way it works. From what I read, the human eye may be best compared to a low speed video camera, "capturing" about 25 color frames per second, where images have a persistence of about 1/25th of a second. That's why we're able to see the world around us as continuous. Now this view contrasts quite sharply with the assumption that a human (eye) would be able to register 120 frames per second. It simply can't.

Back to the black frame insertion. This is used as a "patch" to overcome the limitations of current LCD technology and get smooth "CRT like" scrolling on an LCD. Now think about this for a minute. This method actually inserts 60 frames of black per second (out of 120); in other words half of each second is technically pure black. So what happens is that light and dark frames are alternately registered by the human eye, where they have a persistence of 1/25th of a second. This is where the black frame insertion leads to the dimmed screen people are talking about, and they have to crank up brightness/contrast to get back to a normal level. So apparently the human eye's low speed camera is picking up on the black inserted frames. Combining this with the fact that the human eye works more like a low speed camera where images have about 1/25th of a second persistence, I cannot firmly conclude that 120Hz with black frame insertion will lower the display latency. An undecided tie at best for me: 1 - 1.

So my personal conclusion would be: real input latency (meaning part "a" of the chain) when it comes to GM, i.e. being able to use the frame_delay feature, is the same for 60Hz and 120Hz screens. Display latency (part "b") isn't evidently better at 120Hz, basically because the black frame insertion clearly also leads to these non-information frames being picked up by the human brain's "25fps camera" (read: the dimmed screen being noticeable), which may just as well lengthen latency instead of shortening it. All in all I cannot conclude that a 120Hz display will lead to reduced input latency versus a 60Hz display when it comes to emulating 60Hz games with GM.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on August 20, 2013, 08:10:03 pm
Dr. Venom, thank you very much for the very detailed reply / explanation!  What you laid out makes perfect sense.

Oh well....now VERY curious to hear about Calamity's experience with the timer solution that you propose!

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on August 22, 2013, 01:50:28 pm
Hmmm, I've been doing some more tests with the timing routine, to see if I could get some better facts on reliability of the QueryPerformanceCounter timer. But for some reason it's giving me pretty solid results now, whichever way I test it. I'm not sure why it was giving me different results last time. I can now also run GM with frame_delay 9 while using QPC, and it's working just as great as with the timeGetTime routine. Given this I'm not sure anymore whether the timeGetTime is more reliable than QPC, as was also suggested by the quoted blog.

I guess this is actually good news, as in theory QPC should be the more accurate timer. Possibly it would make sense to add the two timers as a configurable option to GM? Worth considering I guess, even though it's not necessary given these latest test results.

In any case, to summarize (possibly of benefit to readers just getting in), I'm glad it's confirmed that it's possible to reliably run GM with a frame_delay setting of 9, which means near real-time behaviour when used in conjunction with the other measures discussed in this thread (raw input, a 1000Hz USB polling rate, -sleep 0 and a PC fast enough for the driver in question).
It seems that (for me personally at least) a long quest for the lowest possible input latency in MAME/MESS has come to an end...  Thanks to the superb GM and of course "Groovy" Calamity :)

Beware of QueryPerformanceCounter() : http://www.virtualdub.org/blog/pivot/entry.php?id=106 (http://www.virtualdub.org/blog/pivot/entry.php?id=106)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: rCadeGaming on August 22, 2013, 05:39:16 pm
I've been watching this thread from about the beginning.  VERY exciting stuff.  Calamity and Dr. Venom, I really appreciate all the work you are putting into this, and have been for some time.

I hadn't spoken up yet because I didn't have anything meaningful to contribute, but I think I finally thought of something:

I assume different frame_delay settings may sometimes be required for different games, with some being more demanding than others.  Is it possible to create an "auto" setting?  It could start at 9, then back off incrementally if it detects that it is regularly skipping frames.  This would save the work of carefully determining a frame_delay setting for each game.

Is this a possibility?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on August 23, 2013, 03:56:27 pm
Let's talk about the reliability of emulation at frame_delay of 9.  Pressing F11 gets you the operational competency of the emulation / computer you're running the emulation on (correct me if I'm wrong.)  I don't get any "Skips" but the emulation does dip down below 100% on the more challenging (SH-3-based) games very frequently.  Is this problematic from an accuracy standpoint?

This is with an i7 3770k.

-Jim 
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: adder on August 23, 2013, 04:07:00 pm
i'd be interested to know opinions of the reliability of overclocking usb ports to 1ms (1000hz) .. ie. is there any risk to your hardware or performance issues etc?

below:  from Raziel's UsbRate v0.5:

(http://s10.postimg.org/gaob7mort/ijf8f8.png)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on August 23, 2013, 05:31:37 pm
Hi rCadeGaming,

Thanks for the nice comments.

I assume different frame_delay settings may sometimes be required for different games, with some being more demanding than others.  Is it possible to create an "auto" setting?  It could start at 9, then back off incrementally if it detects that it is regularly skipping frames.  This would save the work of carefully determining a frame_delay setting for each game.

Is this a possibility?

You're certainly right that different games may need different frame_delay settings, it all depends on how demanding each game is.

I'll let Calamity judge whether he thinks it would be feasible to create some sort of auto setting, but to manage expectations: personally I don't think it will be as easy as it may sound.

Let's talk about the reliability of emulation at frame_delay of 9.  Pressing F11 gets you the operational competency of the emulation / computer you're running the emulation on (correct me if I'm wrong.)  I don't get any "Skips" but the emulation does dip down below 100% on the more challenging (SH-3-based) games very frequently.  Is this problematic from an accuracy standpoint?

This is with an i7 3770k.

Jim, yes, dipping below 100% is problematic from an accuracy standpoint. The game should run at 100% at all times, with no dipping allowed except perhaps briefly at startup. Otherwise something is definitely wrong.

Just one important point up front. The frame_delay in the current public GM is sort of broken. Calamity found a way to improve it and has made a patch for that, which will be in the next update if I'm right. In my tests I've been using a manual patch that I applied myself to the source. I'm not sure it will make any difference for your example, but just so you know.

As replied to rCadeGaming, different games put different demands on the host hardware, resulting in one game being more demanding than another. What "more demanding" really means is that it needs longer to emulate a frame than a less demanding game does. Of course, the longer it takes to emulate a frame, the lower the frame_delay value can be (otherwise you'll be pushing the frame emulation so far back that it hasn't finished emulating the current frame before vblank comes), and vice versa: the less demanding the game, the higher the frame_delay can be.

What's important is that you know your way around testing how demanding a game is. That's actually quite simple: run the game you want to test unthrottled and it will tell you what speed it can achieve. You do this by running it once with the -nothrottle option from a command shell. Also add "-v" so that it outputs some stats at exit. After that it's simple math.

So as an example, if I run outrun in MAME ("mame.exe outrun -nothrottle -v"), let it run for some 30 seconds and then quit, on my machine it shows it's able to run at 792% unthrottled. For simplicity I'll round this to 800%, or put differently, 8 times as fast as the original hardware.

Now outrun originally runs at 60Hz (and something), i.e. 60 fps. Dividing 1/60 gives us 0.016667 seconds per frame, which multiplied by 1000 means each frame takes 16.67 milliseconds. Since MAME runs it 8 times faster on average, a frame in MAME takes on average 16.67/8 = 2.08 milliseconds. I'm stressing the "on average", as emulation is mostly not about averages: some frames may take longer to emulate than others. As a rule of thumb you may multiply the average frame emulation time by 2, i.e. assume the toughest frames take twice the average. So that brings us to 2 x 2.08 = 4.16 milliseconds that we need to have left in each frame to emulate it and still be in time for vblank.

So how large can frame_delay be, then? Each frame takes 16.67ms, of which 4.16ms needs to be left for emulation. So 16.67ms - 4.16ms = 12.51ms is the latest point at which we need to start the emulation. Now, frame_delay goes in steps of 1/10th of a frame (with a maximum setting of 9), so each step up from 0 adds 16.67/10 = 1.67ms. The largest value for frame_delay that may be used is thus 12.51ms/1.67ms = 7(.47). So I could use a frame_delay of 7 for outrun on my machine (a 4.6Ghz 3770K); going any higher, to 8 or even 9, would almost surely lead to some (or a lot of) emulated frames not being finished in time for vblank, and thus skipped frames / loss of emulation accuracy / added input latency.

Of course you can try to play a little with the frame_delay values, but deviating from the calculated value above is more likely to get you in trouble than not. You should also always keep in mind that some drivers / games may be demanding in a way that makes the time to emulate different frames go all over the place, such that the average speed won't help you.
 
So, as the above example shows, you won't be able to run all drivers at a frame_delay of 9, but at least you now have an idea of how to calculate a good value. Of course trial and error would in the end bring you to roughly the same value. Expect most to be in the safe range of, say, 5-7, with only the drivers that run -really- fast being able to reliably use a frame_delay of 8 or 9. And don't forget that some really demanding games / drivers may not even go higher than a value of 1 or 2. In my testing I used for example a driver that runs unthrottled at 2600%, or 26 times as fast as the original. Now do the maths and you'll see that's a candidate for a value of 9 ;)
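To save others the manual arithmetic, the rule of thumb above boils down to a few lines of code. This is just a sketch of the calculation (the names are made up), with the same 2x worst-case factor as an assumption:

Code: [Select]
#include <stdio.h>

/* Rule-of-thumb estimate of the highest safe -frame_delay value.
   unthrottled_pct: speed reported by "mame <game> -nothrottle -v" (e.g. 792 for 792%)
   refresh_hz:      the game's refresh rate (e.g. 60)
   The factor 2.0 assumes the slowest frames take about twice the average. */
static int max_frame_delay(double unthrottled_pct, double refresh_hz)
{
    double frame_ms       = 1000.0 / refresh_hz;                  /* 16.67 ms at 60 Hz   */
    double avg_emulate_ms = frame_ms / (unthrottled_pct / 100.0); /* 2.08 ms at 800%     */
    double reserve_ms     = 2.0 * avg_emulate_ms;                 /* worst-case headroom */
    int fd = (int)((frame_ms - reserve_ms) / (frame_ms / 10.0));  /* tenths of a frame   */

    if (fd < 0) fd = 0;
    if (fd > 9) fd = 9;
    return fd;
}

int main(void)
{
    printf("outrun example: frame_delay %d\n", max_frame_delay(792.0, 60.0));   /* ~7 */
    printf("2600%% driver:   frame_delay %d\n", max_frame_delay(2600.0, 60.0)); /* 9  */
    return 0;
}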


i'd be interested to know opinions of the reliability of overclocking usb ports to 1ms (1000hz) .. ie. is there any risk to your hardware or performance issues etc?

below:  from Raziel's UsbRate v0.5:

As far as I know there's no real risk to your hardware. If you notice erratic behaviour you can always uninstall the USB rate overclock and set it back to normal. From what I read, some hardware may show such erratic behaviour, but I personally never had any issues with it. Do note that I have no experience with the tool you're quoting; I've been using the USB overclock mentioned elsewhere in this thread.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on August 25, 2013, 03:13:53 pm
Jim, did the issue with the SH-3-based games and frame_delay make sense after all, given my explanation about frame_delay in the previous post, or do you think there may be an issue still?


Let's talk about the reliability of emulation at frame_delay of 9.  Pressing F11 gets you the operational competency of the emulation / computer you're running the emulation on (correct me if I'm wrong.)  I don't get any "Skips" but the emulation does dip down below 100% on the more challenging (SH-3-based) games very frequently.  Is this problematic from an accuracy standpoint?

This is with an i7 3770k.

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on August 26, 2013, 01:43:48 pm
Dr. Venom, sorry for the delay, but your explanation makes perfect sense and is in line with what I was thinking.  I need to find a "happy medium" setting for all the games I commonly run.

Btw, I'll post some pics of the new stick I finished, soon.  It's got a separate box holding an I-PAC... turned out really nice.

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on August 27, 2013, 03:34:52 pm
Jim, good to know that it made sense. The "happy medium" is indeed good to have as a general setting. Just in case you don't use this already: if you're really fussy about getting the maximum frame_delay for specific games (that are able to run faster than the happy medium), you can also create a separate .ini for those games. More work, but also closer to perfection  :)

Btw, I'll post some pics of the new stick I finished, soon.  It's got a separate box holding an I-PAC... turned out really nice.

Sounds great, will be nice to see what you've come up with.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on August 27, 2013, 03:57:21 pm
Here it is, guys.  Radio Shack box with a DB-15 input and two outputs, one a Neutrik RJ45 and the other a Neutrik USB.  Inside are two PCBs.  I have an I-PAC outputting USB for use with GroovyMAME (the I-PAC registers as a keyboard so raw_input is the protocol employed) and a PS360+ for all gaming systems.  The box is basically a jack of all trades.

I had to wire a switch for the ground as the two PCBs weren't playing nice when simply wired in parallel.

The two controllers I have are a pad-hacked Sega Saturn 6-button (I didn't do the pad-hack) and a Namco PS-1 which I modded with a new JLF (used the shorter Namco shaft and also a new 3lb spring) and Sanwa 30mm buttons.  Both controllers output via a DB-15 and feed directly into the above mentioned box.  The USB output from the box goes from the I-PAC to my computer and the RJ-45 output goes to just about any gaming system you want (I use it mostly for xbox 360).

The whole setup took a while but works great.  Here are some pics:

(http://i41.tinypic.com/2kg0ap.jpg)

(http://i44.tinypic.com/35kmtzb.jpg)

(http://i44.tinypic.com/23icor4.jpg)

(http://i41.tinypic.com/2j5j4w6.jpg)

(http://i41.tinypic.com/2vt49p2.jpg)

(http://i42.tinypic.com/2942lja.jpg)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on August 27, 2013, 11:10:40 pm
Jim, good to know that it made sense. The "happy medium" is indeed good to have as a general setting. Just in case you don't use this already: if you're really fussy about getting the maximum frame_delay for specific games (that are able to run faster than the happy medium), you can also create a separate .ini for those games. More work, but also closer to perfection  :)

Btw, I'll post some pics of the new stick I finished, soon.  It's got a separate box holding an I-PAC... turned out really nice.

Sounds great, will be nice to see what you've come up with.

Definitely separate .ini files are the ideal!  Will be working towards that...  :)

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: SMMM on August 30, 2013, 11:50:03 am
Calamity found a way to improve it and has made a patch for that, which will be in the next update if I'm right.

Does anyone know when this will be?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on August 31, 2013, 06:06:02 am
Here it is, guys.  Radio Shack box with a DB-15 input and two outputs, one a Neutrik RJ45 and the other a Neutrik USB.  Inside are two PCBs.  I have an I-PAC outputting USB for use with GroovyMAME (the I-PAC registers as a keyboard so raw_input is the protocol employed) and a PS360+ for all gaming systems.  The box is basically a jack of all-spades.

Very nice :) Having the Gamepad also working with the I-PAC/RawInput API and the extension to other systems looks great.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on September 01, 2013, 10:19:31 am
Here it is, guys.  Radio Shack box with a DB-15 input and two outputs, one a Neutrik RJ45 and the other a Neutrik USB.  Inside are two PCBs.  I have an I-PAC outputting USB for use with GroovyMAME (the I-PAC registers as a keyboard so raw_input is the protocol employed) and a PS360+ for all gaming systems.  The box is basically a jack of all-spades.

Very nice :) Having the Gamepad also working with the I-PAC/RawInput API and the extension to other systems looks great.

Thanks man!  I was just going to use a PS360+ PCB but the lag advantage of using an I-PAC (and I actually had one laying around) drove me to wire both of them in there.  The Namco stick took forever to wire up.  Had to do a bunch of Dremeling to get the JLF to fit just right.  Turned out pretty sweet, though.

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on September 04, 2013, 08:37:18 am
Hi,

Long post coming up about the accuracy of the timer used in MAME for Windows, but please bear with me, as I believe this will improve the GroovyMAME timer reliability, thus benefitting throttle/frame_delay/vsync accuracy.

Following up on the earlier discussion about the topic here (http://forum.arcadecontrols.com/index.php/topic,133194.msg1381358.html#msg1381358), I've found additional information showing that in certain situations the QueryPerformanceCounter (QPC) timer method used by MAME and GM can indeed suffer from erratic timing behaviour, thus possibly messing up the effectiveness of the frame_delay feature.

First up is the fact that the hardware implementation of the High Precision Event Timer (HPET), which is the basis for QPC, is suffering from a design defect on some chipsets. See the following page by Microsoft, listing a number of known chipsets to have this defect:

Performance counter value may unexpectedly leap forward
http://support.microsoft.com/kb/274323 (http://support.microsoft.com/kb/274323)

Next up is the fact that QPC may in some situations deliver unreliable timing when used on AMD dual core or Intel multi core systems running XP or older:

Programs that use the QueryPerformanceCounter function may perform poorly in Windows Server 2000, in Windows Server 2003, and in Windows XP
http://support.microsoft.com/kb/895980 (http://support.microsoft.com/kb/895980)

The issues reported on above pages are quite likely also the cause for the findings in this blog page, as posted previously:

Beware of QueryPerformanceCounter():
http://www.virtualdub.org/blog/pivot/entry.php?id=106 (http://www.virtualdub.org/blog/pivot/entry.php?id=106)

It is clear from these links that the QPC timer method isn't robust, and may be degrading the emulation accuracy of quite a few Windows-based MAME arcade builds. Following on from the earlier post, using timeGetTime() as the timing method is expected to lead to (much) more reliable timing for the frame_delay feature.

Then I found new information on the High Precision Event Timer hardware in the following blog:

Using HPET for a high-resolution timer on Windows
http://blog.bfitz.us/?p=848 (http://blog.bfitz.us/?p=848)

Because of its importance I'll quote it here:
Quote
Unfortunately, despite the promise of a new regime in 2005, it’s still not automatic; there’s work for you to do.

Even though most motherboards have the HPET timer now, it seems to be disabled by default. There’s an easy way to see if this is true or false – QueryPerformanceCounter will return a value in the 14 million range if HPET is enabled (it’s a 14 MHz timer), and something in the 3 million range if HPET is disabled (the older chip timer).

Now, this is new behavior – QueryPerformanceCounter, some years ago, returned the TSC counter, which is very high-resolution, but has huge swings with power saving modes, and as processors increased in power, power savings turns on all the time. So, Microsoft, with little fanfare, switched QueryPerformanceCounter back to using timers on motherboards. So, if you’re running an older Microsoft OS, you might get a value in the 100 million range if you call QueryPerformanceCounter, and then the following doesn’t apply to you. The bridge was somewhere in the Vista time range, but I’ve seen Vista systems that use TSC for QPC, as well as HPET/RTC for QPC.

void test_time()
{
    LARGE_INTEGER frequency;
    if (!::QueryPerformanceFrequency(&frequency))
    {
        fprintf(stderr, "failed, err=%d\n", ::GetLastError());
        exit(1);
    }
    fprintf(stdout, "freq = %lld\n", frequency.QuadPart);
}

With HPET disabled, I get freq = 3262656 as the output, or 3.26 Mhz. With HPET enabled, I get freq = 14318180 as the output, or 14.3 Mhz. This is on a Windows 7 machine with an Intel i7 975 processor and chipset. The HPET clock listed above will measure intervals with a precision of 70 nanoseconds; while this won’t help time very small sequences of instructions, this will be reasonably precise at the microsecond range.

If your BIOS has HPET enabled, then you can enable HPET in Windows with a bcdedit command, and disable it with a different bcdedit command.

Enable use of HPET

bcdedit /set useplatformclock true

Disable use of HPET

bcdedit /deletevalue useplatformclock

You’ll need to reboot to see changes, because this is a boot-time option (hence the use of bcdedit to change it).

Enabling HPET will change the performance of your system; people tend to inadvertently tune their programs to the specific behavior of a clock. It would be nice if people didn’t do that, but it happens. Anecdotal information says “makes things smoother but slower”, and this would match the idea of applications tuned to a slower clock.

As shown in the blog, it's easy to test whether the HPET is really active by querying QueryPerformanceFrequency. If it returns a value in the 14MHz range it's enabled; if it returns a value in the 3MHz range it's disabled. I'm using quite a new mainboard, an Asus P8Z77-V, running Windows 7, and guess what? The HPET is indeed disabled in Windows 7, even though I have it set to enabled in the BIOS.

After using the method reported in the blog to enable the HPET in W7, it is now indeed correctly using the HPET's 14MHz timer. Where earlier I still had my doubts about the reliability of QPC versus the timeGetTime method, current tests (with the HPET enabled at 14MHz) make me think QPC is as reliable as the timeGetTime method, if not better. All tested at a frame_delay setting of 9.

I've been thinking how we could use this to improve the accuracy / overall reliability of the timer function in GM for Windows. As a suggestion, we could implement three possible settings for the timer, that can be set from the mame/mess config file:

0 = auto
1 = QueryPerformanceCounter
2 = TimeGetTime

The "0" auto setting would be the default, only using QPC when QueryPerformanceFrequency returns a value in the 14MHz range. This is easy to check from the code. If it does not return a value in the 14MHz range, the HPET isn't really active, and GM had best default to using timeGetTime(). The 1 and 2 settings can be used to override the automatic behaviour if for some reason that's needed, for example when you have one of the older chipsets with the HPET hardware design defect.
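In code, the auto setting could be as simple as the sketch below. This is only my guess at the logic, following the 14MHz check from the blog; Calamity will know where it really belongs in the OSD layer:

Code: [Select]
#include <windows.h>

/* Sketch of the proposed "auto" timer selection (setting 0): use
   QueryPerformanceCounter only when it is actually backed by the 14.3 MHz
   HPET, otherwise fall back to the 1 ms multimedia timer (timeGetTime). */
static BOOL auto_select_qpc(void)
{
    LARGE_INTEGER freq;

    if (!QueryPerformanceFrequency(&freq))
        return FALSE;                      /* no QPC at all: use timeGetTime    */

    /* the HPET reports roughly 14,318,180 ticks per second */
    if (freq.QuadPart > 14000000 && freq.QuadPart < 15000000)
        return TRUE;                       /* HPET-backed QPC: safe to use      */

    return FALSE;                          /* 3 MHz range etc.: use timeGetTime */
}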
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on September 04, 2013, 10:02:44 am
Hi Dr.Venom,

Thanks a lot for posting about this finding. It will be very easy to add this as a new option in GroovyMAME. We will also add the timer method information to the logs, so you can always know which of the two timers is being used.

I did indeed do some testing with the timeGetTime method, and obtained similar results to what I had obtained before with QPC, although this time I raised frame_delay to 8 for Terra Cresta before recording some videos. A value of 9 is erratic on my system (Core2Duo), but 8 is rock solid for this game. I honestly can't remember whether I was using 7 before when I could have used 8 with QPC too.

Anyway, being able to increase frame_delay from 7 to 8 must have a statistical effect in reducing input lag, by capturing more input events before the host system lag barrier, although my results were similar to the ones I had previously obtained.

I've been thinking of a way to actually measure the host system lag, at the sub-frame scale. It would involve writing a specific program for it, based on raw input. A solid colour background would flip to a different colour upon a key press event, allowing a high speed camera (240 fps at least) to capture the tearing position between both colours; then, based on the moment the LED lights up and the specific period of the video mode used, you can calculate the host system lag with some accuracy (it would be necessary to average several samples). Users could run this program to determine their own system's lag. However, I doubt many people would go through the process of wiring an LED and finding a high speed camera (although these are becoming very common).
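As a rough idea of how small such a tool could be, something along these lines would already do the basic job (an untested outline, not a finished program; it deliberately paints with a plain un-synced GDI fill since the whole point is to capture the tearing position):

Code: [Select]
/* lagflip.c - sketch of the test described above: a fullscreen window flips
   its background colour the moment a raw input key press arrives.  Filmed
   with a high speed camera next to the button LED, the position of the
   colour change on screen gives an estimate of the host system lag.
   build (MinGW): gcc lagflip.c -o lagflip.exe -mwindows */
#include <windows.h>

static int g_flip = 0;

static LRESULT CALLBACK wndproc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
{
    switch (msg)
    {
    case WM_INPUT:
    {
        RAWINPUT ri;
        UINT size = sizeof(ri);
        if (GetRawInputData((HRAWINPUT)lp, RID_INPUT, &ri, &size, sizeof(RAWINPUTHEADER)) != (UINT)-1
            && ri.header.dwType == RIM_TYPEKEYBOARD
            && !(ri.data.keyboard.Flags & RI_KEY_BREAK))   /* key down only */
        {
            HDC dc = GetDC(hwnd);
            RECT rc;
            g_flip = !g_flip;
            GetClientRect(hwnd, &rc);
            /* paint immediately, un-synced, so the new colour starts at
               whatever raster line the monitor is drawing at this moment */
            FillRect(dc, &rc, (HBRUSH)GetStockObject(g_flip ? WHITE_BRUSH : BLACK_BRUSH));
            ReleaseDC(hwnd, dc);
        }
        break;
    }
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProc(hwnd, msg, wp, lp);
}

int WINAPI WinMain(HINSTANCE hinst, HINSTANCE hprev, LPSTR cmdline, int show)
{
    WNDCLASS wc = {0};
    RAWINPUTDEVICE rid;
    HWND hwnd;
    MSG msg;

    wc.lpfnWndProc   = wndproc;
    wc.hInstance     = hinst;
    wc.lpszClassName = "lagflip";
    wc.hbrBackground = (HBRUSH)GetStockObject(BLACK_BRUSH);
    RegisterClass(&wc);

    hwnd = CreateWindow("lagflip", "lag test", WS_POPUP | WS_VISIBLE, 0, 0,
                        GetSystemMetrics(SM_CXSCREEN), GetSystemMetrics(SM_CYSCREEN),
                        NULL, NULL, hinst, NULL);

    rid.usUsagePage = 0x01;   /* generic desktop controls */
    rid.usUsage     = 0x06;   /* keyboard */
    rid.dwFlags     = 0;
    rid.hwndTarget  = hwnd;   /* deliver WM_INPUT to our window */
    RegisterRawInputDevices(&rid, 1, sizeof(rid));

    while (GetMessage(&msg, NULL, 0, 0) > 0)
        DispatchMessage(&msg);
    return 0;
}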

Regarding the possibility of implementing an automatic frame_delay factor, yes, I guess it should be possible, although there is at least one obstacle that I can think of. On my systems I have noticed that, very often, I can't achieve steady speed percentages with frame_delay on, even if I am positive it's performing perfectly. Usually I see the speed oscillating from 95 to 105%, but the scrolling is totally smooth. This means the speed measurement is wrong, probably due to some side effect of the frame_delay option. This makes it difficult to use the speed percentage as a basis for deciding things. Indeed, currently the soundsync feature is disabled while frame_delay is used, which may cause sound glitches as some users have reported. This is done because, as soundsync is based on the speed percentage, an erratic speed percentage value drives soundsync crazy. GroovyMAME's soundsync feature uses the speed percentage as feedback to apply a factor to the emulation speed, causing both values to converge quite soon in a normal situation. The problem comes when the speed percentage is not reliable to begin with. Hopefully a workaround will be found to solve this problem and eventually lead to the implementation of an automatic frame_delay feature.
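To illustrate why the erratic percentage is such a problem for soundsync: conceptually the feedback is nothing more than an adjustment of this kind (a simplified sketch only, not the actual GroovyMAME code):

Code: [Select]
/* Conceptual sketch of a speed-percentage feedback loop: nudge the factor
   applied to the emulation speed so the measured speed converges on 100%.
   If the measured percentage itself swings between 95% and 105% while the
   emulation is actually fine, the factor keeps correcting for errors that
   are not really there - hence the sound glitches. */
static double soundsync_factor = 1.0;

static void soundsync_update(double measured_speed_pct)
{
    double error = (measured_speed_pct - 100.0) / 100.0;   /* >0: running fast */

    soundsync_factor -= 0.1 * error;                        /* small proportional step */

    if (soundsync_factor < 0.95) soundsync_factor = 0.95;   /* keep corrections sane */
    if (soundsync_factor > 1.05) soundsync_factor = 1.05;
}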
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on September 04, 2013, 04:32:22 pm
Hi Calamity,

Thanks a lot for posting about this finding. It will be very easy to add this as a new option into GroovyMAME. This way we will add the timer method information to the logs so you can always know which of the two different timers are being used.

Great :)

Quote
Anyway, being able to increase frame_delay from 7 to 8 must have an statistical effect in reducing input lag by capturing more input events before the host system lag barrier, although my results were similar to the ones I had previously obtained.

I think the problem may be that the camera "only" has 240 fps. It basically limits your measurements to 4ms. I understand that statistically the difference should come through, but I wonder how many frames (with the LED lighting up) have to be shot to get the statistics to work. Quite a few, probably...

Quote
I've been thinking of a way to actually measure the host system lag, in the sub-frame scale. It would involve writing an specific program for it, based on raw input. A solid colour background would flip to a different colour upon a key press event, allowing a high speed camera (240 fps at least) to capture the tearing position between both colours, then based on the moment when the led lights up and the specific period of the video mode used you can calculate the host system lag with some accuracy (it would be necessary to average several samples). Users could run this program to determine their own system's lag. However I doubt many people would go through the process of wiring a led and finding a high speed camera (although these are becoming very common).

Great idea. It would make it so much easier to recognize where the raster beam is and get an even more accurate number for the host system delay. I hope that you'll pull this off 8). With regard to other people running their own tests, you're probably right that not many will, but having a "simple" and accessible testing method that provides clear results will probably improve the odds.

Quote
GroovyMAME's soundsync feature uses the speed percentage as a feedback to apply a factor to the emulation speed, causing both values to converge quite soon on a normal situation. The problem comes when the speed percentage is not reliable to begin with. Hopefully a workaround will be found to solve this problem and eventually lead to the implementation of an automatic frame_delay feature.

Thanks for pointing that out. Out of interest, could you give a pointer to the bits of code where GM's soundsync feature gets applied (i.e. speed percentage calculation, factor calculation and where this gets fed back into the sound emulation), and also where GM's soundsync gets set to disabled in case frame_delay is used?  That would be much appreciated, just to be able to run some tests and get a better understanding.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: rCadeGaming on September 04, 2013, 05:33:59 pm
Regarding the possibility of implementing an automatic frame_delay factor, yes I guess it should be possible, although there is at least one obstacle that I can think of. In my systems, I have noticed that, very often, I can't achieve steady speed percentages with frame_delay on, even if I am positive it's performing perfectly. Usually, I see the speed oscillating from 95 to 105%, but the scrolling is totally smooth. This means the speed measurement is wrong, probably due to some side effect of the frame_delay option. This makes it difficult to use the speed percentage as a base for deciding things. Indeed, currently the soundsync feature is disabled while frame_delay is used, which may cause sound glitches as some users have reported. This is done because, as soundsync is based on the speed percentage, an erratic speed percentage value makes soundsync go crazy. GroovyMAME's soundsync feature uses the speed percentage as feedback to apply a factor to the emulation speed, causing both values to converge quite soon in a normal situation. The problem comes when the speed percentage is not reliable to begin with. Hopefully a workaround will be found to solve this problem and eventually lead to the implementation of an automatic frame_delay feature.

That would be awesome.  So, if I understand correctly, solving the conflict between soundsync and frame_delay would make the auto-frame_delay setting much easier to implement?  Is there an alternative method of detecting skipped frames that could be a solution for both features? 

How about enabling autoframeskip and watching if it exceeds 0?  Or is autoframeskip also affected by the erratic speed percentage?

Users could run this program to determine their own system's lag. However I doubt many people would go through the process of wiring a led and finding a high speed camera (although these are becoming very common).

I will soon have a good setup for filming in 240fps with an LED in series with a button, and four PC's with different OS's and highly varying performance to test.  Please let me know when you're ready for any help with this, or any other lag testing with high speed video.

-

On a related note, is it possible to achieve minimal input lag without using a keyboard encoder, such as an I-PAC?  I had planned to use MC Cthulhus, which are joystick encoders, for both console and PC support in my cabinet.  I could dual-mod an I-PAC in to handle PC support, but for the sake of simplicity I would like to avoid it unless it's necessary for minimal input lag.  The MC Cthulhu has a 1 ms / 1000 Hz firmware, so could using that with overclocked USB ports match the speed some of you are achieving with an I-PAC?  DirectInput should work with joysticks - does it do so in GM, or just with keyboard encoders?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on September 06, 2013, 09:05:13 am
That would be awesome.  So, if I understand correctly, solving the conflict between soundsync and frame_delay would make the auto-frame_delay setting much easier to implement?

No, it won't make it much easier to implement. See it more as an obstacle that needs to be moved out of the way, before any sort of implementation for auto frame_delay can be considered.

Quote
How about enabling autoframeskip and watching if it exceeds 0?

Blasphemy. ;) *Any* method for an automatic frame_delay must not be based on degrading emulation accuracy.

I think that the solution will be in accurately measuring the time it takes for MAME to emulate each frame, as a result of which you'll know how much time there's left, and use that as a base for setting a safe auto value for frame_delay. This would make it possible to enable an auto feature without the method itself being the cause of missed frames. The most challenging part will be to accurately measure frame emulation time and account for its variability.
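
To make that concrete, here is a rough sketch (my own illustration, nothing taken from GM's source) of how such a measurement could drive the setting: keep a short history of per-frame emulation times, take the worst recent case rather than the average to absorb the variability, and derive the highest frame_delay that still leaves that much headroom (with a safety factor, consistent with the formula discussed later in this thread).

Code:
#include <algorithm>
#include <deque>

struct AutoFrameDelay
{
    std::deque<double> samples;               // recent per-frame emulation times (ms)
    double frame_period_ms = 1000.0 / 60.0;   // period of the emulated video mode

    int update(double emulation_time_ms)
    {
        samples.push_back(emulation_time_ms);
        if (samples.size() > 120) samples.pop_front();   // keep ~2 seconds of history

        // Use the slowest recent frame, not the average, to absorb variability.
        double worst = *std::max_element(samples.begin(), samples.end());

        // frame_delay n postpones emulation by n/10 of the frame period, so the
        // emulation must fit in the remaining (10 - n)/10.  Solve for n with a
        // safety factor of 2 and clamp to the valid 0..9 range.
        double safety = 2.0;
        int n = (int)(10.0 * (1.0 - safety * worst / frame_period_ms));
        return std::max(0, std::min(9, n));
    }
};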

Quote
I will soon have a good setup for filming in 240fps with an LED in series with a button, and four PC's with different OS's and highly varying performance to test.  Please let me know when you're ready for any help with this, or any other lag testing with high speed video.

That's great, it will be very interesting to see what results you come up with. If Calamity at some point down the road releases his latency measurement tool, I see myself also wiring up an LED. Not sure when (if ever) I'll get one of those high speed cameras though.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on September 06, 2013, 06:11:57 pm
On a related note, is it possible to achieve minimal input lag without using a keyboard encoder, such as an I-PAC?  I had planned to use MC Cthulhus, which are joystick encoders, for both console and PC support in my cabinet.  I could dual-mod an I-PAC in to handle PC support, but for the sake of simplicity I would like to avoid it unless it's necessary for minimal input lag.  The MC Cthulhu has a 1 ms / 1000 Hz firmware, so could using that with overclocked USB ports match the speed some of you are achieving with an I-PAC?  DirectInput should work with joysticks - does it do so in GM, or just with keyboard encoders?

The raw input API (the fastest option) is only available for keyboards in the current revision of MAME.  Hence why I built what I did, above.

-Jim

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: rCadeGaming on September 06, 2013, 11:52:09 pm
Anyone know if that is going to change anytime in the somewhat near future?  Looks like I might need an MC Cthulhu, PC Engine PCB, Genesis PCB, Dreamcast PCB, 360 PCB, and I-PAC... per player.   :-[ Well, the I-PAC itself will work with both players at once, better stock up on relays.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on September 07, 2013, 08:29:36 am
Anyone know if that is going to change anytime in the somewhat near future?

It is quite possible to implement raw input in MAME for joysticks too. It just needs some work. Anyway, I think we could suggest this to the actual MAME devs.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: rCadeGaming on September 07, 2013, 10:16:34 am
Ok, I'll send a message through MAMEdev.com.  In the meantime, if I do try an I-PAC, does it matter if I use a PS/2 or USB connection?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on September 07, 2013, 10:44:36 am
In the meantime, if I do try an I-PAC, does it matter if I use a PS/2 or USB connection?

USB is the preferred connection. See here:

USB or PS/2 for a keyboard emulator?

http://www.ultimarc.com/usb_vs_ps2.html (http://www.ultimarc.com/usb_vs_ps2.html)

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: rCadeGaming on September 07, 2013, 01:10:29 pm
That's good, as it's my preferred connection as well, haha. 

This has gotten me thinking that an I-PAC4 might actually be preferable for MAME anyhow, due to some tricks I'm thinking of using with the shift button, as well as some things that can't be done in MAME using a joystick, like selecting a save state slot.

I will still send that message to MAMEdev though.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: u-man on September 09, 2013, 12:28:30 pm
I just wanted to thank Calamity and Dr. Venom for this totally interesting thread. It is somewhat scientific and I like the approach to how things are explained and done. You both did an awesome job here  :notworthy:

Can't wait to see the next things here, bringing MAME emulation to a new level.  :applaud:

Keep up the good work.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: kujina on October 07, 2013, 09:41:03 pm
As far as the USB polling frequency goes for a J-PAC or I-PAC: according to Andy Warne, the fixed poll rate is what Windows applies to low-speed USB devices, and so it does not apply to the J-PAC because it's a full-speed USB device.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jdubs on October 07, 2013, 11:21:21 pm
As far as the USB polling frequency goes for a J-PAC or I-PAC: according to Andy Warne, the fixed poll rate is what Windows applies to low-speed USB devices, and so it does not apply to the J-PAC because it's a full-speed USB device.

Link?

I see this, but it's not consistent with your statement:

http://forum.arcadecontrols.com/index.php/topic,132779.msg1365395.html#msg1365395 (http://forum.arcadecontrols.com/index.php/topic,132779.msg1365395.html#msg1365395)

-Jim
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: machyavel on January 23, 2014, 04:41:30 pm
(...)
That's actually quite simple: run the game you want to test unthrottled and it will tell you what speed it can achieve. You do this by running it once with the -nothrottle option from the command shell. You also add "-v" so that it will output some stats at exit. After that it's simple math.
(...)

For the record (and if I got it right), the math shrinks to: frame_delay = 10 - (2000 / avg speed)

Edit: actually it's frame_delay = 10 - [(safety factor x 1000) / avg speed]; Dr.Venom chose 2 for a safety factor, hence the formula above.

For example, let's say a game runs unthrottled at 100% on average. With 1 as a safety factor it gives: frame_delay = 10 - (1 x 1000 / 100) = 0.

Now a safety factor of 1 means no safety at all, so let's keep 2 and make a small chart just to get a rough idea at a glance (a quick snippet to double-check these bands follows the chart):

0-222% -> 0
223-249% -> 1
250-285% -> 2
286-333% -> 3
334-399% -> 4
400-499% -> 5
500-666% -> 6
667-999% -> 7
1000-1999% -> 8
2000% and over -> 9
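
If anyone wants to double-check those bands, a few lines of C++ reproduce them from the formula above (this is just my own verification snippet, with the safety factor of 2 hard-coded as 2000):

Code:
#include <algorithm>
#include <cmath>
#include <cstdio>

int main()
{
    // frame_delay = 10 - (safety factor x 1000 / unthrottled speed %), clamped to 0..9
    for (int speed = 200; speed <= 2000; speed += 25)
    {
        int fd = (int)std::floor(10.0 - 2000.0 / speed);
        fd = std::max(0, std::min(9, fd));
        std::printf("%5d%% -> frame_delay %d\n", speed, fd);
    }
}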
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: SMMM on January 25, 2014, 04:42:04 pm
(...)
That's actually quite simple: run the game you want to test unthrottled and it will tell you what speed it can achieve. You do this by running it once with the -nothrottle option from the command shell. You also add "-v" so that it will output some stats at exit. After that it's simple math.
(...)

For the record (and if I got it right), the math shrinks to: frame_delay = 10 - (2000 / avg speed)

Can Dr. Venom confirm this?  I'm a little confused about his explanation of how to calculate it, so a simple formula like this would be nice. 
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Monkee on February 27, 2014, 09:54:28 am
Really interesting thread, thanks guys for your tests!

One thing I'm not sure I understand though is whether we should disable sleep or not in the end?  ???
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on March 02, 2014, 06:24:03 pm
One thing I'm not sure I understand though is whether we should disable sleep or not in the end?  ???

I'd say that disabling sleep reduces the chances of input being received late, by not allowing the system to take CPU time from us so often, but I guess this highly depends on the target system.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: adder on April 24, 2014, 02:47:28 pm
fyi taken from another thread (http://forum.arcadecontrols.com/index.php/topic,138817 (http://forum.arcadecontrols.com/index.php/topic,138817))

question to andy warne:
regarding your I-PAC/keyboard encoder products, there has been some talk about people using e.g. Windows XP overclocking their USB ports, which are usually 125 Hz by default I believe, up to 1000 Hz in an attempt to shave off a bit of input lag
i think you may have answered some questions on this before but can you just verify
regarding your USB 2.0 devices such as the Mini-PAC, is there actually no benefit/no need/no point to overclocking the USB ports?
many thanks

his response:
That's right, there is no need. The fixed default only applies to "Low speed USB" where the host ignores the device-specified interval and substitutes 7 ms. These use Full Speed USB, which specifies 2 ms, and the host uses this value.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: RandyT on May 17, 2014, 01:48:26 pm
Finally, regarding the suggestion of a continuous input poll while waiting, I think this wouldn't make any difference, as inputs are event driven rather than polled. So think about the inputs as messages that get stored in a mailbox. It doesn't matter if you check the mailbox 100 times in a day or just once before you go to bed, the number of messages you will pick up during the day is the same.

Given that human reaction time, when measured from sensing (with eyes or ears) to muscle action, physiologically takes on average more than 200 (two hundred) ms, it's impossible for a human to react to something -unexpected- happening in the first 1/3rd displayed on screen and move the joystick within that same frame.

When folks are considering input "lag", the two quotes above are very important.  I believe that when people hear the word "lag" they infer the meaning to be that it will slow them down somehow.  What needs to be kept in mind is that everything is relative.  In the world of electronics, things are moving at speeds which mere humans cannot comprehend without using mathematics to represent them.  At a certain threshold, everything becomes instantaneous with regard to our abilities to sense or react to it.  To put this further into perspective, even light traveling in a vacuum has "lag", but unless its origin is more than 186,282 miles away, it would take less than a second to reach your eyes.  In the sense of display frame timing, 60Hz was selected first because it was easy (electricity in North America runs at 60Hz) and second because it's well above the 24 fps found to be the minimum for film to produce continuous, "flicker-free" motion images for viewers, at a time when high frame rates were impossible or prohibitively costly to create.

In the case of whether or not emulation can provide the same experience as actual hardware, it all operates relative to the same, very small scale.  The fact that elaborate, frame-accurate capture setups with LED indicators are necessary to flesh out what is happening at that level is proof enough that it is happening at speeds which are well beyond our "instantaneous" perception threshold.  So those who believe they experience lag which can affect their play, due to being a half, or even a full frame behind, are very likely doing a bit of projecting.  But as Calamity states, fast systems should be able to do what the original hardware does for each frame, which is, in simple terms (a toy code sketch of this loop follows the list):

Check input
Calculate result based on input
Display result
Repeat
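
To make the point concrete, here's a toy sketch in C++ (not MAME code, just an illustration): however many input events pile up in the mailbox during a frame, the logic only consumes their net effect once, at the top of the next frame.

Code:
#include <queue>

struct InputEvent { int button; bool pressed; };

std::queue<InputEvent> mailbox;   // filled asynchronously by the OS / controller driver
bool button_state[16] = {};

void run_one_frame()
{
    // 1. Check input: drain whatever arrived since the last frame.
    while (!mailbox.empty())
    {
        InputEvent e = mailbox.front();
        mailbox.pop();
        button_state[e.button] = e.pressed;
    }
    // 2. Calculate result based on input (e.g. move the ship one pixel).
    // 3. Display result.
    // 4. Repeat: anything that arrives after step 1 waits for the next frame.
}

int main()
{
    // Three events arrive during one frame; only the net state matters.
    mailbox.push({0, true});
    mailbox.push({0, false});
    mailbox.push({0, true});
    run_one_frame();   // button 0 ends the frame pressed, processed exactly once
}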

So whether user input is checked once or a thousand times, it will make little difference to the code, as it must do these things in the above order.  Any input buffered after the initial check will need to be acted upon in the next frame, as it cannot stop what it's doing and recalculate the result in midstream.  The original code is likely to be further limited by allowing only one data transition for each frame which is calculated.  So in the case of the example used earlier, either the ship moves one pixel per displayed frame, or it does not.  Having these movements queued 1000 times over, as opposed to just once, will make no difference to the code.

So, given the above, "low-speed" (again, a very relative term which has no bearing on human perception) USB devices which report the states of all controls at 8 ms intervals (i.e. twice for each generated frame) are more than sufficient.  Therefore, 1 ms polling intervals are entirely unnecessary, and really only add unnecessary system overhead and bus traffic.

The most important thing to look at with controllers is the manner in which input is processed.  For example, a controller which transmits its states 1000 times a second will not be advantageous if that controller takes 20 or 30 ms to decide whether the status of the devices connected to it has changed.  It will simply be uselessly transmitting exactly the same data until it does.  Those who might believe that it's important to get an input message in "under the wire" before the inputs are polled and processing begins really need to take a hard look at the extremely tiny fraction of time where this occurs, and honestly assess whether there is any practicality there whatsoever.   And if there's not, then adding more burden to the system in the way of higher polling rates doesn't make sense, and may actually work against the goal.

Then, take a hard look at the actual devices (buttons, joysticks, etc.) and decide whether these are offering the performance you expect for the types of games you play.  There is a vast difference in the mechanics of varying devices, which can lead to a delay between the time you intend to perform an action and when it actually gets to the point of indicating to the controller that something has changed.  If this is the cause of issues for you, even original arcade hardware would show them, with those devices attached.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: sudopinion on May 18, 2014, 02:11:35 pm
just wondering why retroArch + KMS wasn't included in these tests.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Monkee on May 18, 2014, 03:02:25 pm
just wondering why retroArch + KMS wasn't included in these tests.
+1, it seems that KMS doesn't suffer from the lag you encounter with SDL, so on paper it's the best solution for Linux.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on May 19, 2014, 01:14:41 pm
Hi RandyT,

Thanks a lot for sharing your insight here. I totally agree with what you're saying.

For example, a controller which transmits its states 1000 times a second will not be advantageous if that controller takes 20 or 30 ms to decide whether the status of the devices connected to it has changed.

This is a very important observation imho. And unfortunately it's something we can't measure through the crude tests we're doing here. We just see the led light up when the switch is triggered but the whole process until the ship moves is a black box for us.

Definitely this is what we should look at. In fact, in my experience increasing the USB poll rate didn't make any difference to the results, and instead, to my surprise, I started to see issues with my USB hard disks not being recognized.

Quote
Then, take a hard look at the actual devices (buttons, joysticks, etc,) and decide whether these are offering the performance you expect for the types of games you play.

This is another point: I'd say that with many joysticks, the simple act of moving the stick in one direction until the switch is triggered takes more time than the amount of "lag" we're dealing with here, but I'm not sure.

Quote
Check input
Calculate result based on input
Display result
Repeat

Exactly (if we leave aside subtle things like boards that could be polling inputs in the middle of a frame, etc.)

Because emulation of the usual arcade hardware is based on a loop of discrete steps, it's obvious that if we made each step last as long as 1 minute, we would always get the input in time for the next frame, even with the slowest operating system. Now, if we gradually reduce the length of each step down to the milliseconds range and below, it's also obvious that at some point the system won't be able to get the input processed in time for the next frame. What we're saying here is that with current hardware/OSes, this critical point shouldn't be reached when emulating the hardware at its original speed. The tests with Windows XP / Core2Duo showed that we are *almost* there.

The problem is some people just hear "windoze" and automatically think of a bloated monster that is searching for unused icons on your desktop while it should be processing your input, and simply forget that a 16.67 ms period is not exactly on the Planck scale when it comes to modern hardware. The critical point on the software side is having reliable vertical blank signals, and making sure the driver doesn't arrange a frame queue. Here is where mystifiers come to say this is impossible to achieve in "windoze" (while they type from their laptops). The good thing about using the same drivers for everyone is that at least one of the sources of uncertainty (the software) is out of the equation.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on May 19, 2014, 01:20:50 pm
just wondering why retroArch + KMS wasn't included in these tests.

+1, it seems that KMS doesn't suffer from the lag you encounter with sdl so it's the best solution for linux on the paper.

To be honest when I did those tests I wasn't aware of the existence of Retroarch.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on May 27, 2014, 10:41:57 am
Here is a 120FPS video of Terra Cresta in Retroarch's Mame .151 core running in Ubuntu 14.04 with KMS.  The button press LED is directly below the monitor.

https://anonfiles.com/file/4625beaeabff31ae37556b6075a8a456

Please verify my counting method, but here are my results:

10, 11, 10, 9, 9, 10, 10, 10, 11, 10, 11, 11, 11, 10, 11

Average: 5.13 frames at 60 Hz (10.27 camera frames at 120 fps)

While this is respectable and 1 or 2 frames faster than my test in Windows with Hard GPU sync enabled, my tests with Shmupmame are definitely faster, averaging between 3 and 4 frames. I understand this may actually be more responsive than the real hardware and not desirable?  Also, what is the best result achieved by GM so far?  The last post with specific results appears to be reply #14 where 5.4 is reported for SSF2T.  What was the best result with Terra Cresta?  All the videos seem to be unavailable from the posted links now except for the Youtube video showing 4 frames for the PCB, so I don't have anything to compare my results/counting method with directly, unless I'm overlooking something.  I'm a little uncertain of what the target should be and how close Retroarch+KMS is to it.   Here are the details of this setup:

PC: Dell Optiplex 755 (Core2Duo e6550, Intel Graphics, 4GB RAM)
OS: Ubuntu 14.04 64-bit.  Default Intel Video drivers are used.  The driver documentation says that KMS is enabled by default, so I'm trusting this.
Retroarch/Mame:  Installed from HunterK's PPA. RetroArch 1.0.0.2 from 3/24/14 and libretro-mame 0.151 from 3/15/14
Display:  640x480 VGA out to Extron Emotia in non-interlace mode to Sony PVM 2530
Input Device: Hori Real Arcade EX-SE, with LED wired directly to one of the buttons. 

Edit:  Added more details and questions.
Edit 2: It seems my understanding of how KMS works is faulty.  To be clear, this test was done by launching Retroarch from the X environment.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on May 27, 2014, 05:37:43 pm
Hi vicosku,

Thanks a lot for the test. Maybe you could try testing it in its native orientation, to see if it makes any difference. Your results show a surprising amount of lag, especially considering you're using RA with KMS. It makes me wonder if KMS is actually working in your setup, because those results seem to indicate there's a hidden frame queue there.

I've uploaded the video from which I took the results posted here (http://forum.arcadecontrols.com/index.php/topic,133194.msg1377633.html#msg1377633):

https://mega.co.nz/# (https://mega.co.nz/#)!A4ViCCiB!avMcQDlg2F_Sov-QExPv7slNuL-ssmqcE95_gNHVbCo

In my video, IIRC the counting was 4 in general (66% probability) and sometimes 3 when the input happens in the first third of the screen (33% probability). These values point to 1.5-2 real frames behind input, and because the bare minimum "lag" is 1 frame (= the action happens on the next frame on real Terra Cresta hardware), this means 0.5-1 frame of lag as compared to the real hardware. Obviously 0.5 means 0 and 1 means 1 because we deal with whole frames.

PS: Yeah, the old links are dead, I'll re-upload the videos if I can find the exact match on my HD.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on May 27, 2014, 06:01:14 pm
Thank you so much for the quick reply and for uploading the video again.  It looks like your results are basically equivalent to what I experienced with ShmupMAME!  As for testing in the proper orientation, I can't get Vsync to work at all with this set up when I rotate the screen, so testing it that way would be pointless. My results were the same with SSF2T though. I'm basing my assumption that KMS is enabled and working on this article:

https://wiki.archlinux.org/index.php/Intel_Graphics

"Kernel Mode Setting (KMS) is required in order to run X and a Desktop environment. KMS is supported by Intel chipsets that use the i915 DRM driver and is enabled by default. Versions 2.10 and newer of the xf86-video-intel driver no longer support UMS (except for the very old 810 chipset family), making the use of KMS mandatory[1]."

I'd be glad to do some more testing when I get the time, but for now I'm much more interested in getting GM set up in XP ASAP and taking advantage of all the apparent benefits over my current setup.  I was actually using FBA instead of MAME because I had hitching issues with scrolling.  And I'm using FBA .2.97.28 instead of .2.97.30(Next?) because the latter has severe issues with sound effects in Neo Geo games.  Regardless, I'll leave the Ubuntu hard drive alone so I can easily switch to it later if needed.  I have a laptop running Ubuntu in KMS with an Nvidia adapter loaded up and ready to test as well. 
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on May 27, 2014, 06:37:30 pm
Hi vicosku,

Did you try RetroArch from outside X to make sure KMS is working?

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on May 27, 2014, 09:35:46 pm
Oh, I didn't even consider trying that.  I will sometime in the next few days or this weekend.  I just got my XP install running using the dummies' guide.  It's the same computer, just with an AMD 4550 added in.  I can't believe the results.  Here's another video. 

https://anonfiles.com/file/6787127cf93cc7a48a520606c5ff2383

At 120 FPS, the first 8 results are:  6, 7, 7, 7, 5, 6, 5, 7

This was with Frame_Delay 7.  I've been chasing this result for over a decade.  I really have to play some games now.  I truly appreciate the work that you and others have done on this project.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: cools on May 28, 2014, 05:58:22 am
I can't believe the results. I've been chasing this result for over a decade.  I really have to play some games now.  I truly appreciate the work that you and others have done on this project.

Indeed. You've just reminded me that I've been meaning to donate, so I've done so.  Nothing compared to the amount I've wasted over the years on failed attempts. Groovy fixes it all.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on May 28, 2014, 04:54:38 pm
Hi vicosku,

https://anonfiles.com/file/6787127cf93cc7a48a520606c5ff2383

At 120 FPS, the first 8 results are:  6, 7, 7, 7, 5, 6, 5, 7

This video looks like it's recorded at 60 Hz, isn't it?
Just in case it could make any difference, my tests were done with -priority 1, -nosleep.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on May 31, 2014, 02:39:53 pm
All my videos were recorded on an iPhone 5s with the default video app in "slo-mo" mode.  According to the properties shown in Premiere, they are all 120 FPS.  Therefore, the frame counts I've provided should be cut in half for 60 Hz.  I haven't done any editing or conversion.  Sorry that they're so huge, by the way.

I had the sleep option on default, which was 1. I don't see a "nosleep" option.  Is this different than sleep 0?  Anyway, priority was set to 1.  I've made another video with sleep 0, priority 1, and frame_delay 9.  Sorry that it's vertical this time, but this eliminates the weird diagonal refresh effect from my previous videos.  GM Version is 0.153 on XP64.

https://anonfiles.com/file/57f2012fc9531911296f3edf985c218a
GM 5, 8, 7, 7, 5, 8, 7, 7, 8, 7
(69 / 2) / 10 = 3.45 frames of lag on average at 60 Hz (unless my math is wrong, somehow)

Also, I managed to verify that Retroarch is KMS-capable on my machine via the RA wiki and I am able to run it outside of X.  However, I cannot seem to get RA to run at any resolution besides 1024x768 this way.  The RA GUI also claimed to be running at 80Hz.  I couldn't tell whether VSYNC was even working through my Super Emotia at that resolution, so I hooked it up to a 31Khz display.  Sure enough, it was running at 75Hz and vsync was not working despite being enabled in the RA GUI.  Because of this I cannot provide any useful results to compare against those for GroovyMame. 


Edit:  I think my challenges with Retroarch were due to some problems with interference with the laptop's built-in display and with rotation modes.  I'll try it on my desktop when I can and re-run the tests.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 02, 2014, 11:41:27 am
Hi vicosku,

For some reason your videos seem to show 1 frame of lag more than mine (2 frames in the 120 fps video). I'm assuming your control is connected by USB. Your new video is much better than the previous one because it shows the scan position.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on June 02, 2014, 02:28:26 pm
Thanks.  I misunderstood the results of your video.  I ended up speed and FPS matching our two videos to the best of my ability to understand what you were saying.  Indeed, my results show a delayed response compared to yours.

https://anonfiles.com/file/f99731e09eacd3206b75ba3c0548524b

So, it should be possible to achieve even better results?  It already feels really good, so I find that incredible.  I'll run through this thread again and try other things. 

Yes, my Hori joystick connects via its internal PCB to USB.  I do not have the polling rate increased because I thought it was unnecessary.  I'll start with that. 

Edit: Oh, RawInput.  I didn't really understand that part of the discussion before delving into this.  My joystick is recognized as a 360 controller, and Mame shows that it is using DirectInput.  I assume this explains the extra delay.
Edit 2: To be clear to anyone reading this thread, we will not be certain that DirectInput vs RawInput is the cause for the difference between Calamity's results and mine until I can perform more definitive testing.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: machyavel on June 03, 2014, 05:42:50 pm
How do we force raw input?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 03, 2014, 06:01:33 pm
How do we force raw input?

You can't with current MAME unless you use a keyboard encoder or a mouse. It needs to be implemented for joysticks too at some point.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on June 03, 2014, 06:04:05 pm
Thanks.  That's what I gathered, so I ordered an I-PAC.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 03, 2014, 06:08:36 pm
I ended up speed and FPS matching our two videos to the best of my ability to understand what you were saying.  Indeed, my results show a delayed response compared to yours.

That was an amazing job! Thanks.

Quote
So, it should be possible to achieve even better results?  It already feels really good, so I find that incredible.  I'll run through this thread again and try other things. 

Yes, my Hori joystick connects via its internal PCB to USB.  I do not have the polling rate increased because I thought it was unnecessary.  I'll start with that. 

Edit: Oh, RawInput.  I didn't really understand that part of the discussion before delving into this.  My joystick is recognized as a 360 controller, and Mame shows that it is using DirectInput.  I assume this explains the extra delay.

RawInput may or may not be the problem. As RandyT pointed out above, the important aspect of a controller is the time that it takes to decide if a button is pressed or not. So it might as well be a "problem" with that controller. It's impossible to know at this time, at least until raw input for joysticks is implemented in MAME. And maybe then we'll find there's no difference at all, but at least we will know that DirectInput was not the problem :)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 03, 2014, 06:11:28 pm
Thanks.  That's what I gathered, so I ordered an I-PAC.

Oh! You're faster than me at answering. OK, maybe it doesn't make any difference, we simply don't have enough evidence yet.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on June 03, 2014, 06:21:21 pm
Ah, thanks for the clarification.  I needed some sort of interface for another joystick anyway, so the I-PAC won't be a waste.  I'll be able to provide more test results when it arrives as well.  Unfortunately, that supposedly won't be for another 21 days.

While I'm waiting, I'll spend some more time messing around with KMS and Retroarch to see if I can get that working. 
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: adder on June 03, 2014, 06:30:52 pm
....  I'll be able to provide more test results when it arrives as well.  Unfortunately, that supposedly won't be for another 21 days.
perhaps just use a usb keyboard in the meantime for testing  ;)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 03, 2014, 06:35:17 pm
perhaps just use a usb keyboard in the meantime for testing  ;)

It's not easy to plug a led to a keyboard ;)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: machyavel on June 04, 2014, 04:24:13 pm
How do we force raw input?

You can't with current MAME unless you use a keyboard encoder or a mouse. It needs to be implemented for joysticks too at some point.

Ok I may be slow but does it mean a keyboard always works through rawinput in mame?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on June 05, 2014, 05:59:13 am
perhaps just use a usb keyboard in the meantime for testing  ;)

It's not easy to plug a led to a keyboard ;)

But some keys have an LED already supplied  :)  Well, every 2 presses the Num Lock LED will come on.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 05, 2014, 12:00:44 pm
Ok I may be slow but does it mean a keyboard always works through rawinput in mame?

Yes, unless you force it to use DirectInput by compiling it with a special flag (people do this to be able to use joystick-to-keyboard software IIRC).

But some keys have an LED already supplied  :)  Well, every 2 presses numlock LED will come on.

Yeah, but it's not the same thing, it's the BIOS that controls those LEDs. I did my first tests posted in this thread by using that technique; it turned out the result was not reliable because the LEDs took too long to light up.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: cools on June 05, 2014, 06:03:25 pm
Any clue if the SDL lag will be removed?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 08, 2014, 07:16:28 am
Any clue if the SDL lag will be removed?

I'd like to target that at some point, although fixing that will involve going through the whole graphics layer stack, as we have no clue where exactly the problem lies. Maybe it's wiser to bypass the whole thing and address the video card through KMS after all.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on June 11, 2014, 10:34:49 pm
My I-PAC arrived today! Quite the surprise.  Here are the results.  Note that the button is pressed when the LED turns off this time.  Also, I've added multi-threading and can run with frame_delay 9 without problems now.

https://anonfiles.com/file/a29c7c59bd6705fe67d7e57b4dd7bd84

The input lag on my setup now appears as low as that shown in Calamity's videos.  Just to make sure that none of my other changes were responsible, I tested two DirectInput devices again:  a Hori joystick, and a Raphnet Game12. In both cases, my lag was increased to that shown in my previous video.  I'm comfortable saying that Raw Input can reduce lag by at least 1 frame compared to Direct Input in MAME, as others have suggested.  Or at least, an I-PAC or J-PAC can provide this result compared to the two aforementioned interfaces.  I've got some wiring to do.

Also, I did mess with KMS a little bit a week ago.  I've included that video result in the archive as well.  It was done on a 31Khz display at 60hz.  I pressed CTRL+ALT+F1 and then ran Retroarch as root.  The results were slightly better than those when I launched from X, so I assume I did everything right.  They still weren't as good as GM in XP with CRT_Emudriver though.  Note that this was not done with an I-PAC but with my Hori HRAP EX-SE again, so I guess the result should be compared to my previous video.  Calamity, if you think it would be of value, sometime I can try some more KMS tests with the I-PAC or other variables changed.

In summary, these are the best averages I achieved with each scenario.  These numbers count frame 1 as when the LED indicates input occurred and conclude on the frame when action was displayed, inclusive.

X Retroarch 1.0.0.2 Mame Core .151: 5.1333 Frames
GroovyMame .153b XP64 Frame_Delay 9 Direct Input: 3.125 Frames
KMS Retroarch 1.0.0.2 Mame Core .151 (Same input device as above): 4.75 Frames
GroovyMame .153b XP64 Frame_Delay 9 Raw Input via I-PAC: 1.667 Frames

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 12, 2014, 08:34:16 am
Awesome job vicosku!

Thanks a lot for your tests. I'm so glad that you could replicate my results there by using the I-PAC & raw input. This is the first confirmation that we have outside of my own testing.

I agree DirectInput is probably the culprit. This hopefully serves to encourage the implementation of raw input for joystick devices in MAME. Only then would we be able to do a fair comparison between joystick and keyboard encoders.

Quote
Calamity, if you think it would be of value, sometime I can try some more KMS tests with the I-PAC or other variables changed.

Definitely, KMS & I-PAC results would be very revealing, I would be grateful if you could eventually test this. I dare say there might still be a little advantage for GM thanks to the frame delay implementation.

Thanks again!
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on June 12, 2014, 11:14:08 am
You're welcome!  I'd be glad to perform more tests.  I have a pretty busy week ahead, so it may take me some time, but I'm certainly eager to see the results as well. 
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: cools on June 12, 2014, 12:17:51 pm
Kind of off topic, but if GM used KMS would it still require X at all?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Paradroid on June 12, 2014, 05:15:00 pm
In summary, these are the best averages I achieved with each scenario.

Nice work! Thanks for sharing your results.

This thread has been a fine read so far...
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 13, 2014, 06:36:14 am
Kind of off topic, but if GM used KMS would it still require X at all?

I'm not sure. I mean, the OSD relies on X for the most part I believe, so I'd say it would require some major refactoring of the OSD layer. Maybe it is easier to simply bypass X at the relevant points.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: cools on June 13, 2014, 09:23:43 am
Correcting myself - I meant would an X server be required or not? Not sure why I'm asking the question (my chosen frontend requires it).
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 18, 2014, 06:28:35 am
Correcting myself - I meant would an X server be required or not? Not sure why I'm asking the question (my chosen frontend requires it).

I meant that you can certainly modify MAME to run without an X server, but I'm not sure how deeply it must be changed to achieve it, whether you need to create a brand new OSD layer (the RA approach) or it is enough to modify the existing SDL OSD to remove any existing dependencies. Anyway, I find it much more appealing to respect the current OSD and simply bypass SDL around the point where the lag resides: frame flipping. So if you can manage frame flipping directly you can get rid of any hypothetical frame queue. At the same time we could manage mode setting directly too, bypassing xrandr.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on June 20, 2014, 10:46:13 am

Definitely, KMS & I-PAC results would be very revealing, I would be grateful if you could eventually test this. I dare to say there might still be a little advantage for GM thanks to the frame delay implementation.


I finally ran the Retroarch KMS test with the I-Pac.  Lag was markedly reduced, but not to GroovyMAME levels.  Again, this is 120FPS and input is received when the LED turns off.

https://anonfiles.com/file/bf5fdc4f6e7eafe8ea5ce2ccbd87d2fb

10 samples at 120FPS: 6,6,7,7,7,5,6,6,6,6

Frames displayed as button is pressed until action is seen: 3.1

Edit: This test was performed at 75Hz and should not be used for comparison against the other results at 60Hz.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 22, 2014, 04:53:17 pm
Hi vicosku,

Thanks again for your test, much appreciated. I've been watching your new video, and I notice the game can't be running at 60 Hz, it's probably being displayed at 75 Hz or similar, just like you mentioned some posts above. However, in your video from June the 12th KMS was running at 60 Hz, which is the ideal situation. The frame count could be affected by this; even though what we actually count are the camera frames, not the game frames, and those are constant among videos, the proper comparison should be with both videos running at 60 Hz side by side. Although I believe your new video already proves what we intended to prove, I want to be extra cautious before claiming anything.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on June 22, 2014, 05:55:58 pm
You're right.  It was 75hz again.  I must have gotten lucky with a kernel option to get 60hz before, but I don't remember what I did.  I'll try to set it up, but I'm pretty clueless with this KMS video mode stuff.  Most of the instructions I find by Googling don't seem to perfectly apply to Ubuntu 14.04.  If someone can give me some fool-proof instructions on how to change the resolution and refresh rate in this scenario, it would help me immensely to provide better test results.  I'll try on my own in the meantime.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on June 22, 2014, 06:58:54 pm
I used a dumb trick that seems to have worked.  I just started up everything on a 1440x900@60Hz LCD monitor and then switched to the CRT.  Screenshots of the CRT's OSD and Retroarch's GUI are attached to offer some assurance that this new video is running at 60Hz.  This keyboard encoder result in KMS seems to differ very little from the previous 60Hz video I shot with joysticks, as compared to the 75Hz one.  Calamity, I'll defer to your conclusions. 

https://anonfiles.com/file/3f6e3966c915fd4733f832acaae5595e

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on June 23, 2014, 08:21:33 am
Hi vicosku, thanks again for your video, I know how much work it involves setting up everything in order to record these things properly.

So this is my counting for your new video (RA_KMS_I-PAC60hz.MOV):

19 samples at 120 fps: 8 9 9 10 10 8 9 9 9 9 10 9 9 8 9 10 8 8 9 -> 4.47 frames (RetroArch in KMS mode)

Indeed, it looks quite similar to your previous results with KMS & joystick (4.47 vs 4.75). This may have two readings:
1.- When using the libretro API under Linux in KMS mode, joysticks and keyboards behave just the same in terms of lag.
2.- Any hypothetical advantage of the keyboard encoder is masked by the (suboptimal) input management of this API.


Just for reference, here is my own counting for your video from june the 12th (Frame_Delay_9_I-PACRAW.MOV):

21 samples at 120 fps: 4 3 4 2 4 3 4 4 3 3 4 4 4 4 4 3 4 4 3 3 4 -> 1.78 frames (GroovyMAME, Windows XP 64, frame_delay 9)
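
For anyone who wants to redo this counting on their own videos, the conversion is trivial: average the camera frame counts and divide by 2, since the camera runs at 120 fps and the display at 60 Hz. A quick, hypothetical helper of my own, reproducing the two figures above:

Code:
#include <cstdio>
#include <vector>

double lag_in_display_frames(const std::vector<int>& camera_frames)
{
    double sum = 0;
    for (int f : camera_frames) sum += f;
    return (sum / camera_frames.size()) / 2.0;   // 120 fps camera -> 60 Hz frames
}

int main()
{
    std::vector<int> kms = {8,9,9,10,10,8,9,9,9,9,10,9,9,8,9,10,8,8,9};
    std::vector<int> gm  = {4,3,4,2,4,3,4,4,3,3,4,4,4,4,4,3,4,4,3,3,4};
    std::printf("RetroArch KMS : %.2f frames\n", lag_in_display_frames(kms));   // ~4.47
    std::printf("GroovyMAME fd9: %.2f frames\n", lag_in_display_frames(gm));    // ~1.79
}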
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on June 23, 2014, 10:57:31 am
Calamity, thanks for reviewing the videos and confirming the frame counts.  I'm glad to perform the tests if they can be of help.  I'm no programmer, so it's nice to finally be able to contribute to the community in some way. 

I don't know the full implications of the KMS results, so I won't comment or speculate.  I just hope they'll be of use to you and other parties.  If I can perform any other tests that would be helpful, please let me know.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: rCadeGaming on November 16, 2014, 11:49:22 pm
Hi everyone, haven't been on in quite a while.  Recently I was able to make time to work on the hobby again and catch up on things.  First, I just wanted to say that the ATOM-15 drivers, and especially the new website, look AWESOME.  Calamity, you're the man!

Anyhow, I'm getting ready to make the switch from XP to 7, and I wanted to benchmark the two to make sure I won't be losing anything in terms of lag.  To start with, I need to optimize my current setup in XP and get a baseline, but I've found something strange along the way. 

Calamity, the good news is that I was able to replicate your test with Terra Cresta.  Following the same counting method, I was also able to get 3-4 frames of video at 120 FPS until the ship moves.

Video here:
http://rcadegaming.com/videos/terracre150.mov (http://rcadegaming.com/videos/terracre150.mov)

BUT, that was done with the GroovyUME 150 beta 01 from about a year ago.  I think it's either the one you PM'd me or the release directly following that.  I'm unable to duplicate these results with the current GroovyMAME 155 (latest version downloaded from Google Code page 11/15/2014).  I keep getting 5-6 frames of video at 120 FPS, meaning a full extra frame of lag at 60Hz.  This is all on the same hardware, the only difference being the MAME release.

Video here:
http://rcadegaming.com/videos/terracre155.mov (http://rcadegaming.com/videos/terracre155.mov)

Are there any known problems with increased lag in the current release?

One thing I can think of is that I'm still using the Switchres 015c release from the same time period as that GroovyUME 150 beta 01 (sorry, I've lost track of which one).  I'm guessing it's unlikely, but could the older CRT_Emu/Switchres conflict with GM 155, causing the extra frame of lag?  Could the problem go away when I upgrade to the newest releases and switch to Windows 7?

Could it have anything to do with audio latency?  I haven't gotten into adjusting that yet.

I'll attach my mame.ini for 155 and the resulting log from Terra Cresta.  Can you see anything that might be causing the problem, or anything that could use improvement in general?

Setup:
XP64
Core i3-2130 Sandy Bridge Dual-Core 3.5GHz CPU
Asus P8Z68-M Pro Motherboard
8GB (2x4GB) DDR3 SDRAM @ 1333MHz
Kade Encoder in USB Keyboard Mode
ASUS HD4350 -> TC1600 VGA to Component Transcoder -> Sony KV-27FS120 TV
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on November 17, 2014, 04:03:25 am
God you ruined my breakfast...  :D

Ok, fortunately (or not) the problem (or not) is related to the terracre.c driver itself. It looks like starting from version 0.155 some sort of sprite buffer has been added to this driver. You can check this by using the shift + P method (pause the game, and keep one cursor key pressed while you step frame by frame by pressing shift + P). Starting with 0.155, the spaceship takes one more frame to react. So this driver is no longer the ideal "lagless" driver to use for reference, I'm afraid. Your lag tests are consistent with this.

I've attached the 0.154 vs 0.155 diff.

More info: http://mametesters.org/view.php?id=5700 (http://mametesters.org/view.php?id=5700)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: rCadeGaming on November 17, 2014, 07:39:04 am
God you ruined my breakfast...  :D

Haha, sorry, didn't mean to alarm you.  Thanks for the quick response.

In any case, I'm very relieved to hear this.  At least I can be sure there's nothing wrong with my setup.  I'll compare lag results with Windows 7 soon.

Looking at my .ini, is there anything that could be improved for an optimal setup, aside from the audio latency value?  I've gone through everything else and set it up as best I could, but just wanted to be sure.

On another note, while testing I found that forcing DirectDraw by running with "-video ddraw" caused the game to run at 67%.  Any idea why this is happening?  It's not critical, as Direct3D seems to be the way to go, but I just wanted a working DirectDraw option available for testing.  Log attached.  This was run with the same mame.ini posted above.

Thanks again.   :cheers:
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: adder on November 17, 2014, 07:54:09 am
Quote from: Calamity
Ok, fortunately (or not) the problem (or not) is related to the terracre.c driver itself. It looks like starting from version 0.155 some sort of sprite buffer has been added to this driver.
I expect here is the change :) http://mametesters.org/view.php?id=5700 (http://mametesters.org/view.php?id=5700)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: adder on November 17, 2014, 10:59:32 am
Guys, if needed I found a couple of games which can be used for lagless testing (i.e. no built-in lag according to the shift+P method):

gunsmoke
vertical: 224x256(v) 60.000000hz

thunderx
horizontal: 288x224(h) 60.000000hz

If you need me to find others/different types of games/anything else, let me know.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on November 17, 2014, 11:06:49 am
Thanks adder!

thunderx looks like a good choice (you start playing quite fast).
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: cools on November 17, 2014, 04:37:28 pm
Space Invaders responds on the next frame. It's the only thing worth loading it for ;)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: rCadeGaming on November 18, 2014, 07:24:30 am
Thunder Cross works well.  The only problem is it's a pretty fun game, so it's distracting.   :lol  I started some testing using that, and I'll post up some results when I can compare with Windows 7.

Does anyone need any testing done in XP?  I'd be happy to help, just let me know now, because if my Windows 7 setup works out I'll be decommissioning XP soon.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on November 18, 2014, 07:35:32 am
Hi rCadeGaming,

Your .ini file looked fine, so your test results must be "legit". Thanks for doing these tests.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: adder on November 18, 2014, 08:21:36 am
Thanks Rob, I might have some XP tests at some point.
Any timescale on when you will no longer be doing XP tests? (e.g. a few days, weeks... months?)
Cheers!
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on November 19, 2014, 03:47:55 am
On another note, while testing I found that forcing direct draw by running with "-video ddraw" caused the game to run at 67%.

You may need to lower the -frame_delay value when using ddraw. This is because ddraw keeps the CPU busy for a longer period than d3d.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: rCadeGaming on November 24, 2014, 11:40:28 pm
I finished some testing with a build that Calamity sent me to verify improvements to frame_delay, and the results look good.

First let me note my experience with XP64.  I was able to get some decent results at one point, but then found that I couldn't achieve this consistently.  The amount of lag seemed to vary randomly after each time I rebooted the PC.  I didn't troubleshoot this extensively.  It may have been fixable, but I didn't want to spend much more time on XP because my goal was to confirm the viability of 7 for myself and, if so, make the change to 7 permanently.  Your results may vary with XP.  The data points I used for XP represented the best I could gather from one continuous run in GM.

In any case, these tests were all performed in GM 155 with the game "Thunder Cross" at frame_delay 7.  The number of frames refers to frames of gameplay at 60Hz.  I collected 40 data points for each test to find an average.  I had about 75 data points for the first test in Windows 7, but found that the average of all the points was within 1% of the average of only the first 40 points, so I decided 40 would be enough as the standard for my tests.  A spreadsheet with the full results is attached, but I'll summarize here:

-GM 155-
XP64 - average: 2.200 frames, average with outliers removed: 2.044 frames.
7x64 - average: 2.025 frames

-GM 155_test-
7x64 - average: 1.913 frames

The average with outliers removed for XP was determined by omitting occasional data points of 3 or 3.5 frames (6 or 7 frames of 120 fps video).  Notice that there is no such value for Windows 7.  This is because the delay NEVER exceeded 2.5 frames in Windows 7.  Windows 7 was much more stable all around.  I got the exact same behavior after restarting GM and the PC a bunch of times.  Again, your results with XP may vary, but I can at least vouch for quality in Windows 7.

Also notice that lag in the test version is even lower, "statistically."  I don't really know how important these averages actually are.  Given the low amounts of lag that Calamity is achieving (:applaud:), the amount of time that comes down to chance (at what point in the frame the button is pressed, the delay until the LED is captured by the camera, the timing offset between the camera and the CRT) becomes relatively high.  I think the most important thing that can be said about these results is that there is usually a visible on-screen response to the physical button input within 1.5-2 frames, 2.5 at most (in Windows 7), and that the new test build is at least as good if not better.

Setup:
ASUS HD4350 -> TC1600 -> KV-27FS120
Kade Encoder - USB Keyboard Mode

XP64
Core i3-2130 Sandy Bridge Dual-Core 3.5GHz
Asus P8Z68-M Pro
2x4GB DDR3 SDRAM @ 1333MHz

7x64
Core i3-4370 Haswell Dual-Core 3.8GHz
Asus Z97-M Plus
2x4GB DDR3 SDRAM @ 1600 MHz

*** rename attachment to .xlsx, I had to change the file extension to allow attachment ***
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on November 27, 2014, 11:18:40 am
Hi rCadeGaming,

Thanks a lot for your detailed tests. I'm happy to see these results being reproduced by other users. Thanks to your results I've decided to fold that change into the "official" patch, now SwitchRes 0.015d. Hopefully this should also help with previous issues regarding analog controls.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: rCadeGaming on November 28, 2014, 09:06:47 am
Awesome, I'm updating to 156 now!  Glad that I could help.  Let me know if you need anything else tested  :cheers:
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: rCadeGaming on January 11, 2015, 09:55:41 pm
Did a little more lag testing today.  Whereas before all of my tests were run in native res, I did another test in a super resolution.  I got the result I was hoping for, which was that there isn't really any detectable lag penalty for using super resolutions.  This being the case, I don't see any reason not to use them.  I'm really loving super resolutions.

Next, I tried to squeeze a little bit more speed out of my computer by disabling a bunch of services that weren't necessary (mostly networking stuff).  Afterward, I was able to run Thunder Cross at an unthrottled average of ~3200%, meaning frame_delay 9 was rock-solid.  My previous tests had all used frame_delay 7 to play it safe for comparison across OS's.  Another lag test showed noticeable improvement.  See the results below.  I also attached an updated spreadsheet with the full results.

GM 155
7x64
native res
frame_delay 7
average: 2.025 frames

GM 155_test
7x64
native res
frame_delay 7
average: 1.913 frames

GM 155_test
7x64
super res
frame_delay 7
average: 2.025 frames

GM 155_test
7x64
super res
frame_delay 9
average: 1.763 frames


It's also worth mentioning that during 3 of the 40 samples taken at frame_delay 9, on-screen response was visible at only the second camera video frame (120fps) after the button press had started, which had never happened at frame_delay 7.  This is really getting close to next-gameplay-frame response.

...and yes I know I need to upgrade to 157.  I don't get enough time for this stuff  :P

*** rename attachment to .xlsx, I had to change the file extension to allow attachment ***
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: koopah on February 10, 2015, 02:50:52 am
Hello,

I just joined this forum because I wanted to share how I managed to set up a vsynced Direct3D without additional input lag in MAME 0.158.
As you mentioned earlier, the problem with MAME is that Direct3D + waitvsync (which is probably the mode LCD owners want to use the most) causes noticeable input lag.

Since I don't have access to a 60 fps camera I did this test: I made a script that runs MAME with either (Direct3D + waitvsync) or (DirectDraw + waitvsync, which is supposed to produce less lag) chosen at random. Then I play the game for 10 seconds and try to guess which mode was chosen based on the input lag I can feel. After that I compare my guess against the real mode printed in the console output to see if I was correct.
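For reference, a randomized launcher along those lines could be as simple as this (illustrative sketch only, not koopah's actual script; the executable name, game and log file are placeholders):

Code: [Select]
// Illustrative C++ stand-in for a random A/B launcher (not the original script).
// "mame64", "sf2" and "lastmode.txt" are made-up placeholders.
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <string>

int main()
{
    std::srand(static_cast<unsigned>(std::time(nullptr)));
    const bool use_d3d = (std::rand() % 2) == 0;

    // Log the chosen mode so it can be checked after the blind guess.
    std::ofstream("lastmode.txt") << (use_d3d ? "d3d" : "ddraw") << '\n';

    const std::string cmd = std::string("mame64 sf2 -waitvsync -video ")
                          + (use_d3d ? "d3d" : "ddraw");
    return std::system(cmd.c_str());
}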

I found that my guess was correct almost 90% of the time, which means the input lag is real and is definitely something a player can feel. (I am talking about the additional 2-3 frames that Direct3D + vsync adds.)

But the good news is: I think I found a workaround that fixes the problem and achieves perfect vsync in Direct3D without the additional Direct3D lag.

In the MAME source code, vsync in Direct3D works with the "D3DPRESENT_INTERVAL_ONE" option. The problem with this option is that it creates a "render queue" which adds latency and creates lag. I found a fix by using the "Direct3D 9Ex" library (which is compatible with Direct3D, so only a small source code modification is needed) and calling the method "SetMaximumFrameLatency(1)", documented here: https://msdn.microsoft.com/en-us/library/windows/desktop/bb174347%28v=vs.85%29.aspx (https://msdn.microsoft.com/en-us/library/windows/desktop/bb174347%28v=vs.85%29.aspx).
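To make the idea concrete, here's a minimal hedged sketch of a D3D9Ex device created with a one-frame queue (illustrative only, not the actual MAME/GroovyMAME code; the helper function name is made up):

Code: [Select]
// Hedged sketch only: create a Direct3D 9Ex device and cap the driver's
// render-ahead queue at one frame. Requires Windows Vista+ and d3d9.lib.
#include <d3d9.h>

IDirect3DDevice9Ex* create_low_latency_device(HWND hwnd, D3DPRESENT_PARAMETERS* pp)
{
    IDirect3D9Ex* d3d = nullptr;
    if (FAILED(Direct3DCreate9Ex(D3D_SDK_VERSION, &d3d)))
        return nullptr;

    IDirect3DDevice9Ex* device = nullptr;
    HRESULT hr = d3d->CreateDeviceEx(
        D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, hwnd,
        D3DCREATE_HARDWARE_VERTEXPROCESSING,
        pp, nullptr, &device);
    d3d->Release();
    if (FAILED(hr))
        return nullptr;

    // Limit how many frames the driver may queue before Present blocks.
    // With D3DPRESENT_INTERVAL_ONE this avoids the multi-frame render
    // queue that shows up as input lag.
    device->SetMaximumFrameLatency(1);
    return device;
}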

With this fix, Direct3D + vsync works like a charm (perfectly vsynced to my 60Hz LCD) with no noticeable input lag (I am not able to feel any additional lag, using my previous test).  :)

If you are interested in testing this fix, let me know. I can share a patch file so you can recompile MAME yourself, or send you my binary build, which is only for Windows 64-bit (btw this fix won't work on Windows versions older than Vista because of Direct3D 9Ex).

Additional notes: my setup is an Intel G3420 CPU with integrated Intel HD GPU on Windows 8.1 64-bit. The fix works perfectly on my system but I don't know if it will work on every graphics card (buggy driver implementations, for example).

Cheers
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: bulbousbeard on February 14, 2015, 09:32:55 am
Isn't that the same as just setting max prerendered frames in Nvidia's drivers to 1?

(http://i.imgur.com/rYOJ4uF.png)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on February 14, 2015, 04:00:44 pm
If you are interested in testing this fix, let me know. I can share a patch file so you can recompile MAME yourself, or send you my binary build, which is only for Windows 64-bit (btw this fix won't work on Windows versions older than Vista because of Direct3D 9Ex).

Hi koopah, and thanks for sharing this. It'd be great if you posted your patch.

Yes, as you say, the problem with D3D is that by default it creates a render queue. The length of this queue can be controlled either programmatically, using the API method you mentioned, or forced to a certain value with the tools provided by the manufacturer, like the one posted by bulbousbeard.

However, on Windows XP we don't have that API, so for D3D we've been using a manual call to GetRasterStatus followed by Present with the D3DPRESENT_INTERVAL_IMMEDIATE flag. This bypasses the frame queue; however, it creates static tearing when there's significant scaling involved.
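For illustration, that manual approach boils down to something like this (a rough sketch, not GroovyMAME's actual drawd3d code; it assumes the device was created with D3DPRESENT_INTERVAL_IMMEDIATE):

Code: [Select]
// Rough sketch of the XP-era "manual vsync" idea: spin on GetRasterStatus
// until the raster enters vertical blank, then Present immediately so no
// frame queue can build up.
#include <d3d9.h>

void present_without_queue(IDirect3DDevice9* device)
{
    D3DRASTER_STATUS rs;
    // Busy-wait for the vertical blank of swap chain 0.
    do {
        device->GetRasterStatus(0, &rs);
    } while (!rs.InVBlank);

    // The device uses D3DPRESENT_INTERVAL_IMMEDIATE, so this returns right
    // away instead of queuing behind the driver's vsync.
    device->Present(nullptr, nullptr, nullptr, nullptr);
}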

The alternative was DirectDraw, which can bypass the queue without adding tearing. Unfortunately DirectDraw is extremely slow on Windows 8, to the point of being totally unusable (I wouldn't be surprised if it was actually emulated by software or something).

So probably at some point we'll need to add this feature and apply it on Windows 7 and 8 where it's available, and finally have a "lag-less" D3D implementation without using hacks or resorting to the deprecated DirectDraw (we currently also use DirectDraw for interlaced modes, so we'll need to solve that too).
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: koopah on February 15, 2015, 07:42:20 am
Quote
Isn't that the same as just setting max prerendered frames in Nvidia's drivers to 1?
Maybe it can solve the issue; I can't test it because my GPU is an Intel one (no Nvidia drivers). But if it works, that's great :)

Quote
It'd be great if you posted your patch.
I only modified 2 files in the source code:
/src/osd/windows/d3d9intf.c http://pastebin.com/U40riSEK (http://pastebin.com/U40riSEK)
/src/osd/windows/drawd3d.c http://pastebin.com/UxGxwemU (http://pastebin.com/UxGxwemU)
Simply replace the old contents with the new linked source code.

Let me know if it works for you (also, remember you need the Direct3D 9Ex library, which is only available on Windows Vista or newer).
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: koopah on February 17, 2015, 12:27:29 pm
I just reread your message,

Quote
Unfortunately DirectDraw is extremely slow on Windows 8, to the point of being totally unusable

Before running MAME with my "fixed" Direct3D patch, I used to run it using DirectDraw (my OS is Windows 8.1 64-bit).
It was pretty fast. Running a game with the "-nothrottle" option was even faster with DirectDraw than with Direct3D (something like 1000% speed with DirectDraw and 700% with Direct3D on some random old games).

Are you sure you are using DirectDraw and not GDI or software rendering? I'm really surprised you found DirectDraw slow.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Doozer on March 13, 2015, 01:59:30 pm

Input lag test under Linux with UME 0.159 and the Thunder Cross ROM (frame_delay = 0). I count 3-4 frames at 120 fps.

The game runs at 59.9 Hz, so if I'm not bad at math, that's about 2 frames of delay. I used the ship's movement to the next pixel row as the on-screen response.

[Sorry for the extremely bad picture quality, I did not manage to get uncompressed video out of my phone]
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Doozer on March 13, 2015, 02:26:09 pm
Video with better resolution.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on March 17, 2015, 05:19:42 am
Hi Doozer, thanks for your test.

Input lag test under Linux with UME 0.159 and the Thunder Cross ROM (frame_delay = 0). I count 3-4 frames at 120 fps.

Yeah this seems to confirm our other Linux tests. Whatever I tried, even writing directly to the primary buffer without flipping, I could never get below a count of 5-6 at 120 Hz (around 3 real frames), although usually it was 7-8. Notice that for simplicity we include the first frame where the LED lights up, even if the real input lag would obviously exclude that frame (this means that getting a count of 2 at 120 Hz (1 real frame) means no lag).

If you compare these results to the most recent ones above by rCadeGaming in Windows 7 you see that, on average, input takes 2 frames more to be received in Linux than in Windows. This happened even when we tested with low latency kernels.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Doozer on March 17, 2015, 06:55:18 am
Yeah this seems to confirm our other Linux tests. Whatever I tried, even writing directly to the primary buffer without flipping, I could never get below a count of 5-6 at 120 Hz (around 3 real frames), although usually it was 7-8. Notice that for simplicity we include the first frame where the LED lights up, even if the real input lag would obviously exclude that frame (this means that getting a count of 2 at 120 Hz (1 real frame) means no lag).

If you compare these results to the most recent ones above by rCadeGaming in Windows 7 you see that, on average, input takes 2 frames more to be received in Linux than in Windows. This happened even when we tested with low latency kernels.

I will do the test at the kernel event level to see how it reacts to inputs.
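For anyone who wants to reproduce that kind of kernel-event-level check, a minimal evdev reader looks roughly like this (illustrative sketch; the device path is an assumption and will differ per system):

Code: [Select]
// Minimal evdev reader (illustrative; /dev/input/event0 is an assumption,
// pick the node that matches your stick/keyboard and run with permissions).
#include <fcntl.h>
#include <unistd.h>
#include <linux/input.h>
#include <cstdio>

int main()
{
    int fd = open("/dev/input/event0", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    input_event ev;
    while (read(fd, &ev, sizeof(ev)) == (ssize_t)sizeof(ev))
    {
        // Print kernel timestamps of key presses/releases to compare
        // against what the emulator eventually shows on screen.
        if (ev.type == EV_KEY)
            printf("%ld.%06ld code %u value %d\n",
                   (long)ev.time.tv_sec, (long)ev.time.tv_usec,
                   (unsigned)ev.code, ev.value);
    }
    return 0;
}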

Is the lag observed identical under Windows with groovyume compiled using SDL library?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on March 17, 2015, 07:01:22 am
Is the lag observed identical under Windows with groovyume compiled using SDL library?

GroovyMAME SDL builds don't work at all because the SDL side of the patch is not adapted to Windows yet.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on July 16, 2015, 10:58:04 am
I just want to bump this thread and add some of the latest information on the topic that has been discussed in other threads.  I figure there are some people subscribed to this thread that haven't seen some of the others.

1. There appears to be no significant input lag discrepancy between Raw and Direct Input in GroovyMAME. (http://forum.arcadecontrols.com/index.php/topic,145174.msg1512836.html#msg1512836)  Note that this is only based upon my own testing with multiple devices so far.  One PCB using direct input was found to be slower than others.  Previous laggy direct input results were probably due to PCBs that are not lagless with this API, rather than the API itself.  Evidence of significant differences in response time for different PCBs can be found here (http://www.teyah.net/sticklag/results.html).  However, that testing has only been performed on consoles. While interesting, I suppose it is of limited use in this thread.

2. In the aforementioned thread, the CPS1 test screen, accessed by pressing F2 in a game like sf2 and then navigating to it, was chosen as a convenient way to test input lag.  Lately, this screen has been used to show that it is possible to get next-frame response in Groovymame more than 50% of the time at high frame_delay values.  Specific details can be found in the spreadsheets linked throughout the thread.

3. New Frame_Delay value information. (http://forum.arcadecontrols.com/index.php/topic,142143.msg1518723.html#msg1518723)
There's something critical I've found out: the frame_delay value is off by 1 unit from what it's supposed to be. So if you use -frame_delay 8, its effect is that of -frame_delay 9. If you use -frame_delay 9, it actually "wraps" to the next frame, so -fd 9 must not be used!

4. Multiple latency reduction and stability topics have been discussed (Such as the one above) and are being tested in the GM ASIO PRE-ALPHA thread. (http://forum.arcadecontrols.com/index.php/topic,142143.msg1471818.html#msg1471818)  Koopah's D3D9ex work has been tested as well.  I encourage you to review it if you are interested in the topic.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on July 18, 2015, 07:34:02 am
Thanks vicosku for recapping!

I just want to bump this thread and add some of the latest information on the topic that has been discussed in other threads.  I figure there are some people subscribed to this thread that haven't seen some of the others.

1. There appears to be no significant input lag discrepancy between Raw and Direct Input in GroovyMAME. (http://forum.arcadecontrols.com/index.php/topic,145174.msg1512836.html#msg1512836)  Note that this is only based upon my own testing with multiple devices so far.  One PCB using direct input was found to be slower than others.  Previous laggy direct input results were probably due to PCBs that are not lagless with this API, rather than the API itself.  Evidence of significant differences in response time for different PCBs can be found here (http://www.teyah.net/sticklag/results.html).  However, that testing has only been performed on consoles. While interesting, I suppose it is of limited use in this thread.

2. In the aforementioned thread, the CPS1 test screen, accessed by pressing F2 in a game like sf2 and then navigating to it, was chosen as a convenient way to test input lag.  Lately, this screen has been used to show that it is possible to get next-frame response in Groovymame more than 50% of the time at high frame_delay values.  Specific details can be found in the spreadsheets linked throughout the thread.

3. New Frame_Delay value information. (http://forum.arcadecontrols.com/index.php/topic,142143.msg1518723.html#msg1518723)
There's something critical I've found out: the frame_delay value is off by 1 unit from what it's supposed to be. So if you use -frame_delay 8, its effect is that of -frame_delay 9. If you use -frame_delay 9, it actually "wraps" to the next frame, so -fd 9 must not be used!

4. Multiple latency reduction and stability topics have been discussed (Such as the one above) and are being tested in the GM ASIO PRE-ALPHA thread. (http://forum.arcadecontrols.com/index.php/topic,142143.msg1471818.html#msg1471818)  Koopah's D3D9ex work has been tested as well.  I encourage you to review it if you are interested in the topic.

Given 4, it may be useful to revisit some of the input latency tests at a later stage (when gmasio is considered mature), to see whether the conclusions regarding 1 and 2 still hold.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on July 18, 2015, 07:48:40 am
Many thanks for doing this, Vicosku.

Adding an interesting latency test (http://filthypants.blogspot.com.es/2015/06/latency-testing.html) done by Hunter K. (RetroArch's dev).


Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on July 18, 2015, 09:05:34 am
Adding an interesting latency test (http://filthypants.blogspot.com.es/2015/06/latency-testing.html) done by Hunter K. (RetroArch's dev).

Interesting indeed.

The main takeaway for me is that we could possibly improve our own testing methods by also using the photoresistor method, instead of our current (camera) one.

The conclusions regarding aero and filters are obvious and commonly known. I don't know why he thinks GM is better under Linux than under Win64, it's actually the other way round if I understood right.

It's good to see GM come out well in the relative test though.  Would love to see him repeat the tests with a CRT.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: deadsoulz on July 18, 2016, 11:12:05 am
Koopah, any chance I could get a copy of your build with this modification?  I am running MAME on an LCD TV and seeing significant input lag.

I just reread your message,

Quote
Unfortunately DirectDraw is extremely slow on Windows 8, to the point of being totally unusable

Before running Mame with my "fixed" Direct3D patch, i used to run it using DirectDraw (my os is Windows 8.1 64 bits).
It was pretty fast. Running a game with "-nothrottle" option was even faster with DirectDraw than Direct3D (something like 1000% speed with DirectDraw and 700% speed with Direct3D on some random old games).

Are you sure that you are using DirectDraw and not GDI or software rendering? I'm really surprised that you found DirectDraw slow.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on July 19, 2016, 11:54:36 am
Hi Deadsoulz.  The current version of GroovyMAME has progressed well beyond this thread.  It's been a while, but I'm almost certain Koopah's work and further improvements are included in GroovyMAME 0.171. 
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: deadsoulz on July 20, 2016, 02:57:06 pm
Hi Deadsoulz.  The current version of GroovyMAME has progressed well beyond this thread.  It's been a while, but I'm almost certain Koopah's work and further improvements are included in GroovyMAME 0.171.

Thanks, I know it is in GroovyMAME, but I am running an LCD TV and GroovyMAME always just white-screens for me on this setup with an NVIDIA graphics card.   I was hoping maybe someone had incorporated Koopah's changes into standard MAME or MAMEUIFX.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on December 04, 2016, 08:14:22 pm
Allow me to dig up this old thread and update it with my recent lag measurements of GroovyMAME 0.179 d3d9ex with the always-poll patch.
I used an Arduino board with a 1ms polling rate and my 1200 fps camera.
My readings have a precision of 0.8333 ms.

Game I used: thunderx - service screen
renderer: d3d9ex
frame_delay: 9
Youtube link: https://www.youtube.com/watch?v=-zy6f4R5-80 (https://www.youtube.com/watch?v=-zy6f4R5-80)

(https://s11.postimg.org/i8uggrvdv/GM0_179d3d9ex_Always_Poll.png)

The results, especially the minimum frame delay, are more than impressive.
 
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Paradroid on December 05, 2016, 04:13:41 am
Excellent report! Thanks for taking the time and sharing here.

Out of interest, what kind of CPU are you running to get such low latency?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on December 05, 2016, 05:35:25 am
Excellent report! Thanks for taking the time and sharing here.

Out of interest, what kind of CPU are you running to get such low latency?

You will be surprised. It's an AMD A4 5000 1.5GHz (4 cores)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: adder on December 05, 2016, 07:07:08 am
Hi oomek, thanks for your tests.
Could you post/attach your mame.ini file please?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on December 05, 2016, 07:28:58 am
Hi oomek, thanks for your tests.
Could you post/attach your mame.ini file please?

Code: [Select]
#
# CORE CONFIGURATION OPTIONS
#
readconfig                1
writeconfig               0

#
# CORE SEARCH PATH OPTIONS
#
rompath                   D:\mame\roms
hashpath                  hash
samplepath                samples
artpath                   artwork
ctrlrpath                 ctrlr
inipath                   .;ini;ini/presets
fontpath                  .
cheatpath                 cheat
crosshairpath             crosshair
pluginspath               plugins
languagepath              language
swpath                    software

#
# CORE OUTPUT DIRECTORY OPTIONS
#
cfg_directory             cfg
nvram_directory           nvram
input_directory           inp
state_directory           sta
snapshot_directory        D:\mame\snap
diff_directory            diff
comment_directory         comments

#
# CORE STATE/PLAYBACK OPTIONS
#
state                     
autosave                  0
playback                 
record                   
record_timecode           0
exit_after_playback       0
mngwrite                 
aviwrite                 
wavwrite                 
snapname                  %g/%i
snapsize                  auto
snapview                  internal
snapbilinear              1
statename                 %g
burnin                    0

#
# CORE PERFORMANCE OPTIONS
#
autoframeskip             0
frameskip                 0
seconds_to_run            0
throttle                  0
syncrefresh               1
autosync                  0
sleep                     1
speed                     1.0
refreshspeed              1

#
# CORE RENDER OPTIONS
#
keepaspect                1
unevenstretch             1
unevenstretchx            0
unevenstretchy            0
autostretchxy             0
intoverscan               0
intscalex                 0
intscaley                 0

#
# CORE ROTATION OPTIONS
#
rotate                    1
ror                       0
rol                       0
autoror                   0
autorol                   0
flipx                     0
flipy                     0

#
# CORE ARTWORK OPTIONS
#
artwork_crop              1
use_backdrops             0
use_overlays              0
use_bezels                0
use_cpanels               0
use_marquees              0

#
# CORE SCREEN OPTIONS
#
brightness                1.0
contrast                  1.0
gamma                     1.0
pause_brightness          0.65
# effect                    scanline85.png
effect                    none

#
# CORE VECTOR OPTIONS
#
beam_width_min            1.0
beam_width_max            1.0
beam_intensity_weight     0
flicker                   0

#
# CORE SOUND OPTIONS
#
samplerate                48000
samples                   1
volume                    0

#
# CORE INPUT OPTIONS
#
coin_lockout              1
ctrlr                     
mouse                     0
joystick                  1
lightgun                  0
multikeyboard             0
multimouse                0
steadykey                 0
ui_active                 0
offscreen_reload          0
joystick_map              auto
joystick_deadzone         0.3
joystick_saturation       0.85
natural                   0
joystick_contradictory    0
coin_impulse              0

#
# CORE INPUT AUTOMATIC ENABLE OPTIONS
#
paddle_device             keyboard
adstick_device            keyboard
pedal_device              keyboard
dial_device               keyboard
trackball_device          keyboard
lightgun_device           keyboard
positional_device         keyboard
mouse_device              mouse

#
# CORE DEBUGGING OPTIONS
#
verbose                   0
log                       0
oslog                     0
debug                     0
update_in_pause           0
debugscript               

#
# CORE COMM OPTIONS
#
comm_localhost            0.0.0.0
comm_localport            15112
comm_remotehost           127.0.0.1
comm_remoteport           15112

#
# CORE MISC OPTIONS
#
drc                       1
drc_use_c                 0
drc_log_uml               0
drc_log_native            0
bios                     
cheat                     0
skip_gameinfo             1
uifont                    default
ui                        cabinet
ramsize                   
confirm_quit              0
ui_mouse                  1
autoboot_command         
autoboot_delay            0
autoboot_script           
console                   0
plugins                   1
plugin                   
noplugin                 
language                  English

#
# CORE SWITCHRES OPTIONS
#
modeline_generation       0
monitor                   custom
orientation               horizontal
connector                 auto
interlace                 1
doublescan                1
super_width               2560
changeres                 1
powerstrip                0
lock_system_modes         1
lock_unsupported_modes    1
refresh_dont_care         0
dotclock_min              0
sync_refresh_tolerance    2.0
frame_delay               9
vsync_offset              0
black_frame_insertion     0
modeline                  auto
ps_timing                 auto
lcd_range                 auto
crt_range0                63100.00-64100.00,50.00-65.00,0.759,1.241,2.000,0.016,0.047,0.503,0,1,768,1024,0,0
crt_range1                auto
crt_range2                auto
crt_range3                auto
crt_range4                auto
crt_range5                auto
crt_range6                auto
crt_range7                auto
crt_range8                auto
crt_range9                auto

#
# OSD KEYBOARD MAPPING OPTIONS
#
uimodekey                 SCRLOCK

#
# OSD FONT OPTIONS
#
uifontprovider            auto

#
# OSD OUTPUT OPTIONS
#
output                    auto

#
# OSD INPUT OPTIONS
#
keyboardprovider          auto
mouseprovider             auto
lightgunprovider          auto
joystickprovider          auto

#
# OSD DEBUGGING OPTIONS
#
debugger                  auto
debugger_font             auto
debugger_font_size        0
watchdog                  0

#
# OSD PERFORMANCE OPTIONS
#
numprocessors             auto
bench                     0

#
# OSD VIDEO OPTIONS
#
video                     d3d
numscreens                1
window                    0
maximize                  1
waitvsync                 0
monitorprovider           auto

#
# OSD PER-WINDOW VIDEO OPTIONS
#
screen                    auto
aspect                    auto
resolution                auto
view                      auto
screen0                   \\.\DISPLAY1
aspect0                   4:3
resolution0               1280x1024@0
view0                     auto
screen1                   auto
aspect1                   auto
resolution1               auto
view1                     auto
screen2                   auto
aspect2                   auto
resolution2               auto
view2                     auto
screen3                   auto
aspect3                   auto
resolution3               auto
view3                     auto

#
# OSD FULL SCREEN OPTIONS
#
switchres                 1

#
# OSD ACCELERATED VIDEO OPTIONS
#
filter                    1
prescale                  1

#
# OpenGL-SPECIFIC OPTIONS
#
gl_forcepow2texture       0
gl_notexturerect          0
gl_vbo                    1
gl_pbo                    1
gl_glsl                   0
gl_glsl_filter            1
glsl_shader_mame0         none
glsl_shader_mame1         none
glsl_shader_mame2         none
glsl_shader_mame3         none
glsl_shader_mame4         none
glsl_shader_mame5         none
glsl_shader_mame6         none
glsl_shader_mame7         none
glsl_shader_mame8         none
glsl_shader_mame9         none
glsl_shader_screen0       none
glsl_shader_screen1       none
glsl_shader_screen2       none
glsl_shader_screen3       none
glsl_shader_screen4       none
glsl_shader_screen5       none
glsl_shader_screen6       none
glsl_shader_screen7       none
glsl_shader_screen8       none
glsl_shader_screen9       none

#
# OSD SOUND OPTIONS
#
sound                     auto
audio_latency             1.0

#
# BGFX POST-PROCESSING OPTIONS
#
bgfx_path                 bgfx
bgfx_backend              auto
bgfx_debug                0
bgfx_screen_chains        default
bgfx_shadow_mask          slot-mask.png
bgfx_avi_name             auto

#
# WINDOWS PERFORMANCE OPTIONS
#
priority                  0
profile                   0

#
# WINDOWS VIDEO OPTIONS
#
menu                      0

#
# DIRECT3D POST-PROCESSING OPTIONS
#
hlslpath                  hlsl
hlsl_enable               0
hlsl_oversampling         0
hlsl_write                auto
hlsl_snap_width           2048
hlsl_snap_height          1536
shadow_mask_tile_mode     0
shadow_mask_alpha         0.0
shadow_mask_texture       shadow-mask.png
shadow_mask_x_count       6
shadow_mask_y_count       4
shadow_mask_usize         0.1875
shadow_mask_vsize         0.25
shadow_mask_uoffset       0.0
shadow_mask_voffset       0.0
distortion                0.0
cubic_distortion          0.0
distort_corner            0.0
round_corner              0.0
smooth_border             0.0
reflection                0.0
vignetting                0.0
scanline_alpha            0.0
scanline_size             1.0
scanline_height           1.0
scanline_variation        1.0
scanline_bright_scale     1.0
scanline_bright_offset    0.0
scanline_jitter           0.0
hum_bar_alpha             0.0
defocus                   0.0,0.0
converge_x                0.0,0.0,0.0
converge_y                0.0,0.0,0.0
radial_converge_x         0.0,0.0,0.0
radial_converge_y         0.0,0.0,0.0
red_ratio                 1.0,0.0,0.0
grn_ratio                 0.0,1.0,0.0
blu_ratio                 0.0,0.0,1.0
saturation                1.0
offset                    0.0,0.0,0.0
scale                     1.0,1.0,1.0
power                     1.0,1.0,1.0
floor                     0.0,0.0,0.0
phosphor_life             0.0,0.0,0.0

#
# NTSC POST-PROCESSING OPTIONS
#
yiq_enable                0
yiq_jitter                0.0
yiq_cc                    3.57954545
yiq_a                     0.5
yiq_b                     0.5
yiq_o                     0.0
yiq_p                     1.0
yiq_n                     1.0
yiq_y                     6.0
yiq_i                     1.2
yiq_q                     0.6
yiq_scan_time             52.6
yiq_phase_count           2

#
# VECTOR POST-PROCESSING OPTIONS
#
vector_beam_smooth        0.0
vector_length_scale       0.5
vector_length_ratio       0.5

#
# BLOOM POST-PROCESSING OPTIONS
#
bloom_blend_mode          0
bloom_scale               0.0
bloom_overdrive           1.0,1.0,1.0
bloom_lvl0_weight         1.0
bloom_lvl1_weight         0.64
bloom_lvl2_weight         0.32
bloom_lvl3_weight         0.16
bloom_lvl4_weight         0.08
bloom_lvl5_weight         0.06
bloom_lvl6_weight         0.04
bloom_lvl7_weight         0.02
bloom_lvl8_weight         0.01

#
# FULL SCREEN OPTIONS
#
triplebuffer              0
full_screen_brightness    1.0
full_screen_contrast      1.0
full_screen_gamma         1.0

#
# INPUT DEVICE OPTIONS
#
global_inputs             0
dual_lightgun             0
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dacasks on December 15, 2016, 01:58:45 pm
I did these tests early this year with a really crappy cellphone, comparing actual hardware vs GroovyMAME 0.171 ASIO. I wonder if the iterations since 0.171 have improved things.

https://www.youtube.com/watch?v=SF6r9nMj0y0 (https://www.youtube.com/watch?v=SF6r9nMj0y0)

(Couldn't figure out how to embed it)

... I'll take the opportunity to ask if anyone knows how to make MAME keep its CPU clock settings saved (slider controls). It's a pain having to downclock it every time for some games/systems to put them on par with real hardware.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 16, 2016, 09:44:28 am
Awesome video! Thanks for sharing.

I'm wondering how it looks when you don't downclock it. The default CPU speed for CPS2 is already 74% in stock MAME. What difference do those 4% make? As long as vsync is enabled, that should rule the overall speed, shouldn't it?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on December 16, 2016, 10:15:57 am
You see, Calamity? In the video there is a moment when the emu does not react. Could it be that missed human-induced sub-16ms input trigger we were talking about :)?

0:06
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dacasks on December 16, 2016, 10:41:57 am
Hi, thanks for answering!

I think I should also give some more details about the setup I used... I used a "lag free" USB board (not the Chinese "Zero Delay" one; a guy here in Brazil built it) and it's been pretty good so far. The buttons/stick are also hooked up through a DB15 connector to the JAMMA harness. I don't know whether the USB path to the PC can really have the same amount of lag as a wire connected directly to the JAMMA harness and the board itself, which is really "raw", but when I did this test earlier this year I was amazed at how close it is to the original thing, to the point that I sold a lot of the boards I had. From what I recall, on some games like the Konami ones (I had a Vendetta board, TMNT, and I don't remember if I tested The Simpsons as well), you really can't see a difference, because I think those games have less native lag. Capcom games, especially fighting games, often have some native input lag, so it's more noticeable when a little bit is added.

Now, about the %: I've heard from some people (though I couldn't test it much myself) that some games are voltage sensitive when it comes to speed. I don't know much about that, but I do know that people are unaccustomed to the real speed of some games. I think the clearest example is Street Fighter 2 Hyper Fighting. Most emulators run the game EXTREMELY fast by default, and have forever. The only emulator I know that is on par with real hardware is the old Callus emulator; if someone wants to know the speed of the real thing, run Callus and see. FinalBurn, Kawaks, Nebula, all of those, and MAME itself, are too fast. Not a little bit, ridiculously fast. The real Hyper Fighting speed is only a little faster than Champion Edition, not like Turbo 3 speed on Super 2X/Turbo.

So it was really by eye, honestly, because I don't have the skills or the electronics to measure it, but I tried to reach a point where it felt like I was playing my real boards. And 70% for CPS1 and CPS2 games I think is good overall, though on some games you can't really tell the difference either way. One exception is Ghouls'n Ghosts, where I can see slowdowns in some parts. I never had an original Ghouls'n Ghosts board (I did have a conversion) and I'd need to do more tests, but it's okay at 100%; I like to use circa 74%, which gets rid of those slowdowns in some heavy parts...
Carrier Air Wing's intro is another: on the original board, when the plane takes off and leaves the jet exhaust behind, the real hardware slows down heavily, while at the default 100% the intro is smoother, so it's also an example of the speed being wrong compared to the original hardware. At least on the boards that I have/had, running at 5V.

And yep, some inputs don't register, right? By a really small margin.

(my English is terrible btw if I write too fast, so sorry! =D )
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 16, 2016, 11:31:09 am
You see, Calamity? In the video there is a moment when the emu does not react. Could it be that missed human-induced sub-16ms input trigger we were talking about :)?

0:06

Exactly.

Hardcore Street Fighter players often complain that some combos are harder to achieve or just impossible in MAME compared to the PCB. This could be the final confirmation (and the good news is we might already know the reason; it's a shame Dacasks sold his boards, or we could abuse him by asking for more tests).
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dacasks on December 16, 2016, 12:25:34 pm
:D
Well, I do have some boards still. Actually... I think... yeah, I think I still have the same boards I used in the video. I have SF2HF with the CPS Dash motherboard, which runs at 12MHz as opposed to the 8MHz of "vanilla" CPS1 boards... But I also have a Carrier Air Wing and a U.N. Squadron board... Magic Sword... mainly Capcom games because I like them. I have a Taito F3 with Darius Gaiden, I could try that as well... Capcom ZN boards, like Rival Schools, I still have I guess, and those run really well too.
But the setup would be the same, with the same crappy camera, and that's the problem I guess, because it's really, how do they say, "ghetto", even the setup itself; I just hook things up in the quickest way... But one thing I could maybe try one day: I heard something about increasing the USB polling rate. I didn't do that, as far as I remember, so I could try that too... and of course use a newer version.
But yep, if there's something special I could try that I didn't, with the same boards, let me know... Most boards, from what I tested (I was no museum, by the way; I just had a few more boards than what I have now), were really spot on with this clock stuff... and it could be a "MAME" thing, right, maybe not video related anymore. Like even the skippy inputs. Maybe the game is running a little bit faster, or slower, on the emulator, other things going on, I don't know.

I tried ShmupMAME, and today with G-Sync/FreeSync monitors ShmupMAME is great because it syncs well, but you can feel that it's even faster than the real hardware, noticeably so, more lag-free than the real thing, to the point that you start to understand why programmers sometimes add input lag on purpose; it feels weird sometimes, and I heard some games got broken in the process.  I do think GroovyMAME is the closest thing. And I don't know if the MAME part itself reaches a plateau when there are other things involved. Even in the game programming: I've heard that Super Turbo and some revisions of other Street Fighters have a high degree of randomness going on, like bugs and input flaws, so even the original hardware is skippy in these aspects, because of voltage, because of the hardware itself, electronics-level stuff. For example, I had two Night Warriors boards, with different motherboards, and on the Jon Talbain stage both had mad flickering: the lamps and so on in the foreground and background, even the characters. And since it was two of them, I think it's a game/hardware issue, right? Because on emulators it's not like that; even decreasing the clock, it didn't have that heavy flickering. So I guess there's a degree of accuracy "perfection" that even real hardware often can't reach. There are guys complaining about combos and stuff, but sometimes it's really not like playing on real hardware, and in my humble opinion it never will be, because even real hardware isn't perfect sometimes, depending on the setup, the electronics, the voltage, etc. But it's the closest thing, and it's great as it is already. I mean, think about how much money one would have to spend to play these games close to the real hardware, and these boards are sometimes like bombs, you know, they just stop and boom, it's over, no repair, no anything, so yeah, it's a great service.
But let me know if I can do something to help sometime =)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dacasks on December 16, 2016, 05:20:33 pm
I had the opportunity to try the most recent 0.180, and the d3d one works great, just like 0.171 I guess. The only problem is the ASIO support; I guess there isn't an ASIO build right now? Regular MAME sound really falls flat once you get used to the low-latency audio of ASIO.

Wish they'd improve the console drivers though, I guess they aren't focusing too much on that now. Still waiting to play PS1 Doom under MAME (it's glitched :/). The CPS2 romset changed as well, I guess... I wonder if that improved anything.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: keilmillerjr on December 17, 2016, 12:48:03 pm
I was wondering the same thing about ASIO support. I am finally almost ready to actually install my MAME build in a cab. Only took 3 years. I updated everything to 0.180 and was like, what happened to the ASIO updates? Why isn't this sound driver included in base mame? Is it a hack?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 17, 2016, 06:35:48 pm
Why isn't this sound driver included in base mame? Is it a hack?

There's a number of reasons. Firstly, with XAudio2, latency has been reduced somewhat.

Also, ASIO is not really compatible with the MAME license (GPL).

I have a semi-working PortAudio implementation that does everything the previous ASIO build did (low latency audio + sinc resampling for the final mix), but uses WASAPI as the main API instead. It's possible to build PortAudio with ASIO, but it would be illegal to redistribute this binary. If I find the time I could clean this up and post a patch/build.
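For context, opening a low-latency output stream through PortAudio (which can sit on top of WASAPI on Windows) looks roughly like this; an illustrative sketch, not intealls' actual patch, and the 48 kHz rate, stereo 16-bit format and zero-filled callback are assumptions:

Code: [Select]
// Illustrative PortAudio sketch: open a stereo 16-bit output stream and
// ask the host API for its lowest supported latency.
#include <portaudio.h>
#include <cstring>

static int audio_cb(const void*, void* output, unsigned long frames,
                    const PaStreamCallbackTimeInfo*, PaStreamCallbackFlags, void*)
{
    // The emulator's mixed samples would be copied here; zero-fill keeps
    // the sketch silent but runnable.
    std::memset(output, 0, frames * 2 * sizeof(short));
    return paContinue;
}

int main()
{
    if (Pa_Initialize() != paNoError) return 1;

    PaStreamParameters out = {};
    out.device = Pa_GetDefaultOutputDevice();
    out.channelCount = 2;
    out.sampleFormat = paInt16;
    out.suggestedLatency = Pa_GetDeviceInfo(out.device)->defaultLowOutputLatency;

    PaStream* stream = nullptr;
    if (Pa_OpenStream(&stream, nullptr, &out, 48000,
                      paFramesPerBufferUnspecified, paNoFlag,
                      audio_cb, nullptr) != paNoError) return 1;

    Pa_StartStream(stream);
    Pa_Sleep(5000);                      // play 5 seconds of silence
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
    return 0;
}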
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Recapnation on December 19, 2016, 05:58:03 pm

So it was really by eye, honestly, because I don't have the skills or the electronics to measure it, but I tried to reach a point where it felt like I was playing my real boards. And 70% for CPS1 and CPS2 games I think is good overall, though on some games you can't really tell the difference either way. One exception is Ghouls'n Ghosts, where I can see slowdowns in some parts. I never had an original Ghouls'n Ghosts board (I did have a conversion) and I'd need to do more tests, but it's okay at 100%; I like to use circa 74%, which gets rid of those slowdowns in some heavy parts...
Carrier Air Wing's intro is another: on the original board, when the plane takes off and leaves the jet exhaust behind, the real hardware slows down heavily, while at the default 100% the intro is smoother, so it's also an example of the speed being wrong compared to the original hardware.

Could you elaborate a bit more on all that with better phrasing, please? I'm not sure if you know that 74 % is the "MAME default" for CP-S, since you mention "100 %" many times, and whether you really notice a difference when setting it at 70 % instead (from 74 %). Also, how exactly is your set-up (regarding the control connection)? Tests like this are getting harder and harder to come by, and you could even end up with something useful to send to MAME Dev. CP-S 1 & 2 speed issues are well-known and it all seems to depend on properly emulating wait states, but F-3 tests could well be a novelty.

Thanks!

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dacasks on December 20, 2016, 09:44:51 am
The setup was just an arcade stick hooked up both to the JAMMA harness (PCB) and to a USB interface (GroovyMAME), which so far has no noticeable lag (probably not lag free, I suppose, but I don't have the tools to measure that).

Yep, I know the CPS2 driver uses 74% by default. The CPS1 games, as I said, are totally off at 100%. But as I said... it's really based on personal feeling; I don't have the tools to measure input in milliseconds and all that stuff. If someone could point me to something, I could maybe try it next time.

For example, in Street Fighter Zero 3 on real hardware, if you pull off a Shoryu Reppa with Ken at Level 3, the trailing shadows produce some kind of slowdown, like frame skipping. At 74%, this slowdown doesn't happen in MAME, at least not as often as on the real hardware (where it sometimes also doesn't happen, depending on what is on screen, but it's more frequent). At 70% (I used to leave it at 68%, but by throwing Hadoukens at the same time on real hardware and GroovyMAME I realized the speed of the game was a little bit off at 68%), the game behaves more like what I feel is the real hardware. But as I said, it's purely based on personal feeling and basic on-screen tests, without any kind of deep measuring. I could be totally wrong, maybe 74% is the right % or nearest to the real hardware, but I just felt 70% was better. Besides, CPS1 and CPS2 share similar hardware, so if at 70% CPS1 behaves almost like real hardware and 74% is a little bit off, then... maybe 70% for both would be better.

I've already seen posts here and there on some forums, even reddit from what I remember, of people complaining about Street Fighter 2 Hyper Fighting's speed, which is ridiculous compared to real hardware, and I've seen some MAME devs come across as almost "reluctant" to get into the issue further, I don't know why. Some people believe it's the 12MHz vs 8MHz difference between the ordinary CPS1 motherboard and the CPS Dash (which original Hyper Fighting boards came equipped with), but I've already had the opportunity to try Hyper Fighting on an 8MHz board and honestly couldn't tell the difference at first, though maybe it has some effect on possible slowdowns. Then again, it's another example of something that's been going on for ages now. And that's saying a lot, because Capcom fighting games have a huge following; there are sites like Shoryuken and other forums dedicated mostly to Capcom arcade fighting games, so who am I to do something about it :D :D



Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dacasks on December 20, 2016, 09:59:06 am
Btw, I tried Ghouls'n Ghosts on Callus yesterday.

The slowdowns I mentioned are there, in some parts (most notably the little "mountain" part with the shooting flowers and those worm things rising off the ground at the end of stage 1, if lots of action is on screen).

So... if Callus has the correct speed for most games, then maybe it's another example of 70% being more appropriate (at least for CPS1 games).
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Recapnation on December 20, 2016, 12:17:36 pm
In case you haven't read the comments here and want to post your findings:

http://mametesters.org/view.php?id=408 (http://mametesters.org/view.php?id=408)

But as long as wait states aren't properly emulated, there's not much use. Input lag is another matter, though.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dacasks on December 26, 2016, 10:24:27 am
I'll think about it. I tried a little bit of SF2HF again this week... and yep, 70% is the closest speed I could find to the original hardware, in-game clock speed and everything... it's just a little bit off, I guess it has to do with the non-emulated features you mentioned.

I just realized that if you save a state, the clock % setting is also saved... so I made a hotkey combination on the arcade stick to load/save states whenever I want to play CPS games.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on December 31, 2016, 01:34:31 pm
I've made a little picture visualising the missing input events shown in the recently posted video.
What can be done is to delay the key_up event when another key_down event occurs in the same frame.
Regarding button presses shorter than 1 frame: delay the key_up event when key_down and key_up occur in the same frame, but only when the key_down comes first, to avoid failing to release a button if it is released and then pressed again within 1 frame.
Delaying just the key_up events unconditionally by 1 frame will not work, I suppose, as it may add unnecessary delay to input processing.

(https://s27.postimg.org/9vfhj805v/poll_fix.png)
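A minimal sketch of how such a filter could work (hypothetical helper code, not oomek's Arduino firmware or anything in MAME; it assumes key events arrive between polls and the game polls once per frame):

Code: [Select]
// Hypothetical helper: hold a key_up back by one poll if the matching
// key_down hasn't been seen by the game yet, so sub-frame presses still
// register, without delaying normal releases.
struct ButtonFilter
{
    bool state = false;             // state reported to the game
    bool polled_since_press = false;
    bool release_pending = false;

    void on_event(bool pressed)     // hardware edge from the encoder
    {
        if (pressed)
        {
            state = true;
            polled_since_press = false;
            release_pending = false;
        }
        else if (state)
        {
            if (polled_since_press)
                state = false;          // game already saw the press: release now
            else
                release_pending = true; // press+release within one frame: hold it
        }
    }

    bool poll()                     // called once per emulated frame
    {
        bool result = state;
        if (state)
            polled_since_press = true;
        if (release_pending)
        {
            state = false;              // deferred release lands next poll
            release_pending = false;
        }
        return result;
    }
};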
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on December 31, 2016, 01:41:35 pm
I could program some macros in Arduino emulating input sequences with transitions shorter than 1 frame to see how it behaves.
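Something like the following Arduino sketch would do it (hypothetical example; the key bindings and timings are made up, and it assumes a 32u4-based board with the Keyboard library):

Code: [Select]
// Hypothetical Arduino Leonardo/Micro macro: replays a quarter-circle
// (down, down-right, right + punch) with 8 ms steps, i.e. transitions
// shorter than one 60 Hz frame (16.7 ms).
#include <Keyboard.h>

void setup()
{
    Keyboard.begin();
    delay(3000);              // time to focus the emulator window
}

void loop()
{
    Keyboard.press('s');      // down
    delay(8);
    Keyboard.press('d');      // down + right = down-right
    delay(8);
    Keyboard.release('s');    // right only
    delay(8);
    Keyboard.press('p');      // punch
    delay(8);
    Keyboard.release('d');
    Keyboard.release('p');
    delay(2000);              // repeat every couple of seconds
}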
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: lettuce on January 01, 2017, 10:46:36 am
Why isn't this sound driver included in base mame? Is it a hack?

There's a number of reasons. Firstly, with XAudio2, latency has been reduced somewhat.

Also, ASIO is not really compatible with the MAME license (GPL).

I have a semi-working PortAudio implementation that does everything the previous ASIO build did (low latency audio + sinc resampling for the final mix), but uses WASAPI as the main API instead. It's possible to build PortAudio with ASIO, but it would be illegal to redistribute this binary. If I find the time I could clean this up and post a patch/build.

That would be great! So it would be as good as ASIO then?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on January 13, 2017, 09:44:44 am
I've made a little picture visualising the missing input events shown in the recently posted video.
What can be done is to delay the key_up event when another key_down event occurs in the same frame.
Regarding button presses shorter than 1 frame: delay the key_up event when key_down and key_up occur in the same frame, but only when the key_down comes first, to avoid failing to release a button if it is released and then pressed again within 1 frame.
Delaying just the key_up events unconditionally by 1 frame will not work, I suppose, as it may add unnecessary delay to input processing.

(https://s27.postimg.org/9vfhj805v/poll_fix.png)

This is interesting because I've just started to program my encoders myself (also Arduino).

I was wondering if we need to adjust the encoder for different joysticks because they have different sized diagonal zones. But I suppose the player adapts his technique to make the move register, i.e. the speed of execution of down/downright/right determines the overlap on the input.

What's the tolerance on, say, Street Fighter 2 registering a move?  Does it need to see the necessary inputs on successive frames, or can you be slower than that?  e.g. does a poll of down/down/downright/downright/right give you whatever down/downright/right does?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on January 13, 2017, 11:11:00 am
If, for example, the down release and the right press are registered in the same frame, the downright won't be registered. The downright position, even one held as long as 16ms, may not be registered.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on January 13, 2017, 01:04:41 pm

I think it's clearer to show the polling as a discrete event:

This diagram also shows how delaying the key release (i.e. a longer hold on the key) could spread the input across more polling points.  Hence my previous question about a poll of down/down/downright/downright/right.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on January 13, 2017, 03:29:53 pm
My graph shows the quantized key down/up events sent by the device, while yours shows the logical key value for a specific frame. Anyway, on the bottom part you drew one pair of down-right too many.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on January 13, 2017, 05:09:31 pm

Now I've thought some more, I understand your diagram even less  ;D

Is your POLL line the encoder polling the button, or the PC polling the USB encoder, or the game ROM polling MAME?

Maybe I don't get what you are trying to say.

When I think about polling, it's the game polling the button states, which is usually/always(??) a once-per-frame discrete event. Certainly that's how some/most(?) arcade games worked. How it is implemented in MAME is something I want to know more about.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on January 13, 2017, 05:32:18 pm
Is it more clear now?

(https://s23.postimg.org/xqzgoaf97/poll_fix2.png)

The yellow bars show when the polling function must return a keypress even if the key is already up.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on January 13, 2017, 05:42:43 pm
For inputs shorter than 1 frame it's relatively easy to implement a fix, but the scenario where quarter circles don't register is a bit more complicated, as the polling function has no way of knowing which game input a keyboard key is bound to. To avoid introducing unnecessary delays, the fix must be applied only when two neighbouring joystick directions overlap for a specific player.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on January 13, 2017, 05:45:00 pm
Btw, I still have no clue whether input on the hardware machines is interrupt or timer driven.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on January 13, 2017, 08:46:36 pm

That clarifies that you are talking about the game polling. 

So we are, I think, talking about altering the INPUT line by programming the encoder.

For the missed short button press we could program a minimum press time, e.g. 18ms to cover all the 60Hz games. I don't think it would be a terrible idea, but it could spoil a legitimate input. E.g. in Defender it's theoretically possible to fire every other frame; to do that you need to be on-off-on at successive polls (~17ms), so an imposed minimum press length could change your on-off-on into on-on-on, which would be one shot instead of two.  If you choose a minimum button press of exactly 1 frame then you would be very unlucky to get either a missed input or the failed double shot.
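An encoder-side minimum-press filter like that could look roughly like this (hypothetical illustration; the threshold and names are made up):

Code: [Select]
// Hypothetical encoder-side filter for the minimum-press idea above.
// A release is ignored until the press has lasted at least MIN_PRESS_MS.
const unsigned long MIN_PRESS_MS = 17;   // roughly one 60 Hz frame

struct MinPressButton
{
    bool pressed = false;
    unsigned long pressed_at = 0;

    // raw = current debounced switch reading, now = millis()-style time
    bool filter(bool raw, unsigned long now)
    {
        if (raw && !pressed)
        {
            pressed = true;
            pressed_at = now;
        }
        else if (!raw && pressed && (now - pressed_at) >= MIN_PRESS_MS)
        {
            pressed = false;             // only release after the minimum hold
        }
        return pressed;
    }
};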

Ps. I have separate panels and encoders for Street Fighter and Defender so I can program them differently if desirable  ;D


You could look for diagonals and extend them to a minimum period easy enough.

Whether any of these schemes will be any good (or are cheating) is open to debate. Have you done any sampling of quarter circles to see what the debounced switch states look like?  It would be nice to know what typical overlaps look like, as there might not actually be a problem.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on January 20, 2017, 05:23:47 pm
That would be great! So it would be as good as ASIO then?

Sorry for the late reply. For all intents and purposes, it should be just as good!

Audio card rate estimation isn't done, which doesn't seem to matter much with WASAPI/WDM-KS. Also, there's no sinc resampling at the moment, but it would be very difficult to hear any difference due to this in most situations.

Edit: If you want to be properly anal, sinc resampling is of course "better", and I have a working sinc resampler implementation, it's just that (at the moment) it comes at a cost which is MUCH higher than the nearest neighbor approach taken in mainline. Even though it's written to make use of SSE/AVX, it will steal CPU cycles from the main thread (and definitely cripple frame_delay, for instance). The sinc resampling with the ASIO patch was "for free", since it wasn't being done in the main thread. But as said previously, it will be very difficult to hear any difference in the vast majority of titles. Also, it might even be possible to get rid of all humanly audible issues (however slight they may be) by simply increasing the sample rate.
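For contrast, the cheap nearest-neighbour style of resampling mentioned above is essentially this (generic illustration, not MAME's actual mixer code):

Code: [Select]
// Generic nearest-neighbour resampler: each output sample just picks the
// closest input sample, so the per-sample cost is tiny compared to sinc.
#include <vector>
#include <cstdint>
#include <cstddef>

std::vector<int16_t> resample_nearest(const std::vector<int16_t>& in,
                                      double in_rate, double out_rate)
{
    std::vector<int16_t> out;
    const double step = in_rate / out_rate;
    for (double pos = 0.0; pos < in.size(); pos += step)
        out.push_back(in[static_cast<std::size_t>(pos)]);
    return out;
}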
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: vicosku on February 02, 2017, 07:50:05 am
There's something critical I've found out: the frame_delay value is off by 1 unit from what it's supposed to be. So if you use -frame_delay 8, its effect is that of -frame_delay 9. If you use -frame_delay 9, it actually "wraps" to the next frame, so -fd 9 must not be used!

Hi, I've been out of the loop for at least several months and am updating to the latest GroovyMAME stuff.  It's nice to see all the great work that has been done by people like Calamity and Intealls.  Is the above still true, or have things been changed so that a value of 9 will actually reduce lag more than 8 without issues?  I can't seem to find any documentation regarding this point, though the answer may be hidden in a thread somewhere.  I figure this thread would be a decent place to add that information.  Thanks!
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on February 02, 2017, 11:30:50 am
I'm testing GM with FreeSync and it's amazing latency-wise. I get a 0.45 frame minimum in D3D with HLSL on the LCD, but in BGFX it's 1 frame more, unfortunately. It would need true fullscreen or DX12 iFlip to reduce it to match the D3D mode. Can we do something about it?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: snappleman on February 13, 2017, 09:44:06 am
I was curious how the input/display lag really was with GM since it always felt nearly perfect to me. I compared it to my hardware Neo Geo MVS by using a custom controller I built that has both Neo Geo controller ports and an Ipac2 so I can use it on both systems simultaneously.

I left the GM settings at default, running a Radeon HD5450 with CRT_Emudriver and ASIO4ALL on a VGA CRT at 120 Hz. The Neo Geo was going into an LCD monitor (because my CRT TV died.. :( ) through an old Jrok video converter. The input response in GM was faster than that of the real hardware, which I know isn't surprising because the latter was going through an LCD which introduces lag, but I was still surprised that it was clearly faster in GM.

Now I got a new CRT TV and once I set it all up I'll do the test again for a more dependable result.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: lettuce on February 18, 2017, 04:45:17 pm
Will Windows 10 new Game Mode help reduce latency further?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on February 18, 2017, 07:29:21 pm
It will not, unfortunately. It only reserves more memory and CPU cycles for a fullscreen app. Anecdotally, it even reduced the fps in some games. Since MAME is fps-limited, it will do nothing for the lag.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on February 18, 2017, 07:31:36 pm
If you are running MAME on an LCD I would advise going for FreeSync.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: snappleman on March 03, 2017, 04:47:32 pm
So I finally got to test GroovyMAME vs. Neo Geo MVS on a properly set up NTSC CRT TV, and GM has noticeable input lag over the real thing. I have my framedelay at 7 and the games are running at the same refresh rate and resolution. Also running on a 120 Hz VGA CRT, the results are the same. The only thing I can think of to change is uninstalling Windows 8.1 and going to Windows 7; hopefully that will improve things a bit.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dacasks on November 06, 2017, 10:51:09 am
Just bumping this old thread because, even if someone has probably done another test since then... I managed to get a beat-up monitor and time to hook up the JAMMA setup again (mainly to sell stuff, actually, and play some Samsho 64).
So I did another test, now with Groovy 190, but with an extra fine twist:
Instead of hooking it through USB, I've wired the arcade stick through an LPT port (using the PCI slot).
Same thing: left real PCB, right GroovyMAME 190 running on an i5 OC'd to 4.4, Win7, with frame_delay set to 5, just for the heck of it (even if, in some tests, frame_delay 1 gave me noticeably more lag).

It has 5 minutes of Ryu giving head to Ken in outer space, but it's worth it.

https://youtu.be/70t8D8Ee3bw
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: oomek on November 06, 2017, 04:20:40 pm
Would you share some links explaining how to configure MAME to get the input from the LPT port? I didn't even know it was possible and I would like to give it a go.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dacasks on November 07, 2017, 04:58:22 am
Would you share some links explaining how to configure MAME to get the input from the LPT port? I didn't even know it was possible and I would like to give it a go.

Well, one option would be to convert a PS1 controller: http://www.raphnet.net/electronique/psx_adaptor/psx_adaptor_en.php (http://www.raphnet.net/electronique/psx_adaptor/psx_adaptor_en.php). Or you can use this tutorial, which covers making an interface from the ground up (http://bitenkof98.blogspot.com.br/2013/07/lpt-switch-o-que-e-e-como-usar.htm (http://bitenkof98.blogspot.com.br/2013/07/lpt-switch-o-que-e-e-como-usar.htm)); it's in Portuguese, but it can be translated and has pictures and everything...
You can search for an already-made PS2 > LPT adapter as well.

If your mobo doesn't have an LPT port (or pins to hook one up), you can use a PCI Express LPT card.

On the software side, you just need PPJOY https://drive.google.com/file/d/0B8n2DGNd2UIUQmpNSXVYdmZEaHc/view (https://drive.google.com/file/d/0B8n2DGNd2UIUQmpNSXVYdmZEaHc/view)
It works under Windows 10 as well.

It has several options inside its control panel depending on the input you have, but it basically routes the inputs to a virtual joystick. It works great, even with new games and everything. I can't make the old GroovyMAME ASIO build work with it though; maybe my GroovyMAME setup is broken, I don't know.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 18, 2017, 03:31:20 pm
Just did a test run with 1000 Hz USB polling (with the driver somewhere in this thread, http://www.overclock.net/t/1597441/digitally-signed-sweetlow-1000hz-mouse-driver (http://www.overclock.net/t/1597441/digitally-signed-sweetlow-1000hz-mouse-driver) ). Tried fixing the controller to 31 Hz to make sure the driver worked as intended, which made next frame response impossible.

Game was sf2 with GM 0.192 d3d9ex, frame delay 9. The controller was a Hori Real Arcade Pro V3-SA.

I used a scope to see when vblank occurred and when the button was pressed. Next frame responses could be observed down to about 2.5 ms before vblank!

Frame delay 9 should give about 1.68 ms to emulate the frame with sf2, along with the 1 ms poll rate (the actual polling delay would probably be somewhere between 0 and 1 ms).

I never observed any next-frame responses below 2.5 ms in the recording.

In summary, upping the poll rate of the controller should lead to more next-frame responses. The default 125 Hz poll rate gives a granularity of 8 ms. With a frame time of 16.76 ms, we would get one poll at 8 ms, the second at 16 ms, which is past the deadline of ~15.084 ms (16.76-1.68). Thus, we would only get next frame response if a button was pressed within 0 to 8 ms (if perfectly aligned with vblank).
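To make that arithmetic explicit, here's a tiny worked example (it only assumes that frame_delay N leaves (10-N)/10 of the frame for emulation, which matches the 1.68 ms figure above):

Code: [Select]
#include <cstdio>

int main()
{
    const double frame_ms = 16.76;  // sf2 frame period
    const double fd       = 9.0;    // frame_delay value
    const double poll_ms  = 8.0;    // default 125 Hz polling

    double emu_slot  = frame_ms * (1.0 - fd / 10.0);         // ~1.68 ms to emulate
    double deadline  = frame_ms - emu_slot;                  // ~15.08 ms into the frame
    double last_poll = poll_ms * (int)(deadline / poll_ms);  // 8 ms with 125 Hz polling

    std::printf("emulation slot %.2f ms, input deadline %.2f ms, last usable poll at %.0f ms\n",
                emu_slot, deadline, last_poll);
    return 0;
}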

A picture of the setup is attached, if anyone wants the videos I could post them somewhere.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 19, 2017, 10:33:04 am
Thanks Dacasks and intealls for these new tests. Really exciting results!

When I have some time I'll redo my tests on my rebuilt system. Last time I tested I was still running GM on XP 64.

I'll take the opportunity to comment that there's still a great deal of skepticism concerning our results if you look around the web.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on December 19, 2017, 01:30:45 pm
Good stuff intealls.

Do you have any tips on installing the sweetlow driver?  I can't get it to do anything on either my win7 noSP or my win7 SP1 machines.

Running the setup doesn't change the mouse poll rate. (I don't have a way to test the keyboard or my keyboard encoder.) An attempted direct driver update reports back that the driver doesn't need updating.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 19, 2017, 02:56:57 pm
Do you have any tips on installing the sweetlow driver?  I can't get it to do anything on either my win7 noSP or my win7 SP1 machines.

Make sure you use Test Mode. First you need to install the driver and SweetLow.CER (double click on it). After installing the cert, run 1kHz.cmd, right click on HIDUSBF.INF and choose install. You may or may not need to reboot.

Then launch setup.exe, select your mouse, select "Filter on device", "Install service" and choose the desired poll rate. Click "Restart". Now use mouserate.exe to see if the change took. If the mouse won't seem to go any higher than 125 Hz, try setting 62 Hz or 31 Hz. If the update rate is at 62 or 31 Hz, the driver is working as intended, and the mouse might be limited to a 125 Hz update rate.

Then do the same for your controller.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 19, 2017, 03:18:54 pm
Thanks Dacasks and intealls for these new tests. Really exciting results!

No worries whatsoever. Thank you for the incredibly awesome CRT Emudriver v2, which will keep our lovely CRTs in use for years to come. :) And GM on state-of-the-art hardware as well! The entire project is just awesome.

I manually measured the response of 87 button presses from the recording. About 84% gave next-frame response. The best case was about 2.5 ms before vblank, but there were presses up to about 3 ms before vblank that didn't give next-frame response, so I need to look into it some more. Everything over about 3.4 ms gave next-frame response.

Code: [Select]
2110 1 6.5 RIGOL_D3D9EX_1000HZ_POLL
2324 1 5.2
2544 1 8
2762 1 4.5
2974 1 13
3187 0 1
3400 1 3.7
3618 0 1
3830 1 4
4046 1 9.2
4264 1 6
4482 1 3.7
4683 1 5.2
4896 1 8
5109 1 12
5326 1 15.5
5537 1 9
5757 1 14
5971 1 12
6181 1 5.5
6393 1 14
6607 1 7.2
6828 1 13.2
7037 1 13.7
7254 0 1.2
7473 1 12
7702 1 13
7904 1 11.5
8136 0 0.8
8358 1 15
8570 1 2.8
8781 1 13
8986 1 15
9206 1 4.2
9416 1 4
9646 0 2.8
9862 1 4
10087 1 8
10307 1 13.5
10525 1 11.2
10750 1 11
10975 1 15.5
11193 1 15.8
11409 1 5.2
11630 1 6.2
11844 0 1.5
12050 0 1.2
12260 1 15.5
12464 0 2.4
12679 1 9
12898 1 6.2
13121 1 3.4
13342 0 3.2
13560 0 1.6
13770 1 15.2
13998 1 5.2
14210 1 11.2
14412 0 2.8
14616 1 11.2
14835 1 7.2
15047 1 16.76
15264 1 16.76
15489 0 1.2
15705 1 3.5
15919 1 3.7
16144 1 7.5
16363 1 15
16582 1 10.5
16810 1 15.8
17025 1 14.5
17248 1 16
17458 0 1.4
17678 1 10
17886 1 15
18105 1 8
18318 1 7.2
18524 1 5
18733 1 4.8
18959 1 8.5
19168 1 11.5
19375 0 0.8
19593 1 16.76
19805 1 3.4
20028 1 15.5
20248 1 2.5
20477 1 3.9
20694 1 7.7
0.839080459770115
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on December 19, 2017, 09:21:33 pm

Make sure you use Test Mode. First you need to install the driver and SweetLow.CER (double click on it). After installing the cert, run 1kHz.cmd, right click on HIDUSBF.INF and choose install. You may or may not need to reboot.

Then launch setup.exe, select your mouse, select "Filter on device", "Install service" and choose the desired poll rate. Click "Restart". Now use mouserate.exe to see if the change took. If the mouse won't seem to go any higher than 125 Hz, try setting 62 Hz or 31 Hz. If the update rate is at 62 or 31 Hz, the driver is working as intended, and the mouse might be limited to a 125 Hz update rate.

Then do the same for your controller.

Thanks for that. I got it working on the mouse now (at 62 Hz); it won't go above 125 Hz.

I use an Arduino encoder, so when I get a chance I'll test it by programming some 500 Hz keypresses.


Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 20, 2017, 08:25:53 am
Hi intealls,

Thanks for sharing your research and results  :)

I used a scope to see when vblank occured and when the button was pressed. Next frame responses could be observed down to about 2.5 ms before vblank!

Your testing setup is intriguing  :). Could you possibly explain in a bit more detail how your testing setup works? It's different from the LED-wired-to-joystick + 240 fps camera setup, right? If so, how does it account for possible video frame render queues or total USB stack latency?

Just did a test run with 1000 Hz USB polling (with the driver somewhere in this thread, http://www.overclock.net/t/1597441/digitally-signed-sweetlow-1000hz-mouse-driver (http://www.overclock.net/t/1597441/digitally-signed-sweetlow-1000hz-mouse-driver) ). Tried fixing the controller to 31 Hz to make sure the driver worked as intended, which made next frame response impossible.

I guess it's good to mention that the sweetlow "driver" is a hack that "tampers" with the USB Endpoint protocol in Windows. Depending on your hardware there may be benefits, there may be disadvantages...

Windows and USB Endpoint structure

It's a myth that Windows handles all USB devices at a polling interval of 8ms.

I'll elaborate a bit on this as we're all freaks right  :D

Windows has a very clear protocol for how it handles USB devices. The speed of the device, together with the bInterval value set in the endpoint device's hardware, configures the polling interval in Windows. Refer to USB_ENDPOINT_DESCRIPTOR structure (https://msdn.microsoft.com/en-us/library/windows/hardware/ff539317(v=vs.85).aspx)

As can be seen from the three tables on the page, the minimum polling period for low speed USB devices is 8ms. Full speed devices may configure 1ms as minimum polling interval and high-speed devices configure it in "microframes", the minimum being 1/8th of 1ms.

Basically any quality modern gaming endpoint device is configured as a full-speed device and uses either a 1 ms or 4 ms polling interval. In most cases it makes no sense to use the sweetlow hack on these devices.
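As a rough illustration of how those tables turn bInterval into an actual polling period (this is just my reading of the MSDN page linked above, so treat it as a sketch rather than gospel):

Code: [Select]
#include <cstdio>

// Full-speed interrupt endpoints: bInterval is in ms and Windows rounds it
// down to a power of two, capped at 32 ms.
int full_speed_period_ms(int bInterval)
{
    int p = 1;
    while (p * 2 <= bInterval && p < 32) p *= 2;
    return p;                               // e.g. bInterval 0x0A -> 8 ms
}

// High-speed interrupt endpoints: period is 2^(bInterval-1) microframes of 125 us.
double high_speed_period_ms(int bInterval)
{
    return (1 << (bInterval - 1)) * 0.125;  // e.g. bInterval 4 -> 1 ms
}

int main()
{
    std::printf("Full speed, bInterval 0x0A: %d ms\n", full_speed_period_ms(0x0A));
    std::printf("High speed, bInterval 4: %.3f ms\n", high_speed_period_ms(4));
    return 0;
}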
 
If you're interested in the interface speed of your Hori stick, you could download the tool USBView (https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/usbview) from Microsoft and look up the mentioned values for your joystick. Since you seem to experience a benefit from using the hack, my guess is that it's a low speed device.

To put this into perspective, the I-PAC 2 (https://www.ultimarc.com/ipac1.html) is a full-speed device with a bInterval value that configures it at a 4 ms polling period. The new Xbox One controllers, when used on a PC, also operate at a 4 ms polling period. The 2600-daptor runs at a 1 ms polling interval.

The potential disadvantage with the sweetlow hack is that there is no reliable way to determine whether or not the (low-speed) USB joystick hardware really supports the higher polling rate. With a mouse you can use mousetester, but there is no such thing for testing joysticks. When you apply the sweetlow hack to a device whose hardware does not really support such a high polling interval, it may result in erratic behaviour and degrade performance overall.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: krick on December 20, 2017, 10:15:56 am
Has any lag testing been done like this with the PS/2 interface?

I have an older JPAC interface that connects through the PS/2 keyboard port.  I'm curious if it has better/same/worse results than the USB IPAC interface.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 20, 2017, 12:10:37 pm
Hi Dr.Venom, long time no hear!

Your testing setup is intriguing  :). Could you possibly explain in a bit more detail how your testing setup works? It's different from the LED-wired-to-joystick + 240 fps camera setup, right? If so, how does it account for possible video frame render queues or total USB stack latency?

I've improved the setup since the last post, I can now actuate the button deterministically at a predetermined time from vblank.

It's quite simple - an MCU (Teensy) is monitoring vblank (from the UMSA), and pulls the button low on the RAP for 125 ms, x ms after vblank. x is configurable to any delay. Runtime operation is monitored by the oscilloscope.

Here's a timing diagram.

Code: [Select]
_____   ____________________   ___   ____
     |_|                    |_|   ...
       ^ vblank
       |-----------------|
                         ^ x
_________________________             ___
                         |________...|
                         ^ button actuation by MCU
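
Firmware-wise it's along these lines (my own reconstruction from the description above, not intealls' actual code; the pins, the vblank polarity and the repeat period are placeholders):

Code: [Select]
// Watch the vblank signal, wait x ms, then ground the button line for 125 ms.
// The button line is driven open-drain: floating when idle, pulled low to "press".
const int VBLANK_PIN = 2;            // vblank pulse from the UMSA
const int BUTTON_PIN = 3;            // wired across the button on the stick

const unsigned long X_MS    = 3;     // delay after vblank before actuation
const unsigned long HOLD_MS = 125;   // how long the button is held down
const unsigned long PERIOD  = 500;   // one actuation roughly every 500 ms

void pressButton()   { pinMode(BUTTON_PIN, OUTPUT); digitalWrite(BUTTON_PIN, LOW); }
void releaseButton() { pinMode(BUTTON_PIN, INPUT); }  // let the stick's pull-up win

void setup() {
  pinMode(VBLANK_PIN, INPUT);
  releaseButton();
}

void loop() {
  // wait for the next vblank edge (assumes an active-high pulse)
  while (digitalRead(VBLANK_PIN) == HIGH) {}
  while (digitalRead(VBLANK_PIN) == LOW) {}
  delay(X_MS);                       // x ms after vblank
  pressButton();
  delay(HOLD_MS);
  releaseButton();
  delay(PERIOD - HOLD_MS - X_MS);    // idle until roughly the next actuation
}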

I can post some pictures and a better rundown later on.

Other than that, the testing procedure is the same as previous endeavours (240 fps camera + LED). So next-frame response still needs to be determined manually by looking at the video. It's quite easy with this setup though: since the button actuation interval is deterministic, a fixed number of video frames can be present between actuations. This makes it possible to quickly fast-forward between actuations.

USB stack latency should be included in the measured delay, since it's a system test. Determining what the USB stack latency is though, that's another story. :)

I guess it's good to mention that the sweetlow "driver" is a hack that "tampers" with the USB Endpoint protocol in Windows. Depending on your hardware there may be benefits, there may be disadvantages...

Windows and USB Endpoint structure

It's a myth that Windows handles all USB devices at a polling interval of 8ms.

...

Thanks for the USB rundown and USBView tip! Really helpful.

Here's the descriptor from the Hori stick. The MSDN descriptor page, in conjunction with the information from USBView, seems to suggest that the device in fact uses a standard 8 ms polling rate (bInterval = 0x0A and Full Speed).

Code: [Select]
Is Port User Connectable:         yes
Is Port Debug Capable:            no
Companion Port Number:            0
Companion Hub Symbolic Link Name:
Protocols Supported:
 USB 1.1:                         yes
 USB 2.0:                         yes
 USB 3.0:                         no

Device Power State:               PowerDeviceD0

       ---===>Device Information<===---
English product name: "REAL ARCADE Pro.V3"

ConnectionStatus:                 
Current Config Value:              0x01  -> Device Bus Speed: Full (is not SuperSpeed or higher capable)
Device Address:                    0x13
Open Pipes:                           2

          ===>Device Descriptor<===
bLength:                           0x12
bDescriptorType:                   0x01
bcdUSB:                          0x0200
bDeviceClass:                      0x00  -> This is an Interface Class Defined Device
bDeviceSubClass:                   0x00
bDeviceProtocol:                   0x00
bMaxPacketSize0:                   0x40 = (64) Bytes
idVendor:                        0x0F0D = HORI CO., LTD.
idProduct:                       0x0022
bcdDevice:                       0x1000
iManufacturer:                     0x01
     English (United States)  "HORI CO.,LTD."
iProduct:                          0x02
     English (United States)  "REAL ARCADE Pro.V3"
iSerialNumber:                     0x00
bNumConfigurations:                0x01

          ---===>Open Pipes<===---

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x02  -> Direction: OUT - EndpointID: 2
bmAttributes:                      0x03  -> Interrupt Transfer Type
wMaxPacketSize:                  0x0040 = 0x40 bytes
bInterval:                         0x0A

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x81  -> Direction: IN - EndpointID: 1
bmAttributes:                      0x03  -> Interrupt Transfer Type
wMaxPacketSize:                  0x0040 = 0x40 bytes
bInterval:                         0x0A

       ---===>Full Configuration Descriptor<===---

          ===>Configuration Descriptor<===
bLength:                           0x09
bDescriptorType:                   0x02
wTotalLength:                    0x0029  -> Validated
bNumInterfaces:                    0x01
bConfigurationValue:               0x01
iConfiguration:                    0x00
bmAttributes:                      0x80  -> Bus Powered
MaxPower:                          0xFA = 500 mA

          ===>Interface Descriptor<===
bLength:                           0x09
bDescriptorType:                   0x04
bInterfaceNumber:                  0x00
bAlternateSetting:                 0x00
bNumEndpoints:                     0x02
bInterfaceClass:                   0x03  -> HID Interface Class
bInterfaceSubClass:                0x00
bInterfaceProtocol:                0x00
iInterface:                        0x00

          ===>HID Descriptor<===
bLength:                           0x09
bDescriptorType:                   0x21
bcdHID:                          0x0111
bCountryCode:                      0x00
bNumDescriptors:                   0x01
bDescriptorType:                   0x22 (Report Descriptor)
wDescriptorLength:               0x0089

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x02  -> Direction: OUT - EndpointID: 2
bmAttributes:                      0x03  -> Interrupt Transfer Type
wMaxPacketSize:                  0x0040 = 0x40 bytes
bInterval:                         0x0A

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x81  -> Direction: IN - EndpointID: 1
bmAttributes:                      0x03  -> Interrupt Transfer Type
wMaxPacketSize:                  0x0040 = 0x40 bytes
bInterval:                         0x0A

The potential disadvantage with the sweetlow hack is that there is no reliable way to determine whether or not the (low-speed) USB joystick hardware really supports the higher polling rate. With a mouse you can use mousetester, but there is no such thing for testing joysticks. When you apply the sweetlow hack to a device whose hardware does not really support such a high polling interval, it may result in erratic behaviour and degrade performance overall.

I wasn't aware of the fact that modern devices actually set a faster polling rate. Maybe the proper thing to do would be to find an encoder that operates at 1 kHz, and not use the SweetLow hack!

I found this page of someone using Joy2Key to test the polling rate (http://forums.shoryuken.com/discussion/181133/how-can-i-check-the-usb-polling-rate-of-my-stick), but haven't tested it.

Edit: Tested this, and it doesn't work.

I'll elaborate a bit on this as we're all freaks right  :D

This is irrefutably the case! Albeit in a good way.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 20, 2017, 03:07:22 pm
OK, so I did a few runs.

I set the button actuation to about 3.6 ms from vblank (window is about 1.9 ms), and set the polling rate to Default, 500 Hz and 1000 Hz using the SweetLow driver and logged the result. The timing can be seen in the attached image. I started counting from about frame 250 and onwards.

1000 Hz: 51 pass, 6 fail, 89% next frame response.
 500 Hz: 21 pass, 36 fail, 37% next frame response.
Default: 57 fail, 0% next frame response.

So even though it's still unknown if the Hori in actuality handles 1000 Hz polling in a graceful manner, it performs a lot better with 1000 Hz polling set. With multiple button presses and the like, who knows. Maybe it'll choke.

I uploaded the videos to here: mega.nz (https://mega.nz/#F!yApSRaiY!63VXX6RJJhx5KlyAGKcflQ).

I use VLC to convert the videos to avi (MJPEG) and then VirtualDub with ffdshow to view them, which allows very fast scrubbing.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 20, 2017, 03:53:28 pm
Hi Dr.Venom, long time no hear!

Yes it's been a while, good to see you around also :)

I've improved the setup since the last post, I can now actuate the button deterministically at a predetermined time from vblank.

It's quite simple - an MCU (Teensy) is monitoring vblank (from the UMSA), and pulls the button low on the RAP for 125 ms, x ms after vblank. x is configurable to any delay. Runtime operation is monitored by the oscilloscope.

Other than that, the testing procedure is the same as previous endeavours (240 fps camera + LED). So next-frame response still need to be determined manually by looking at the video. It's quite easy with this setup though, since the button actuation interval is deterministic, a fixed number of video frames can be present between actuations. This makes it possible to quickly fast-forward between actuations.

Ah now I understand, thanks for explaining. Great setup.

OK, so I did a few runs.

I set the button actuation to about 3.6 ms from vblank (window is about 1.9 ms), and set the polling rate to Default, 500 Hz and 1000 Hz using the SweetLow driver and logged the result. The timing can be seen in the attached image. I started counting from about frame 250 and onwards.

1000 Hz: 51 pass, 6 fail, 89% next frame response.
 500 Hz: 21 pass, 36 fail, 37% next frame response.
Default: All fail, 0% next frame response.

So even though it's still unknown if the Hori in actuality handles 1000 Hz polling in a graceful manner, it performs a lot better with 1000 Hz polling set. With multiple button presses and the like, who knows. Maybe it'll choke.

I uploaded the videos to here: mega.nz (https://mega.nz/#F!yApSRaiY!63VXX6RJJhx5KlyAGKcflQ).

I use VLC to convert the videos to avi (MJPEG) and then VirtualDub with ffdshow to view them, which allows very fast scrubbing.

That's a pretty awesome finding. Thanks for sharing, I'll take a look at the videos later!

The below got a bit longer than I anticipated, but it may have some relevance to the topic at hand..

Some time ago I did some more elaborate testing on the latency of WinUAE versus a real Amiga (which I still own). With the help of the WinUAE author, a small application for the Amiga side was made which allows setting the rasterline where the Amiga polls for input and then shows a color on screen after vsync when a button is pressed (a very tiny AmigaDOS program that takes the rasterline number as input :P). I.e. you can have the Amiga side poll early or late in the frame and see how it affects the latency measurements on the Windows host.

One of the takeaways was that WinUAE when compared like for like with the real hardware shows a lag of 1.2 frames when the Amiga side acquires input early in the frame and about 1.8 frames when it acquires input late in the frame. WinUAE's D3D9Ex implementation as far as I know is quite similar to that of GM, the main difference being that WinUAE doesn't have a framedelay feature like GM.

I guess the above may have similar implications for GM; especially when there is "some" inherent lag on the host (USB stack?), this can become a factor.* (Framedelay cannot fully compensate, especially when the game polls late in the frame.) So depending on which rasterline an arcade game reads its input at, it will be easier or more difficult to match the response with emulation. It may also imply that if we find next-frame response for one game, it doesn't have to be the same for all games.

*If I remember correctly some tests done by Calamity way back suggested something like half a frame of inherent host lag (next frame response was only seen when LED active and rasterbeam had not yet crossed 1/3rd of the screen.. but correct me if I'm wrong..)

In case you're interested, a (very long) thread spawned from it on the EAB board: Input latency measurements (and D3D11), see here: http://eab.abime.net/showthread.php?t=88777 (http://eab.abime.net/showthread.php?t=88777).

The WinUAE testing brought up another interesting point, which has to do with a Microsoft comment mentioning that in most situations there is 1 frame of inherent video latency in Windows applications (this may be interesting to Calamity especially). With Windows 8.1 a new feature was introduced called "waitable swap chains" that has the potential to enable next-frame response to input; see this post from the earlier mentioned thread specifically: http://eab.abime.net/showpost.php?p=1188236&postcount=19 (http://eab.abime.net/showpost.php?p=1188236&postcount=19)

Quote
How does waiting on the back buffer reduce latency?

With the flip model swap chain, back buffer "flips" are queued whenever your game calls IDXGISwapChain::Present. When the rendering loop calls Present(), the system blocks the thread until it is done presenting a prior frame, making room to queue up the new frame, before it actually presents. This causes extra latency between the time the game draws a frame and the time the system allows it to display that frame. In many cases, the system will reach a stable equilibrium where the game is always waiting almost a full extra frame between the time it renders and the time it presents each frame. It's better to wait until the system is ready to accept a new frame, then render the frame based on current data and queue the frame immediately.

It's part of the reason why the thread derailed into a DXGI/D3D11 thread  :), which Toni is currently implementing in WinUAE. Time will tell, once the low-latency vsync stuff is implemented, whether it will shave off another frame of latency compared to WinUAE's D3D9Ex implementation.
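For reference, the waitable swap chain mechanism the quote describes boils down to something like this (my own sketch based on the linked MSDN material, not WinUAE or GM code; swap chain creation with DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT is omitted and PollInput()/RenderFrame() are placeholders):

Code: [Select]
#include <windows.h>
#include <dxgi1_3.h>

void PollInput();    // placeholder, assumed to exist elsewhere
void RenderFrame();  // placeholder, assumed to exist elsewhere

void RunLoop(IDXGISwapChain2* swapChain, volatile bool& running)
{
    swapChain->SetMaximumFrameLatency(1);   // at most one queued frame
    HANDLE waitable = swapChain->GetFrameLatencyWaitableObject();

    while (running)
    {
        // Block until the presentation queue can take a new frame, then poll
        // input and render as late as possible before presenting.
        WaitForSingleObjectEx(waitable, 1000, TRUE);
        PollInput();
        RenderFrame();
        swapChain->Present(1, 0);
    }
    CloseHandle(waitable);
}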

Here's the descriptor from the Hori stick, the MSDN descriptor page, in conjunction with the information from USBView seems to suggest that the device in fact uses a standard 8 ms polling rate (bInterval = 0x0A and Full Speed).

Right, that's a configuration seen on many "normal" (non gaming) USB2 input devices.

I wasn't aware of the fact that modern devices actually set a faster polling rate. Maybe the proper thing to do would be to find an encoder that operates at 1 kHz, and not use the SweetLow hack!

Sure, that seems like the best solution. Not easy to find though, as there is some nuance to modern devices setting a faster polling rate. It seems to be the modern (expensive) gaming gear that is almost without exception full-speed devices with 1 ms polling (or configurable as 1, 2, 4 or 8 ms via a hardware switch, like e.g. my Corsair K70 gaming keyboard). Opposite to that is the category of modern "cheap" gear, like most of the gamepads and joystick/gamepad adapters from China, which sadly are mostly low-speed 8 ms, or worse...

I'll elaborate a bit on this as we're all freaks right  :D

This is irrefutably the case! Albeit in a good way.

Definitely. It's great to have people apply a little science to these topics which have been so obscure for many years.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 21, 2017, 05:32:23 am
Quote
*If I remember correctly some tests done by Calamity way back suggested something like half a frame of inherent host lag (next frame response was only seen when LED active and rasterbeam had not yet crossed 1/3rd of the screen.. but correct me if I'm wrong..)

Yes, but my tests were done on an old Core2Duo running XP 64. We suggested back then that the faster the CPU, the smaller the host system lag would be, as later tests on more powerful hardware are confirming. Intealls' results (3.6 ms from vblank) show next frame response with the beam having crossed 4/5 of the screen. Eventually, as hardware gets faster, host system lag should be close to zero.

The breakthrough was to prove that next frame response is possible in emulation, conceptually and empirically, something that was denied at the time and is still questioned today.


Quote
(Framedelay cannot fully compensate, especially when game polls late in the frame). So depending on which rasterline an Arcade game reads its input, it will be easier or more difficult to match the response with emulation.

While I understand and find your results with Amiga logical*, I think this statement is not true for (Groovy)MAME. In general, it doesn't matter on which rasterline an arcade game reads its input, because emulation time will be compressed to a fraction of the frame time, instead of spreading across the full duration of the frame time. So, provided the most recent input is polled right before emulating the frame, and as close as possible to vblank with a high frame delay value (ideally we would do the whole emulation during the vblank interval), the emulated arcade game is guaranteed to have the most recent input regardless of the scanline at which it polls input.
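Put another way, per frame the model looks roughly like this (illustrative pseudocode only, not GroovyMAME's actual source; all four helpers are hypothetical stand-ins):

Code: [Select]
void wait_since_vblank(double ms);     // sleep/spin until 'ms' after the last vblank
void poll_host_input();                // read keyboard/joystick state
void emulate_frame();                  // run one emulated frame
void wait_for_vblank_and_flip();       // present right at the vblank boundary

void run_one_frame(double frame_ms, int frame_delay /* 1..9 */)
{
    wait_since_vblank(frame_ms * frame_delay / 10.0);  // sit out most of the frame
    poll_host_input();                                 // freshest possible input
    emulate_frame();                                   // must fit in the remaining slice
    wait_for_vblank_and_flip();                        // shown on the very next frame
}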

* In my understanding, your Amiga experiment involves two things:

1.- The Amiga emulator polls host input at any point that the emulated software polls input. MAME doesn't work that way: all input is polled and buffered prior to emulating the next frame.

2.- The time spent by the Amiga emulator to emulate a frame is not negligible (at least it's long enough that the polling point makes a difference). With GroovyMAME and high frame delay we're assuming the emulation time is negligible.


However, there are two scenarios I can think of where the GroovyMAME model won't be able to match real hardware, by design:

1.- If a game can poll input at any point in a frame and make action happen right in that frame (e.g. make the character move on the lower half of the screen). Although we have no knowledge of any game doing this, it's certainly possible that real hardware did that and GroovyMAME definitely can't.

2.- Respond to button presses shorter than a frame period. Oomek made some tests that show this is not possible with the current implementation. A button press must cross a frame boundary for MAME to acknowledge it. This might be the reason why combos in fighting games are more difficult to achieve than on real hardware. An ad-hoc workaround for this might be possible, but testing its effectiveness requires a very specific setup.


Quote
It's better to wait until the system is ready to accept a new frame, then render the frame based on current data and queue the frame immediately.

Certainly, those waitable swap chains are Microsoft's implementation of frame delay. In my understanding, there's a big catch, however:

(http://blogswin.blob.core.windows.net/win/sites/3/2013/12/input_5F00_latency2_5F00_4F22507D.png)

You see: Button press happens during frame #1, but action happens on frame #3. No next frame response.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 21, 2017, 07:30:49 am
Yes, but my tests were done on an old Core2Duo running XP 64. We suggested back then that the faster the CPU, the smaller the host system lag would be, as later tests on more powerful hardware are confirming. Intealls' results (3.6 ms from vblank) show next frame response with the beam having crossed 4/5 of the screen. Eventually, as hardware gets faster, host system lag should be close to zero.

I should mention that I'm running the tests on an i5 4690k clocked at 4.5 GHz on Windows 7 x64.  ;D

So this setup definitely has more grunt than the C2D.

The biggest uncertainty with my setup is the controller, I don't know if it works properly at 1 kHz. I have a mouse that is polled natively at 1 kHz, I'll try this as input source this evening.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 21, 2017, 07:33:22 am
Yes, but my tests were done on an old Core2Duo running XP 64. We suggested back then that the faster the CPU, the smaller the host system lag would be, as later tests on more powerful hardware are confirming. Intealls' results (3.6 ms from vblank) show next frame response with the beam having crossed 4/5 of the screen. Eventually, as hardware gets faster, host system lag should be close to zero.

Good to know. So basically (for good/new hardware) we're already close to the 1 ms input stack, as shown in the Microsoft picture below.

While I understand and find your results with Amiga logical*, I think this statement is not true for (Groovy)MAME. In general, it doesn't matter on which rasterline an arcade game reads its input, because emulation time will be compressed to a fraction of the frame time, instead of spreading across the full duration of the frame time.

I see your point about negligible frame emulation time for most of the older games and systems in MAME. But MAME also includes games and systems that do have noticeable frame emulation time, e.g. CAVE games, Chihiro-based stuff, console-based systems like PlayStation and Saturn, computer systems like the C64 and Amiga, etc. Some of them have considerable frame emulation time in (Groovy)MAME. For those cases frame delay has to be lowered from the ideal of 9 (in some cases a lot), up to the point where it starts to matter where in the frame the game or system acquires input, right?

Maybe it would be nice to test the Amiga driver in GroovyMAME and see how it stacks up to WinUAE and real hardware?

@Intealls, would you be up for it? I could supply you with an Amiga disk image / ADF that boots into the Amiga shell with the button test program. Then we would have a like-for-like comparison between GM and real hardware.

However, there are two scenarios I can think of where the GroovyMAME model won't be able to match real hardware, by design:

1.- If a game can poll input at any point in a frame and make action happen right in that frame (e.g. make the character move on the lower half of the screen). Although we have no knowledge of any game doing this, it's certainly possible that real hardware did that and GroovyMAME definitely can't.

I think the Atari 2600 used to do this a lot, it was called "racing the beam", so I wouldn't be surprised if more games from that era used similar tricks.

2.- Respond to button presses shorter than a frame period. Oomek made some tests that show this is not possible with current implementation. A button press must cross a frame boundary to get MAME aknowledge it. This might be the reason why combos in fight games are more difficult to achieve than on real hardware. An adhoc workaround for this might be possible but testing its effectiveness requires a very specific setup.

Good point. In WinUAE it's possible with the debugger to see the exact rasterline at which a program /game acquires input. If something similar would be possible with the MAME debugger you could prove whether or not there are games that read input more than once in a frame..


Certainly, those waitable swap chains are Microsoft's implementation of frame delay. In my understanding, there's a big catch, however:

You see: Button press happens during frame #1, but action happens on frame #3. No next frame response.

You're including the touch digitizer...  :). It doesn't play a role in our case (the original example incorporates a tablet with touchscreen), leaving next frame response.

(http://blogswin.blob.core.windows.net/win/sites/3/2013/12/input_5F00_latency2_5F00_4F22507D.png)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 21, 2017, 07:47:19 am
You're including the touch digitizer...  :). It doesn't play a role in our case (the original example incorporates a tablet with touchscreen), leaving next frame response.

I was being merciful :)

Button (digitizer, whatever) is pressed while frame #0 is being scanned on the screen, but action happens on frame #3.  So even if we remove the touch digitizer from scene, we're still a frame late.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 21, 2017, 07:56:12 am
You're including the touch digitizer...  :). It doesn't play a role in our case (the original example incorporates a tablet with touchscreen), leaving next frame response.

I was being merciful :)

Button (digitizer, whatever) is pressed while frame #0 is being scanned on the screen, but action happens on frame #3.  So even if we remove the touch digitizer from scene, we're still a frame late.

I'll be merciful too :)  The touch digitizer is an endpoint; it's like Intealls' Hori stick taking 25 ms to process input. As Intealls' tests have shown, the Hori stick registers within 1 ms, so you may leave the latency of the touch digitizer out of the equation :)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 21, 2017, 08:07:45 am
Hopefully this clarifies my point:

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 21, 2017, 11:12:39 am
Quote
Some of them have have considerable frame emulation time in (Groovy)MAME. For those cases frame delay has to be lowered from the ideal of 9 (in some cases a lot), up to the point where it starts to matter where in the frame the game or system acquires input, right?

Sure, there are systems where frame emulation time is not negligible, and because of this they don't fit the frame delay model (although you may find a benefit in enabling it even with low values just to bypass the frame queue, but that's not frame delay's purpose). However, even for those cases the point in the frame where input is polled doesn't matter, as MAME always polls input at the same point: right before emulation of the frame. The emulated system does not poll input directly through MAME, it just accesses pre-buffered data.

Quote
I think the Atari 2600 used to do this a lot, it was called "racing the beam", so I wouldn't be surprised if more games from that era used similar tricks.

That's pretty interesting, I hadn't heard of that.

Quote
Good point. In WinUAE it's possible with the debugger to see the exact rasterline at which a program /game acquires input. If something similar would be possible with the MAME debugger you could prove whether or not there are games that read input more than once in a frame..

Yeah that would be interesting. I believe there's no reliable information about this on the web, regarding different hardware systems and how and when they poll input, whether it's game dependent, etc.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 21, 2017, 02:12:42 pm
Some more measurements.

The same setup as before (3.6 ms before vblank = 1.9 ms window).

Razer/Microsoft HABU gaming mouse (Raw input?, native 1 kHz polling):

Pass/tot: 5/55, 9% next-frame response.

HORI Real Arcade Pro VX-SA (Xinput, X360 variant of the one I tested previously):

Pass/tot: 40/58, 69% next-frame response.

So the best device so far is the RAP V3-SA (DirectInput).

Edit:

Tried hammering a lot of buttons on the RAP V3, and measured next frame response on the one synced to vblank.

Pass/tot: 54/66, 82% next-frame response, down from 89%.

So it should perform well under use as well.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on December 21, 2017, 06:34:20 pm
If you're interested in the interface speed of your Hori stick, you could download the tool USBView (https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/usbview) from Microsoft and look up the mentioned values for your joystick. Since you seem to experience a benefit from using the hack, my guess is that it's a low speed device.

Thanks for that, Dr.Venom. My encoder is FULL speed, and has 4 pipes (!?). Would I be right that the final line is telling me that the HID device is being polled at 1 ms?

Code: [Select]
[Port5]  :  USB Composite Device


Device Power State:               PowerDeviceD0

       ---===>Device Information<===---
English product name: "Arduino Leonardo"

ConnectionStatus:                 
Current Config Value:              0x01  -> Device Bus Speed: Full
Device Address:                    0x03
Open Pipes:                           4

          ===>Device Descriptor<===
bLength:                           0x12
bDescriptorType:                   0x01
bcdUSB:                          0x0200
bDeviceClass:                      0xEF  -> This is a Multi-interface Function Code Device
bDeviceSubClass:                   0x02  -> This is the Common Class Sub Class
bDeviceProtocol:                   0x01  -> This is the Interface Association Descriptor protocol
bMaxPacketSize0:                   0x40 = (64) Bytes
idVendor:                        0x2341 = Arduino, LLC
idProduct:                       0x8036
bcdDevice:                       0x0100
iManufacturer:                     0x01
     English (United States)  "Arduino LLC"
iProduct:                          0x02
     English (United States)  "Arduino Leonardo"
iSerialNumber:                     0x03
     English (United States)  "HIDPC"
bNumConfigurations:                0x01

          ---===>Open Pipes<===---

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x81  -> Direction: IN - EndpointID: 1
bmAttributes:                      0x03  -> Interrupt Transfer Type
wMaxPacketSize:                  0x0010 = 0x10 bytes
bInterval:                         0x40

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x02  -> Direction: OUT - EndpointID: 2
bmAttributes:                      0x02  -> Bulk Transfer Type
wMaxPacketSize:                  0x0040 = 0x40 bytes
bInterval:                         0x00

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x83  -> Direction: IN - EndpointID: 3
bmAttributes:                      0x02  -> Bulk Transfer Type
wMaxPacketSize:                  0x0040 = 0x40 bytes
bInterval:                         0x00

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x84  -> Direction: IN - EndpointID: 4
bmAttributes:                      0x03  -> Interrupt Transfer Type
wMaxPacketSize:                  0x0040 = 0x40 bytes
bInterval:                         0x01

       ---===>Full Configuration Descriptor<===---

          ===>Configuration Descriptor<===
bLength:                           0x09
bDescriptorType:                   0x02
wTotalLength:                    0x0064  -> Validated
bNumInterfaces:                    0x03
bConfigurationValue:               0x01
iConfiguration:                    0x00
bmAttributes:                      0xA0  -> Bus Powered
  -> Remote Wakeup
MaxPower:                          0xFA = 500 mA

          ===>IAD Descriptor<===
bLength:                           0x08
bDescriptorType:                   0x0B
bFirstInterface:                   0x00
bInterfaceCount:                   0x02
bFunctionClass:                    0x02  -> This is Communications (CDC Control) USB Device Interface Class
bFunctionSubClass:                 0x02
bFunctionProtocol:                 0x01
iFunction:                         0x00

          ===>Interface Descriptor<===
bLength:                           0x09
bDescriptorType:                   0x04
bInterfaceNumber:                  0x00
bAlternateSetting:                 0x00
bNumEndpoints:                     0x01
bInterfaceClass:                   0x02  -> This is Communications (CDC Control) USB Device Interface Class
bInterfaceSubClass:                0x02
bInterfaceProtocol:                0x00
iInterface:                        0x00
  -> This is a Communications (CDC Control) USB Device Interface Class

          ===>Descriptor Hex Dump<===
bLength:                           0x05
bDescriptorType:                   0x24
05 24 00 10 01
  -> This is a Communications (CDC Control) USB Device Interface Class

          ===>Descriptor Hex Dump<===
bLength:                           0x05
bDescriptorType:                   0x24
05 24 01 01 01
  -> This is a Communications (CDC Control) USB Device Interface Class

          ===>Descriptor Hex Dump<===
bLength:                           0x04
bDescriptorType:                   0x24
04 24 02 06
  -> This is a Communications (CDC Control) USB Device Interface Class

          ===>Descriptor Hex Dump<===
bLength:                           0x05
bDescriptorType:                   0x24
05 24 06 00 01

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x81  -> Direction: IN - EndpointID: 1
bmAttributes:                      0x03  -> Interrupt Transfer Type
wMaxPacketSize:                  0x0010 = 0x10 bytes
bInterval:                         0x40

          ===>Interface Descriptor<===
bLength:                           0x09
bDescriptorType:                   0x04
bInterfaceNumber:                  0x01
bAlternateSetting:                 0x00
bNumEndpoints:                     0x02
bInterfaceClass:                   0x0A  -> This is a CDC Data USB Device Interface Class
bInterfaceSubClass:                0x00
bInterfaceProtocol:                0x00
iInterface:                        0x00

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x02  -> Direction: OUT - EndpointID: 2
bmAttributes:                      0x02  -> Bulk Transfer Type
wMaxPacketSize:                  0x0040 = 0x40 bytes
bInterval:                         0x00

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x83  -> Direction: IN - EndpointID: 3
bmAttributes:                      0x02  -> Bulk Transfer Type
wMaxPacketSize:                  0x0040 = 0x40 bytes
bInterval:                         0x00

          ===>Interface Descriptor<===
bLength:                           0x09
bDescriptorType:                   0x04
bInterfaceNumber:                  0x02
bAlternateSetting:                 0x00
bNumEndpoints:                     0x01
bInterfaceClass:                   0x03  -> HID Interface Class
bInterfaceSubClass:                0x00
bInterfaceProtocol:                0x00
iInterface:                        0x00

          ===>HID Descriptor<===
bLength:                           0x09
bDescriptorType:                   0x21
bcdHID:                          0x0101
bCountryCode:                      0x00
bNumDescriptors:                   0x01
bDescriptorType:                   0x22 (Report Descriptor)
wDescriptorLength:               0x002F

          ===>Endpoint Descriptor<===
bLength:                           0x07
bDescriptorType:                   0x05
bEndpointAddress:                  0x84  -> Direction: IN - EndpointID: 4
bmAttributes:                      0x03  -> Interrupt Transfer Type
wMaxPacketSize:                  0x0040 = 0x40 bytes
bInterval:                         0x01




Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 22, 2017, 08:27:46 am
Thanks for that DrVenom. My encoder is FULL speed, and has 4 pipes(!?).  Would I be right that the final line is telling me that the HID device is being polled at 1ms ?

Hi jimmer, that's correct!


Some more measurements.

Razer/Microsoft HABU gaming mouse (Raw input?, native 1 kHz polling):

Pass/tot: 5/55, 9% next-frame response.

Interesting. The deviation seems quite large when compared with the Hori stick at 1ms. Just to be sure: there's only one endpoint / open pipe on the mouse? Does it make a difference if you apply the sweetlow hack on the mouse?

Your testing setup, with the ability to force button actuation at specific points in the frame, gave me the following thought: what if we could prove not only 1-frame response, but also sub-1-frame response? I think it may be possible; I'm curious as to your opinion on it.


Proving emulator input latency can achieve (far) below 1 frame response

Testing setup:


*On a real Amiga the lowest possible visible response would be when a button actuation occurs right in front of vsync: immediately after the blanking, the first yellow scanlines would be visible. If blanking is on the order of 25 lines and the total screen is 313 lines, this would be on the order of ~0.1 frame of visible response! Give or take a bit because of the granularity of the 240 fps recording speed.


Things that may influence results:

1. Input stack on testing setup near zero.  Confirmed for Intealls setup by previous testing.
2. No secret buffers in the MAME Amiga driver. Confirmed. I did some tests in the past with the frame stepping method while keeping button pressed and MAME shows response in the next frame when using the button test program. We would need to reaffirm this with most recent Amiga driver in MAME.
3. There is no other delay in the host PC system. Unconfirmed. For example the almost 1 frame of delay Microsoft is hinting at with the "equilibrium" state when doing present calls**. Which is the whole point of the waitable swap chains, to do away with that last bit of lag.

** As quoted previously, Microsoft says: "In many cases, the system will reach a stable equilibrium where the game is always waiting almost a full extra frame between the time it renders and the time it presents each frame."

It may have become obvious from my previous postings, but my biggest concern is with point 3, the almost 1 frame of delay Microsoft is hinting at. The good thing is that the above test could provide us with the much-needed evidence of whether this is really an issue for our case or not.

How cool would it be if we could prove emulator response in the subframe region?!? I'm sure if, say, below 0.5 frame of response could be shown, there would be no bigger Christmas gift for Calamity this year!  :)

@Intealls, what's your opinion, would this be a valid test (setup)?

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 22, 2017, 09:42:07 am
Hi Dr., is that Amiga test program available somewhere? I'd like to test it if I have a chance. GroovyMAME (by design) won't be able to paint in red the bottom of the screen, however it should be able to paint yellow right after vblank like the actual Amiga.

Anyway, I need to point out that your #3 is indeed confirmed (2013). I can't elaborate on it now, but if you look at Intealls' results, they're already showing it, in other words: next frame response implies both subframe response and no hidden additional delay in the host PC. This is true because with frame delay we're sort of bypassing Microsoft's "present" stack. And if my understanding of waitable swap chains is correct (not 100% sure), what Microsoft is trying to fix is being 2 frames behind us, but their implementation is still 1 frame behind (see my picture above).

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on December 22, 2017, 10:46:45 am
Maybe one of you guys could explain the details of how the Defender system processes and displays things.  Because I'm wondering whether framedelay 9 is more responsive than the real system.

My guess has always been something simple like:
 
Poll the inputs.
CPU spends ~16 ms doing calcs and moving pixels around in screen memory.
The video system reads the screen memory, and then spends the next 16 ms writing it to the monitor.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 22, 2017, 01:21:31 pm
Hi Dr., is that Amiga test program available somewhere? I'd like to test it if I have a chance. GroovyMAME (by design) won't be able to paint in red the bottom of the screen, however it should be able to paint yellow right after vblank like the actual Amiga.

Hi Calamity,

I've created a bootable disk with the button program on it, see attachment. It boots straight to Amiga DOS. The command is "button <scanline>". Scanline value between 0 and 312 for PAL Amiga. Blanking ends around rasterline 26.

A green line appears where the "wait" line is set. When the mouse button is pressed, a red color appears below the green line, then comes the vblank wait and the color changes to yellow. Then it waits for the final vblank and resets back to the normal state (= you get flickering colors if the button is kept pressed). The program can be quit by pressing the joystick button.

Quote
Anyway, I need to point out that your #3 is indeed confirmed (2013). I can't elaborate on it now, but if you look at Intealls' results, they're already showing it, in other words: next frame response implies both subframe response and no hidden additional delay in the host PC. This is true because with frame delay we're sort of bypassing Microsoft's "present" stack. And if my understanding of waitable swap chains is correct (not 100% sure), what Microsoft is trying to fix is being 2 frames behind us, but their implementation is still 1 frame behind (see my picture above).

Let's see how the proof stacks up to the statement of "no hidden additional delay in the host PC".

I honestly look forward to scrubbing through a video capture of the mentioned test case and seeing input response happening in the subframe range of 1/4th of a frame.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 22, 2017, 02:08:02 pm
Quote
Let's see how the proof stacks up to the statement of "no hidden additional delay in the host PC".

Seriously Dr., I have to insist on the fact that the proof is already in this thread: http://forum.arcadecontrols.com/index.php/topic,133194.msg1377633.html#msg1377633 (http://forum.arcadecontrols.com/index.php/topic,133194.msg1377633.html#msg1377633)

Lots of others have replicated it since then. Do you or anyone else really doubt that the results above, or the ones achieved by Dacasks, Intealls, etc., already prove what you're meaning to prove? I'm asking this sincerely, because I have the feeling that these tests and their implications are not being understood properly and we might need to put together a proper write-up explaining it better. Sorry if I sound assertive on this, but otherwise readers of this thread might get the wrong impression that next frame response is under debate.

That said I'm willing to perform the test on the Amiga driver, but some remarks first:

- Mouse devices are not treated in the same way as joysticks or keyboards in GroovyMAME. There's a specific fix in GroovyMAME to force keyboards and joysticks to be polled on demand (recall the "always_poll" patch) instead of at a given timeout. I tried implementing the same for all input devices but that broke mice and spinners. So to achieve the lowest latency possible you'd need to modify your test program to use a keyboard button instead of the mouse. Is this change possible? *

- As I said previously, the red colour below the green line won't happen in GM. Is that condition required to get the yellow colour in the next frame? I have the feeling that you'll use that result to deny subframe response. But we're not talking about the same thing here. What you want to achieve is "current frame response", which I already said is not possible and probably never will be in GM. Next frame response already implies what you call "subframe response": in my example above it's 2/3 of 16.67 ms (subframe). In Intealls' tests it's much lower. Do you see my point?

* This is the probable explanation for why Intealls' mouse test results weren't good.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 22, 2017, 03:20:43 pm
Please, no bickering. :)

Probably most (all?) mice, fight sticks or keyboards run a small firmware.

If the firmware only updates its state at 30 Hz, it doesn't matter if the device is polled at 1 kHz.

I think this is the problem with the Habu mouse. I don't think the buttons are updated directly at 1 kHz (a typical debounce period could be up to 5 ms for instance, and a state update might occur after, not before, debounce).
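To illustrate the point (a hypothetical sketch, not the Habu's actual firmware), a typical "report only after the contact has been stable for N ms" debounce looks like this, and it delays the report by the debounce window even if the host polls at 1 kHz:

Code: [Select]
// Hypothetical debounce routine, assuming a 1 kHz firmware tick.
// The host may poll every 1 ms, but a press is only reported once the raw
// contact has read the same value for DEBOUNCE_MS consecutive ticks, so up
// to DEBOUNCE_MS of latency is added inside the device itself.
#include <cstdio>

const int DEBOUNCE_MS = 5;

bool reported_state = false;   // what the USB host sees when it polls
bool last_raw = false;
int  stable_ms = 0;

void firmware_tick_1khz(bool raw_contact)
{
    if (raw_contact == last_raw)
        stable_ms++;
    else
        stable_ms = 0;

    last_raw = raw_contact;

    if (stable_ms >= DEBOUNCE_MS)
        reported_state = raw_contact;   // only now visible to the host poll
}

int main()
{
    // Simulate a clean press at t = 0: the reported state only flips at
    // ~DEBOUNCE_MS, regardless of how often the host polls.
    for (int t = 0; t < 10; t++)
    {
        firmware_tick_1khz(true);
        std::printf("t=%d ms  reported=%d\n", t, reported_state ? 1 : 0);
    }
}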

I have an additional Teensy configured as a HID keyboard, which I have control over, and it is polled at 1 kHz. I'm triggering it in the same way as the others (Habu, RAPs). I'm compiling the results and will hopefully post them tomorrow; they highlight the fact that the encoder is very important for optimum frame delay performance. I haven't seen any type of USB stack latency in these measurements, granted there is no USB traffic other than the button on/off going on.  ;D
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 22, 2017, 05:04:42 pm
Quote
Probably most (all?) mice, fight sticks or keyboards run a small firmware.

If the firmware only updates its state at 30 Hz, it doesn't matter if the device is polled at 1 kHz.

Yeah, that makes sense, but I was talking about a change in baseline that made next frame response impossible several versions ago, which is partially reverted now in GroovyMAME, but only for keyboards and joysticks:

Code: [Select]
void windows_osd_interface::poll_input(running_machine &machine) const
{
m_keyboard_input->poll(machine);
m_mouse_input->poll_if_necessary(machine);
m_lightgun_input->poll_if_necessary(machine);
m_joystick_input->poll(machine);
}

That "if necessary" was the culprit.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 22, 2017, 05:22:31 pm
Yeah, that makes sense, but I was talking about a change in baseline that made next frame response impossible several versions ago, which is partially reverted now in GroovyMAME, but only for keyboards and joysticks:

Code: [Select]
void windows_osd_interface::poll_input(running_machine &machine) const
{
m_keyboard_input->poll(machine);
m_mouse_input->poll_if_necessary(machine);
m_lightgun_input->poll_if_necessary(machine);
m_joystick_input->poll(machine);
}

That "if necessary" was the culprit.

Ah, ok. Missed that, certainly makes sense.

It might not be the firmware that's the issue with the Habu then - however I still think it's a very real issue.

The nice thing about the Teensy is that it can be reconfigured easily to be a mouse, joystick or whatever. So it's possible to test several code paths/devices with a single piece of hardware.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 23, 2017, 01:39:25 pm
It turns out vblank was the wrong reference point for the deadline calculation. SwitchRes set up a resolution of 384x240 for sf2, while the actual game resolution is 384x224. I think this means that internally MAME uses a resolution of 384x240, and the frame is needed 8 scanlines before it's drawn on the screen.

This would also mean that the hard deadline for when the frame is needed is past vblank.

See the attached image.

Taking this into consideration, I set up three test cases.

All of them used the Teensy polled at 1 kHz.

Code: [Select]
2.180 ms before deadline (0.5 ms poll window)

25 pass, 25 fail = 50%.

2.430 ms before deadline (0.75 ms poll window)

39 pass, 12 fail = 76.4%.

2.680 ms before deadline (1.0 ms poll window)

50 pass. = 100 %

So the measurements certainly match up. They also imply that there is no USB lag or other "host lag" present. Again, this is on a fast computer with virtually no USB traffic.
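For context, these percentages are roughly what a simple model predicts (my own reading of the numbers, not part of the measurement itself): with a 1 kHz device and a press that arrives at a uniformly random time, the expected pass rate is just the poll window divided by the 1 ms poll period, capped at 100%.

Code: [Select]
// Rough sanity check of the measured pass rates under a uniform-press model.
// Assumption: device polled every 1 ms; a press "passes" if it is picked up
// by a poll that still meets the frame deadline.
#include <algorithm>
#include <cstdio>

int main()
{
    const double poll_period_ms = 1.0;
    const double windows_ms[] = { 0.5, 0.75, 1.0 };   // the three test cases above

    for (double w : windows_ms)
    {
        double expected_pass = std::min(w / poll_period_ms, 1.0) * 100.0;
        std::printf("poll window %.2f ms -> expected pass rate %.1f%%\n", w, expected_pass);
    }
    // Prints 50.0%, 75.0%, 100.0% - close to the measured 50%, 76.4%, 100%.
}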

The videos have been uploaded to here: mega (https://mega.nz/#F!yApSRaiY!63VXX6RJJhx5KlyAGKcflQ).

With the previous measurements corrected, they look like this:

Code: [Select]
4.8 ms before deadline (3.1 ms poll window)

HORI RAP V3-SA, SweetLow 1000 Hz: 89% (82% with heavy mashing).
HORI RAP VX-SA, SweetLow 1000 Hz: 69%.
Razer/MS HABU, SweetLow 1000 Hz:  9%.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 23, 2017, 04:53:25 pm
I'm watching the TEENSY_2.680MS_F_DL.MOV file, it's just amazing...


From your graphic:

15.6147 kHz x 2.680 ms = 41.85 lines (42)

blanking = 1 + 3 + 16 = 20 lines
borders = 8 + 8 = 16 lines (224p active in 240p mode)

42 - 20 - 16 = 6 lines from bottom

224 - 6 = 218 (beam position at 97% of the screen height, 100% input events get in next frame)

 :o

EDIT: Ok, I now see you seem to be placing the frame deadline at the end of the vertical back porch, so I shouldn't be subtracting the upper 8-line border.

42 - 20 - 8 = 14 lines from bottom

224 - 14 = 210 (94%, still amazing)
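To make the arithmetic above easy to replay (a worked sketch using only the numbers quoted in this thread, not output from GroovyMAME), the poll-to-deadline margin converts to a beam position like this:

Code: [Select]
// Worked example of the beam-position estimate above.
// Assumptions (all taken from the posts in this thread): 15.6147 kHz line rate,
// 20 lines of blanking (1 + 3 + 16), an 8-line top border (224p in a 240p mode),
// and a 2.680 ms margin between the last accepted input and the frame deadline.
#include <cmath>
#include <cstdio>

int main()
{
    const double line_rate_khz = 15.6147;
    const double margin_ms     = 2.680;
    const int    blanking      = 20;   // sync + porches
    const int    top_border    = 8;    // deadline placed at the end of the back porch
    const int    active_lines  = 224;

    int margin_lines      = (int)std::round(line_rate_khz * margin_ms);   // ~42 lines
    int lines_from_bottom = margin_lines - blanking - top_border;         // ~14 lines
    int beam_line         = active_lines - lines_from_bottom;             // ~210

    std::printf("input accepted up to line %d of %d (%.0f%% of the visible frame)\n",
                beam_line, active_lines, 100.0 * beam_line / active_lines);
}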
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 23, 2017, 05:55:03 pm
15.6147 kHz x 2.680 ms = 41.85 lines (42)

blanking = 1 + 3 + 16 = 20 lines
borders = 8 + 8 = 16 lines (224p active in 240p mode)

42 - 20 - 16 = 6 lines from bottom

224 - 6 = 218 (beam position at 97% of the screen height, 100% input events get in next frame)

 :o

Yes, this seems to be the case!

It lines up perfectly. 1/10 frametime for emulation and 1 ms for input polling.

I would say frame delay does exactly what it's supposed to.

What we need now is 8 kHz USB 3 polling and a finer frame delay granularity.  ;D

EDIT: Ok, I now see you seem to be placing the frame deadline at the end of the vertical back porch, so I shouldn't be subtracting the upper 8-line border.

42 - 20 - 8 = 14 lines from bottom

224 - 14 = 210 (94%, still amazing)


Correct, I monitored Red as well to easily see where the input occurred.

And this is for 100% next-frame response.

I think we need a firmware that handles all of the input handling within a guaranteed 1ms timeslot, while we wait for several-kHz USB 3. :)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 23, 2017, 06:26:17 pm
I think we need a firmware that handles all of the input handling within a guaranteed 1ms timeslot, while we wait for several-kHz USB 3. :)

Yeah. In the meantime a finer frame delay granularity might help. I bet your machine can do better than 1.68 ms for emulation.

Great stuff!


Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 23, 2017, 06:54:49 pm
I bet your machine can do better than 1.68 ms for emulation.

CPS1 seems to be a very fast driver since it runs so well at fd9, NeoGeo needs fd8. The Sega Genesis driver also needs fd8.

Haze's cv1k optimizations are amazing though - Pink Sweets runs at fd8. But it seems to vary quite a bit depending on the game. Some cv1k games need fd5.

Psyvariar (PSX based) needs fd1.

So even though it's quite fast - it's not nearly enough. :)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on December 24, 2017, 07:07:27 am
Yeah. In the meantime a finer frame delay granularity might help. I bet your machine can do better than 1.68 ms for emulation.
Great stuff!

At some point you have to say 'who is going to notice a difference of 0.8 ms?'

To answer my Defender question, the inputs are polled every 8 ms and the screen drawing is split into 2 halves (drawing one while moving pixels on the other). This gives an average of 12 ms from input to screen draw (8 ms to 16 ms).

As the Defender ship is usually about 3/4 of the way down the screen, a GroovyMAME FD9 set-up will achieve about 15 ms (12.5 + 2.7). On my LCD set-up I will have to add a few more ms for the lag between the VGA signal and the pixel lighting up, maybe 5 ms?
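One way to read that 12.5 + 2.7 split (an assumption about how the numbers combine, not a measurement): the input lands shortly before GroovyMAME's late poll, roughly the 2.7 ms deadline margin measured earlier in the thread, and then the CRT scans down to the ship's line at about 3/4 of the frame.

Code: [Select]
// Back-of-the-envelope version of the estimate above (best case, i.e. the
// input arrives just before the late poll). All figures come from this thread.
#include <cstdio>

int main()
{
    const double frame_ms       = 1000.0 / 60.0;  // ~16.67 ms per frame
    const double poll_margin_ms = 2.7;            // input-to-deadline margin at high frame delay
    const double ship_fraction  = 0.75;           // ship ~3/4 of the way down the raster

    double scanout_ms = ship_fraction * frame_ms; // ~12.5 ms to reach the ship's line
    double best_case  = poll_margin_ms + scanout_ms;

    std::printf("best-case input-to-ship latency: ~%.1f ms\n", best_case);  // ~15.2 ms
}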
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 24, 2017, 09:41:35 am
I've created a bootable disk with the button program on it, see attachment. It boots straight to Amiga DOS. The command is "button <scanline>". Scanline value between 0 and 312 for PAL Amiga. Blanking ends around rasterline 26.

Hi Dr.Venom,

I'm planning to test your Amiga program in the next few days; I'm testing it right now on my laptop, using this command:

mame64 a1200 -flop button_test.adf

What Amiga model is this supposed to be tested on? Amiga 1200?

BTW, MAME complains about the Amiga driver not working, even though the program actually loads fine and seems to run as expected.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 24, 2017, 10:04:20 am
Quote
To answer my Defender question, the inputs are polled every 8ms and the screen drawing is split into 2 halves (drawing one, moving pixels on the other).  This gives an average of 12ms from input to screen draw (8ms to 16ms).

Just out of curiosity, where or how did you get this information about Defender?

Quote
As the Defender ship is usually about 3/4 down the screen, a groovymame FD9 set-up will achieve about 15ms (12.5 + 2.7). On my LCD set-up I will have to add some more ms for the lag between the vga signal and the pixel lighting up, maybe 5ms?

The issue here is GroovyMAME can't replicate the way Defender works, considering your description above. MAME buffers whole frames, it can't poll input twice per frame.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: 1500points on December 24, 2017, 02:54:25 pm

Just out of curiosity, where or how did you get this information about Defender?


Jarvis himself explained it on FB recently.  :)


Sent from my iPhone using Tapatalk
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on December 25, 2017, 06:21:53 am
Quote
As the Defender ship is usually about 3/4 down the screen, a groovymame FD9 set-up will achieve about 15ms (12.5 + 2.7). On my LCD set-up I will have to add some more ms for the lag between the vga signal and the pixel lighting up, maybe 5ms?

The issue here is GroovyMAME can't replicate the way Defender works, considering your description above. MAME buffers whole frames, it can't poll input twice per frame.

Indeed, I wasn't complaining; in fact I was quite happy when I plugged the numbers in and found such a small difference. We have a Defender meetup in February, and if I get my act together I'll be doing some testing to see what players can discern. I'll try different framedelays, and do GroovyMAME vs JROK. I'm not a CRT user though, so it'll all be LCD based.

I've got an idea that there must be a way to use the high speed of the DisplayPort connection to squirt the whole frame across AND get the LCD lit up in less than 16 ms. In other words, LCDs could be faster than CRTs for a change.





Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 25, 2017, 11:14:00 am
== my earlier post got deleted because of the adf's I attached. So here it is again, without the adf's ==

mame64 a1200 -flop button_test.adf

What Amiga model is this supposed to be tested on? Amiga 1200?

I would advise A500/A600 as it should be less demanding to emulate.

I used the a600 driver because on the a500 driver the keyboard doesn't seem to get recognized.

The A1200 is the successor generation, with a faster CPU (14 MHz 68020) and a more advanced video chipset.

BTW MAME complains about Amiga driver not working, even if the program actually loads fine and seems to run as expected.

Yeah I know. A lot of things work quite OK though. I've attached some stuff for you to try out and get a feel.

Q-Bic.adf (a500/a600) -- Q-Bert clone
Hybris.adf  (a500/a600) -- Shooter, playable apart from some minor gfx and audio bugs. Use space on keyboard for superbomb and make a 360 cycle on joystick to change state of active weapon to enhanced state.
Slamtilt (a1200) -- pinball game with a large 50 fps scrolling playfield. It's keyboard controlled: on the menu where you pick the table, just press space (a gfx bug hides the table selection highlight) and it will load the first table (Mean Machines). When it asks for the 2nd disk, just load it through the file manager. The game is started with F1, Enter launches the ball and Alt works the flippers.

Note that in Amiga land joystick port 1 has the mouse attached and port 2 has the joystick and is player 1 (so MAME "player 2" configures Amiga "player 1"). Re button_test, I just remapped button 1 on ports 1 and 2 so that the test is done with the joystick button and the program exits with the mouse button.

As with other drivers (MSX to name one), the MAME devs again chose to hardcode a doubled vertical resolution for PAL progressive resolutions. It's a bit annoying given that the goal is to document real hardware. I just wish they would look more at the PSX driver to see how to handle both progressive and interlace resolutions more gracefully. Whereas LCD users probably won't notice, it becomes a visible issue on a CRT because GM will pick an interlaced mode based on this.

Would there be a way to accommodate this in GM? Could we just put in a "0.5" value for intscaley? (How would it handle 567/2?)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 25, 2017, 12:26:29 pm
So the measurements certainly match up. They also imply that there is no USB lag, or other "host lags" present. Again, this is on a fast computer with virtually no USB traffic.

Awesome results, thanks for sharing :)  Super cool to see the evidence that it's at least possible to create a USB chain with no lags!

What if, on the real arcade hardware, the service menu program loop doesn't take a lot of CPU time and runs straight after vsync? Would this imply that GM with your setup (acquiring input late in the frame with no host lag) may actually prove to be lower latency than the real hardware? (Given that the real hardware uses a framebuffer and doesn't do any "racing the beam" of course!  :))

I think we need a firmware that handles all of the input handling within a guaranteed 1ms timeslot, while we wait for several-kHz USB 3. :)

Maybe the sweetlow hack for 2 - 4 - 8 khz (it's linked to from the thread you posted earlier) is already "exploiting" the possibility for USB3 microframe latency on low and full speed USB devices? http://www.overclock.net/t/1589644/usb-mouse-hard-overclocking-2000-hz (http://www.overclock.net/t/1589644/usb-mouse-hard-overclocking-2000-hz)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 26, 2017, 03:51:00 am
I have an additional Teensy configured as a HID keyboard, which I have control over and it is polled at 1 kHz.

I'm sure this is already on your radar, but would the performance difference between the Teensy (keyboard) and the RAP (joystick) be fully explained by the difference between the RawInput and DirectInput APIs? Theoretically RawInput is suggested to be the faster of the two, although we never had conclusive evidence (I think).

Would be nice to see how the Teensy performs when configured as joystick instead of a keyboard (i.e. use directinput instead of rawinput).

. I haven't seen any type of USB stack latency in these measurements, granted there is no USB traffic other than the button on/off going on.  ;D

I once had the APEX M800 keyboard and, while you would think of it as one single device polled at 1 ms, it actually enabled 3 active endpoints (two keyboards and one mouse "device", I guess for the macro keys and the like..) all polled at 1 ms.  Then you have a real mouse attached at 1 ms, then an I-PAC 2 2015 version (with the joypad firmware enabled) which configures three devices (a keyboard, a mouse and a joypad) all at 4 ms, and suddenly, in what is supposed to be a relatively simple setup, you end up with 7 attached USB devices, four of which poll at 1 ms and three at 4 ms!!  In subjective testing I could notice the difference when unplugging the APEX M800 and the mouse and disabling two devices on the I-PAC 2, such that I ended up with only the joystick active. It made a noticeable difference (subjectively) to me in the fluidity of joystick movement in fast shooters.

So you're touching on a very real issue I think.  Would be really great if you can find the time to do the Teensy test, while also having 3 or 4 other 1ms devices attached to your system.
 
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Recapnation on December 26, 2017, 08:29:11 am

As with other drivers (MSX to name one), the MAME devs again chose to hardcode a doubled vertical resolution for PAL progressive resolutions. It's a bit annoying given that the goal is to document real hardware. I just wish they would look more at the PSX driver to see how to handle both progressive and interlace resolutions more gracefully. Whereas LCD users probably won't notice, it becomes a visible issue on a CRT because GM will pick an interlaced mode based on this.

Would there be a way to accommodate this in GM? Could we just put in a "0.5" value for intscaley? (How would it handle 567/2?)

This was discussed somewhere here a few months ago. Check G.2 here: http://geedorah.com/eiusdemmodi/forum/viewtopic.php?pid=987#p987 (http://geedorah.com/eiusdemmodi/forum/viewtopic.php?pid=987#p987)


Lovely findings, by the way, even if hard to follow. Thank you all.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 26, 2017, 10:13:27 am
I'm sure this is already on your radar, but would the performance difference between the Teensy (keyboard) and the RAP (joystick) be fully explained by the difference in RawInput versus DirectInput API? Theoretically rawinput has the suggestion to be the faster of the two, although we never had conclusive evidence (I think).

Would be nice to see how the Teensy performs when configured as joystick instead of a keyboard (i.e. use directinput instead of rawinput).

I already tried this. :)

I used this firmware (http://git.slashdev.ca/ps3-teensy-hid/tree/src), but wasn't able to get 1 ms next-frame response. I sniffed the USB traffic with a Saleae Logic and realized it was very late in sending the updated button state; it also sent a complete state update at every USB host poll. I did a very preliminary modification (stripped out almost everything and increased the poll rate) and was able to get the same 1 ms next-frame response with it configured as a joystick. I'm currently planning to do a replacement 1 ms controller for the HORI RAP that will work on the PS3 as well, using this firmware as a base.

So you're touching on a very real issue I think.  Would be really great if you can find the time to do the Teensy test, while also having 3 or 4 other 1ms devices attached to your system.

I should also mention that the Teensy is connected to a USB 3 port. This might also affect performance.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 26, 2017, 11:04:46 am
What if on the real arcade hardware the service menu program loop doesn't take a lot of cpu time and runs straight after vsync. Would this imply that GM with your setup (acquiring input late in the frame with no host lag) may actually prove to be lower latency than the real hardware? (Given that the real hardware uses a framebuffer and doesn't do any "racing the beam" of course!  :))

It's a possibility. I have access to a Magic Sword PCB... Although it's quite a hassle to get a measurement setup with this board. I might need to buy a SuperGun for it to be realistic.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 28, 2017, 04:53:46 am
Would there be a way to accomodate for this in GM, could we just put in a "0.5" value for intscaley? (how will it handle 567/2?)

It was this discussed somewhere here a few months ago. Check G.2 here: http://geedorah.com/eiusdemmodi/forum/viewtopic.php?pid=987#p987 (http://geedorah.com/eiusdemmodi/forum/viewtopic.php?pid=987#p987)

Thanks, this is the way I normally use it. The problem with the MSX driver though is that it reports 467 lines and not 466, and the Amiga driver reports 567 lines. Both are odd numbers (they're derived from the interlaced mode for each system). So my worry would be that, for example, forcing the MSX driver to a progressive resolution of 233 (like in your guide) may result in scaling artifacts, since it's not an integer division of 467.

I used this firmware (http://git.slashdev.ca/ps3-teensy-hid/tree/src), but wasn't able to get 1 ms next-frame response. I sniffed the USB traffic with a Saleae Logic and realized it was very late in sending the updated button state, it also sent a complete state update at every USB host poll. I did a very preliminary modification (stripped out almost everything and increased the poll rate) and was able to get the same 1 ms next-frame response with it configured as a joystick.

Great, with that precision it seems safe to assume that there's no real latency difference between rawinput and directinput in MAME/GM.

Quote
I'm currently planning to do a replacement 1 ms controller for the HORI RAP that will work on the PS3 as well, using this firmware as base.

That sounds fantastic. Is it something you would be willing to share once finished? A Teensy with such firmware could possibly be a better alternative for joystick use than an I-PAC 2, since that runs at 4ms.

It's a possibility. I have access to a Magic Sword PCB... Although it's quite a hassle to get a measurement setup with this board. I might need to buy a SuperGun for it to be realistic.

It could provide a worthwhile addition though, especially in regard to our understanding. Nice game BTW  :).   I can even imagine that the real hardware always reads input at the beginning of the game loop, such that not only the test menu but also the game itself may prove to have considerably lower latency in GM (at fd9 with no host lag).
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 28, 2017, 02:55:39 pm
Hi Dr., is that Amiga test program available somewhere? I'd like to test it if I have a chance. GroovyMAME (by design) won't be able to paint in red the bottom of the screen, however it should be able to paint yellow right after vblank like the actual Amiga.

Hi Guys,

I came around doing the button test with GM.

Test setup: Windows 10, Intel 7700K @ 4.5 GHz, HD6850, I-PAC 2, joystick with LED attached, CRT Emudriver and GM 0182.

"Button test" as previously attached and the MAME "a600" driver. I tested both rasterline value 260 ("button 260") and rasterline value 26 ("button 26"). GM was run with framedelay 7 (the maximum at which this driver runs flawlessly on my setup). Results are converted to PAL Amiga frames (1 Amiga frame equals 20 ms, which equals 4.8 camera frames), averaged over 10 (random) button presses, filmed at 240 fps. Counting is inclusive of the frame where the LED goes off, up to where the first yellow rasterlines are seen. Results are compared to the exact same test on real Amiga hardware.

Real Amiga:
Button 26: 1.7 frames
Button 260: 0.8 frames

GM 0182 with framedelay 7:

Button 26: 2.1 frames
Button 260: 2.2 frames

Difference:
Button 26: 2.1 - 1.7 = 0.4 frames added latency
Button 260: 2.2 - 0.8 = 1.4 frames of added latency

Given that these tests were run with frame delay 7, we could subtract 0.2 frames from the GM results to simulate framedelay 9, which would leave:

Difference:
Button 26: 2.1 - 0.2 - 1.7 = 0.2 frames added latency
Button 260: 2.2 - 0.2 - 0.8 = 1.2 frames of added latency

What bugs me is that even with (hypothetical) framedelay 9, the rasterline "26" case doesn't even get on par with real hardware. It should on average show lower latency than real hardware when there's zero host delay and we're using a high framedelay value that puts input polling near the end of the frame, instead of at the very beginning (rasterline 26).

What bugs me even more is that the rasterline "260" case results in over 1 frame of added latency, even when using (hypothetical) framedelay 9.

Just to have this out of the way: I'm definitely not disputing whether or not GM is capable of "next frame response", it is, as Intealls tests have shown beautifully.

It's just difficult to reconcile the above results. 

Is Windows 10 adding "host delay" back in versus Windows 7? (We're running full-screen application so it shouldn't!)
Is the I-PAC 2 adding more lag than it theoretically should?
Is there a host system lag on my January 2017 high-end hardware setup?
Etc.. etc..

Maybe I'm missing something obvious. I would really appreciate it if we can dig a bit deeper and see whether we can come with an explanation for these results.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: u-man on December 28, 2017, 04:28:58 pm
It could be the overclocked CPU. I generally would not do such tests with really high overclocking. At Mameworld we have seen similarly strange results with an overclocked CPU in benchmark tests. Overclocking in that range can introduce fluctuations, and we are already in a delicate setup where every ms counts.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 28, 2017, 07:25:16 pm
Dr.Venom, is it possible to see the code of your test program? I think I have an idea of what's going on.

Anyway, fractional frame latency values should be avoided in my opinion. It's better to count this in integer frame values instead of averaging to a fractional figure, which obfuscates the meaning of the results.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 29, 2017, 03:22:31 am
Dr.Venom, is it possible to see the code of your test program? I think I have an idea of what's going on.

Hi Calamity. It’s not my program, it was made by Toni Wilen, the author of WinUAE.

If you look at the end of the following post, it has the button program attached including the asm source code.

http://eab.abime.net/showpost.php?p=1186341&postcount=1 (http://eab.abime.net/showpost.php?p=1186341&postcount=1)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dacasks on December 29, 2017, 06:40:47 am
I'm one of those idiots in the back of the class, so I shyly ask...

... Wouldn't a real hardware/emulation clock discrepancy affect these kinds of tests?

MAME emulation by default is not perfect in many cases. From my experience, for example, as I said some time earlier... CPS games (maybe not all of them, I don't have access) run faster by default in MAME than on real hardware (putting the clock at circa 70% gets it pretty close to the real deal). So maybe that would affect input rate/polling?

*rest of the class "UUUUUUUUUUUUUHHHHHHHHHHHHHHH"*
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 29, 2017, 07:18:35 am
Hi Calamity. It’s not my program, it was made by Toni Wilen, the author of WinUAE.

If you look at the end of the following post, it has the button program attached including the asm source code.

http://eab.abime.net/showpost.php?p=1186341&postcount=1 (http://eab.abime.net/showpost.php?p=1186341&postcount=1)

Hi Dr.Venom,

It's the first time I look at Motorola assembly but after some googling for the main opcodes I think I understand what it's doing. Just as I imagined, it is in fact a "race the beam" case, the one case that GroovyMAME can't match hardware at.

I can't elaborate on it right now, but it's not a matter of GM being laggy when input is polled later in the frame; it's actually the Amiga which doesn't get the input in time when input is polled early in the frame, so it matches GM more closely in that case. See what I mean? Polling input early (*in that particular race-the-beam case*) makes the experiment approximate a non-race-the-beam one.

I'll come back to this later when I have some time.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Dr.Venom on December 29, 2017, 08:38:17 pm
Hi Calamity,

Thanks for looking into this.

I can't elaborate on it right now, but it's not a matter of GM being laggy when input is polled later in the frame; it's actually the Amiga which doesn't get the input in time when input is polled early in the frame

Would have been interesting to know a little bit more of your reasoning.

Not sure if I follow you because the Amiga hardware results seem to fit the expected values for the testcases on real hardware quite well.

Button 26 - expectation vs realization
Theoretical best case: button activated at line 25 (the LED is filmed going off), the line 26 test sees the button pressed, the program waits for vsync at line 310, turns the screen yellow, and the first visible yellow line then appears on camera after blanking at rasterline 26. The lowest possible latency on the real Amiga is thus 313 lines, or ~1 frame of latency.

Worst case: button activated at line 27 (the LED is filmed going off), the line 26 test is just missed, and 1 frame will elapse before the test at line 26 registers the button press. So the worst case is "best case + 1 frame" = 2 frames of latency.

Latency results for a test with random button presses will be a randomization between best and worst case, with an expected value of the average of the two, i.e. (1+2)/2 = 1.5 frames of latency. Any random button test should approach that value if the test is working correctly. The result with the real Amiga was 1.7 frames over 10 button presses. Close enough an approximation with 10 button presses, I think.

Button 260
Same reasoning for button 260 test on real Amiga:
Theoretical best case: button activated at line 259 (LED on camera goes off), registered at 260, wait until 310, turn yellow, first visible yellow lines on camera at line 26 of the next frame. The lowest possible latency on real hardware for the button 260 test is (52+26)/312 = 0.25 frames.

Worst case: button activated at line 261 (LED on camera goes off), the line 260 check is just missed, and 1 frame will elapse before the line 260 test registers the button press, etc. So the worst case is "best case + 1 frame" = 1.25 frames.

The expected value of randomized button presses is thus a randomization between best and worst, or (0.25 + 1.25)/2 = 0.75 frames of latency. The measured result was 0.8 frames of latency for the button 260 test on real hardware. Again, close enough an approximation.

Based on this I expect the button test to be valid.
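The same expectation can be replayed in code (my own sketch of the model above for the real Amiga, using the line counts quoted in this post):

Code: [Select]
// Expected real-Amiga latency for the "button <line>" test, per the model above:
// best case = press lands just at the polled line, worst case = just after it,
// and a uniformly random press averages out to (best + worst) / 2.
#include <cstdio>

double expected_latency_frames(int poll_line)
{
    const int lines_per_frame = 312;  // PAL Amiga
    const int yellow_line     = 26;   // first visible yellow line after blanking

    // Best case: from the polled line to the end of the frame, plus the lines
    // until the yellow shows in the next frame.
    double best  = double((lines_per_frame - poll_line) + yellow_line) / lines_per_frame;
    double worst = best + 1.0;        // press just missed the poll: add one full frame

    return (best + worst) / 2.0;
}

int main()
{
    std::printf("button 26:  %.2f frames expected\n", expected_latency_frames(26));   // ~1.50
    std::printf("button 260: %.2f frames expected\n", expected_latency_frames(260));  // ~0.75
}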

I'll come back to this later when I have some time.

That would be nice.

In any case, I've been doing an additional test case without the button test. I did a game test that I also did previously for WinUAE: the Turrican II main character "jump" test. Real hardware versus GM with framedelay 7. The test is simply, on startup of the game, making the main character jump. Average of 10 button presses.

Turrican 2 "jump" test:
Real Amiga: 2.2 frames of latency, rounded 2 frames
GM fd7: 3.3 frames of latency, rounded 3 frames

So also in this test GM is lagging one frame versus real hardware. If I speak for myself, the pattern seems obvious. Button 26 is one frame short of where it should theoretically be. Button 260 is lagging by a frame, Turrican 2 is lagging by a frame.

To be honest I think I'm getting a bit tired of the topic. It happens, I guess. So I'll leave it until someone else comes around to show something different for a change: hardware-like response with Amiga emulation.

In any case, I guess now's a good time to return to some good old Arcade gaming with GM!

 
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on December 29, 2017, 11:00:03 pm
Turrican 2 "jump" test:
Real Amiga: 2.2 frames of latency, rounded 2 frames
GM fd7: 3.3 frames of latency, rounded 3 frames

So also in this test GM is lagging one frame versus real hardware. If I speak for myself, the pattern seems obvious. Button 26 is one frame short of where it should theoretically be. Button 260 is lagging by a frame, Turrican 2 is lagging by a frame.

I tried to pause, press up, press shift+p and count the frames until the jump.

This is what holding up and pressing shift+p told me.

Code: [Select]
frame 0            1          2
      ^ nothing    ^ nothing  ^ jump

This is what the recording tells me:

Code: [Select]

frame -1                0         1         2
       ^ register input ^ nothing ^ nothing ^ jump

With the MAME model that Calamity explained, I think this is as good as it gets?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on December 30, 2017, 07:46:45 am
Hi Dr.Venom,

Sorry that I couldn't write about this before, but I'm working full time these days, I have family commitments, and it takes too long to clarify my points one by one. Since you've been in this stuff since the beginning, sometimes I take agreement on some concepts for granted when it may not be the case. Regardless, it would be nice if you uploaded your video material somewhere for us to check; it makes it easier for us to see how you deduce your values rather than just posting your bare results (maybe you actually posted those and I just missed it, sorry).

Now, let's go straight to your results.

Impugnation to your latency computation method

If you have one chicken and I have no chicken, should we say we have 0.5 chicken each?

We will never reach an agreement if we don't use the same logic to measure latency. Your method is valid in the context of real hardware, but when applied to frame-based emulators it easily leads to fallacious conclusions.

For all the tests done in this thread, we've always counted full frames. So, we count how many frames it takes from LED lit to action, counting a full frame for each vsync boundary, instead of real time translated to frames. The possibilities are:

0 -> action happens within the current frame (red color in your Amiga sample). Race-the-beam case, IMPOSSIBLE in MAME.
1 -> action happens in the next frame (next frame response).
2 -> action happens in the frame after next.
3 -> etc.

Let's see how this applies to your tests:

Button 26

Best case ->1 frame of latency (agree)
Worst case ->2 frames of latency (agree)
(1+2)/2 = 1.5 frames of latency (disagree)

Best case probability: 26/312 -> 8.33% (race-the-beam-like case*)
Worst case probability: (310-26)/312 -> 91.67% (non-race-the-beam-like case**)

* Above I defined the "race-the-beam" case as the case where the action happens in the same frame (0 frames of latency). Now I'm writing "race-the-beam-like" case for something that takes 1 frame of latency. I'm not cheating. In your particular experiment, the yellow colour happens if and only if the red colour happens in the previous frame. Thus the yellow colour depends on a previous race-the-beam case; it's a somewhat deferred race-the-beam case.

** This is why I said that "button 26" approximates a non-race-the-beam experiment: the worst case is much more probable than the best case. Now you'll disagree, saying that the theoretical probability is fifty-fifty, which is backed by your tests, and this is the exact point I wanted to make: you are measuring lit LED to action from whatever position it happens at, while I'm stating that the best case should only be considered valid if you can prove that the LED is lit within the brief interval between vsync and line 26. That makes it much more unlikely than the worst case.

Button 260

Best case-> 0.25 frames of latency (disagree) -> 1 vsync in the middle -> 1 frame
Worst case-> 1.25 frames of latency (disagree) ->  2 vsyncs in the middle -> 2 frames
(0.25 + 1.25)/2 = 0.75 frames of latency (disagree)

Best case probability: 260/312 -> 83.33% (race-the-beam-like case)
Worst case probability: (312-260)/312 -> 16.67% (non-race-the-beam-like case*)

The higher probability of the best case now makes the "button 260" experiment closer to a race-the-beam case, the one MAME can't replicate.
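The two probability splits above come from nothing more than dividing the frame at the polled line (a quick sketch of this frame-counting model; the 312-line PAL frame count is the one used throughout this thread):

Code: [Select]
// Probability that a uniformly random press lands before vs. after the polled
// line, per the frame-counting model above for the PAL Amiga "button" test.
#include <cstdio>

void split(const char* name, int poll_line)
{
    const int lines_per_frame = 312;
    double before = 100.0 * poll_line / lines_per_frame;          // race-the-beam-like case
    double after  = 100.0 * (lines_per_frame - poll_line) / lines_per_frame;

    std::printf("%s: %.2f%% before the polled line, %.2f%% after\n", name, before, after);
}

int main()
{
    split("button 26",  26);   // ~8.33% vs ~91.67%
    split("button 260", 260);  // ~83.33% vs ~16.67%
}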

Impugnation to the "button" test program

While this test code is perfectly valid, in my opinion it's not clear that it's a valid generalization of how games work. I mean, definitely some games may work like that, but not necessarily all of them.

The main point to understand here is that this test code's logic spreads across the whole 20 ms of a frame, and it can poll input at any time during that window.

Now, because MAME emulation slices time into frames, it's impossible to replicate that behaviour exactly. Those 20 ms will now be compressed into barely 1 ms. It's irrelevant at which point the emulated program tries to poll input, because it can't communicate with the outside world in real time. MAME will poll input right before launching the emulation of the next frame. The emulated program can poll input at line 26, 260, or whatever, but all it will do is read a frozen photograph of how the inputs were at the time MAME polled them, which will more or less be at line 0 if you're using a high frame delay value.

So, take for instance the "button 260" test. Even if input happens within the first 260 lines, which is very likely, MAME won't set the red colour in that frame, ever. On the other hand, the Amiga will. This is expected. MAME can't replicate that with the current design.

However, MAME should be perfectly capable of registering the input in the next frame (yellow). But it will NOT. It will not because the specific test program we're using is sequential: it won't turn yellow if it has not turned red first. The two events are not independent.

My point is that a typical game loop wouldn't work that way. Of course it can do, and maybe Turrican 2 or all Amiga games do, but that'll be a surprise for me.

So, to put this in context, the "button" program, in pseudocode, looks like this:

Code: [Select]
button:
if right_button_pressed end

wait_line(target)
set_color(green)
wait_not_line(target)

set_color(black)

if not left_button_pressed goto button

set_color(red)

wait_line(0)

set_color(yellow)

wait_line(10)
wait_line(0)

goto button


What I mean is that if it were rewritten this way, you'd probably see GroovyMAME getting closer to actual hardware than with your current tests.

Code: [Select]
button:
if right_button_pressed end

if left_button_pressed set_color(yellow) else set_color(black)

wait_line(target)
set_color(green)
wait_not_line(target)

if left_button_pressed set_color(red)

wait_line(0)

goto button


Of course I might be completely wrong. But even so, chances are the issue would be more related to the way Amiga emulation is done than to any mysterious hidden latency source.

EDIT: I've noticed that the "best/worst" case naming in my explanation is misleading. It made sense in your own explanation because best/worst case only referred to a single line, with the rest of the cases falling somewhere in between. In my explanation, however, I divide all cases into two groups with no gradation, so I should replace "best/worst" with "before/after" or "success/fail" to be more accurate.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on January 22, 2018, 06:53:38 pm
Retroarch's next frame response breakthrough (January 2018):

https://www.patreon.com/posts/next-frame-time-16390231 (https://www.patreon.com/posts/next-frame-time-16390231)

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on January 22, 2018, 08:23:06 pm
Retroarch's next frame response breakthrough (January 2018):

https://www.patreon.com/posts/next-frame-time-16390231 (https://www.patreon.com/posts/next-frame-time-16390231)

Cool. 

Is framedelay in ms now?  (linked post  says framedelay 15)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Trnzaddict on January 22, 2018, 09:00:18 pm
NVM.

After watching the video answered my own question.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: buttersoft on January 22, 2018, 09:04:55 pm
Retroarch's next frame response breakthrough (January 2018):

https://www.patreon.com/posts/next-frame-time-16390231 (https://www.patreon.com/posts/next-frame-time-16390231)

lol, welcome to GM 2014. I love how they're broadcasting those "accepted myths" as never having been solved before.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: snappleman on January 27, 2018, 02:23:08 pm
I don't know, even with optimal settings I've never been able to get GM to feel "right" against hardware, but with RA now I barely feel any difference. The only thing that's lacking from RA is switchres.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on March 14, 2018, 02:14:01 pm
Ves, Doozer and I have done new latency measurements on current Linux (e.g. GroovyArcade 2018) and we can all confirm that latency-wise Linux is finally on par with Windows. This means you can achieve next frame response routinely as long as you use frame delay on a semidecent machine.

For instance, this video was recorded by Ves on a Core2Duo using -fd 7, latest GroovyArcade and a "minibox" as input device (keyboard encoder through ps-2), where you can see next frame response for some of the samples (depending on how late the input happens in previous frame): https://drive.google.com/open?id=1J3BJgFhvOdlKWU51PBqExSX4oxcLckXK

This was certainly NOT the case a few years ago when some of us performed the same tests (check previous pages in this thread). So there must have been some change in the kernel in the meantime making next frame response possible.

I think it's important to post about this so we're all updated to current knowledge of things.


Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Franko on March 17, 2018, 01:06:50 pm
That's great news!

Why was frame_delay set to 7 instead of 9 though? Was it because of the CPU limitations?

Also, what about SDL page flipping that is supposed to add 2-3 frames of lag?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Paradroid on March 18, 2018, 02:54:13 am
we can all confirm that latency-wise Linux is finally on par with Windows.

Excellent news!


Sent from my SM-G955F using Tapatalk

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Doozer on March 18, 2018, 03:11:31 am
That's great news!

Why was frame_delay set to 7 instead of 9 though? Was it because of the CPU limitations?

Also, what about SDL page flipping that is supposed to add 2-3 frames of lag?

Indeed, frame_delay was set to 7 because of CPU limitations.

SDL page flipping can be achieved at each frame. It depends on how sdl_flip is called.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: u-man on March 18, 2018, 07:45:04 am
I saw something on MAME github: https://github.com/mamedev/mame/issues/3344#issuecomment-373963782

That sounds really nice and promising and Calamity has already adapted some of this genius idea: https://forums.blurbusters.com/viewtopic.php?f=10&p=31750#p31750

I can't understand why the MAME devs are against such ideas and why they welcome new people in such a harsh way. Don't they see that? There are a trillion other ways to voice concerns about someone's ideas.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: buttersoft on March 18, 2018, 06:32:21 pm
I saw something on MAME github: https://github.com/mamedev/mame/issues/3344#issuecomment-373963782

That sounds really nice and promising and Calamity has already adapted some of this genius idea: https://forums.blurbusters.com/viewtopic.php?f=10&p=31750#p31750

I can't understand why the MAME devs are against such ideas and why they welcome new people in such a harsh way. Don't they see that? There are a trillion other ways to voice concerns about someone's ideas.

Well, the MAME devs have their own priorities, and a workload no one could call small; put it down to that. I did find it interesting that they're planning to rasterise 3D games through the GPU. I wonder if that will speed up 3D.

EDIT: OTT
I've enabled keyboardprovider dinput in mame.ini, as I want to use an AutoHotkey script to feed some joytokey commands to MAME. Most of the joystick commands (directions, buttons) are wired directly to MAME (so... rawinput? They worked before I made the dinput change in mame.ini, when dinput did not work); only the credit buttons go through AutoHotkey and dinput. Have I added lag of any sort by enabling keyboardprovider dinput in mame.ini?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on March 23, 2018, 01:57:39 pm
Current frame response is finally possible!

https://youtu.be/egIEL7158N4

Demonstration of the experimental "frame slice" feature. Tear-free rendering at 500 fps, 728x567i 49.890 Hz. The emulation of each frame is divided into 10 "slices", synchronized with the physical raster. Input data is polled and processed for each slice. On pressing F11, the slices are shown with a color filter, revealing the (low) existing jitter.
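For readers trying to picture what the slices do, here is a very rough sketch of the idea. This is an illustration only, not GroovyMAME's actual code: the types and function names are hypothetical, and the real build synchronizes against the physical scanline through Direct3D9ex.

Code: [Select]
// Rough illustration of frame slicing (hypothetical API, not GroovyMAME code):
// instead of polling input once and emulating a whole frame, the frame is
// emulated in N raster-synchronized chunks, polling input before each chunk.
#include <cstdio>

struct Raster   { int lines_per_frame() { return 312; }
                  void wait_for_scanline(int line) { (void)line; /* wait on the physical beam */ } };
struct Input    { void poll() { /* read the host input state */ } };
struct Emulator { void run_until_scanline(int line) { (void)line; /* emulate up to this line */ }
                  void present_partial_update() { /* show the lines just emulated */ } };

const int SLICES = 10;

void run_one_frame(Emulator& emu, Input& input, Raster& raster)
{
    const int lines = raster.lines_per_frame();

    for (int s = 0; s < SLICES; s++)
    {
        // Keep the emulated beam one slice ahead of the physical beam.
        raster.wait_for_scanline(s * lines / SLICES);
        input.poll();                                     // fresh input for this slice
        emu.run_until_scanline((s + 1) * lines / SLICES);
        emu.present_partial_update();                     // tear-free because we stay ahead of the beam
    }
}

int main()
{
    Emulator emu; Input input; Raster raster;
    run_one_frame(emu, input, raster);
    std::printf("emulated one frame in %d slices\n", SLICES);
}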

Setup:
- Intel i7-4771 3.5 GHz, AMD Radeon R9 270, Windows 8.1 64 bits
- GroovyMAME 0.195 - Direct3D9ex - custom "frame slice" build
- JPAC wired to a microswitch and a 5V LED.

Testing Toni Wilen's "Button test" program for Amiga, emulated by GroovyMAME. This program polls input at a scanline specified by the user (green line). If input is detected, it colors all lines below the green line in red. After that, on the next frame, it colors all lines until the polling line in yellow.

See how, in many instances, the program reacts to input (LED) right in the same frame.

The second part of the video was recorded at 240 fps. The brightness was raised a bit in GroovyMAME to make the raster visible on the black background.

Thanks to Dr.Venom for inspiring this work.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Franko on March 24, 2018, 05:05:40 am
Wow, just wow. You guys are awesome.

Do you plan on including the frame slice feature in 0.196?

Also, how much impact will it have for Linux builds? Or simply any builds relying on frame_delay instead of Direct3D9ex?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on March 24, 2018, 08:05:50 am
Do you plan on including the frame slice feature in 0.196?

For now, I'm planning to keep it in a separate, experimental build. I have some concerns about this method. Not all drivers in MAME support it nicely: the driver needs to support partial updates of its screens, otherwise graphical glitches happen. Moreover, I'd say the great majority of drivers won't show any difference latency-wise using this method compared to frame delay. It's the drivers that natively did beam chasing where this method makes a real difference (e.g. Amiga). Regardless, this is surely the most important breakthrough since frame delay.

Quote
Also, how much impact will it have for Linux builds? Or simply any builds relying on frame_delay instead of Direct3D9ex?

Mark, the guy from BlurBusters, is developing methods to do this without direct polling of scanlines (which is hard to do cleanly in Linux). This means the method should be cross-platform, eventually.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Paradroid on March 26, 2018, 04:13:26 am
Current frame response is finally possible!

This is insane! Amazing development!!!

Sent from my SM-G955F using Tapatalk
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Ilitirit on March 26, 2018, 12:25:03 pm
Is there a guide/tool one can use to determine the best framedelay setting?  AFAIU it seems to be a matter of trial and error?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: buttersoft on March 26, 2018, 08:14:38 pm
Is there a guide/tool one can use to determine the best framedelay setting?  AFAIU it seems to be a matter of trial and error?

There is, but I can't seem to dig it up. I'll see if I can find something when I get home from work.

Essentially, you run MAME with -nothrottle and a log (see the command line below, noting it has to be run from a command line, not a shortcut) and note how fast it's going in %. Or you can watch the screen and hit F10 and F11 to see the speed for yourself - some games have funny intros and whatnot, so this isn't a terrible idea. The slowest speed the game itself runs at is what you want to find. Then you take the framerate of that game and work out how many ms it takes to generate each frame; 60 fps is about 16.6 ms. If your game can run at 800%, it has time to generate 8 frames in that 16.6 ms interval where the game only needs 1, so that 1 frame takes roughly 2 ms. However, you want a buffer, as some games get slowdown, so double that ~2 ms to about 4.15 ms. So your PC needs 4.15 ms to guarantee a frame gets drawn in time, which leaves 12.45 ms of free time to play with. You then divide the free time 12.45 by the total time 16.6 to work out what fraction of the time is free, and get 0.75 of the total. Frame delay comes in ten discrete slices, so 0.75 rounds down to 7/10 = frame delay 7.

The calcs normally come out the same way for a given % speed as the framerate is usually around 60, so there's a quickref table. Credit to the original posters, whose names escape me at present. Note that these are very safe values, with a buffer built in as described.

Quote
0-222% -> 0
223-249% -> 1
250-285% -> 2
286-333% -> 3
334-399% -> 4
400-499% -> 5
500-666% -> 6
667-999% -> 7
1000-1999% -> 8
2000% and over -> 9

From command line, run: groovymame.exe -v romname >romname.txt

You also add "-v" such that it will output some stats at exit.
 
So as an example, if I run outrun in MAME ("mame.exe outrun -nothrottle -v"), let it run for some 30 seconds and then quit, on my machine it shows that it's able to run at 792% unthrottled. For simplicity I'll round this to 800%, or said differently, 8 times as fast as the original hardware.

Now outrun originally runs at 60 Hz (and something), i.e. 60 fps. Dividing 1/60 gives us 0.016667 seconds per frame, which multiplied by 1000 means each frame takes 16.67 milliseconds. Since MAME runs it 8 times faster on average, a frame in MAME takes 16.67/8 = 2.08 milliseconds on average. I'm stressing "on average", as emulation is mostly not about averages: some frames may take longer to emulate than others. As a rule of thumb you may multiply the average frame emulation time by 2, i.e. assume the toughest frames take twice the average. That brings us to 2 x 2.08 = 4.16 milliseconds that we need left in each frame to emulate it and still be in time for vblank.

So how large can frame_delay be? Each frame takes 16.67 ms, of which 4.16 ms need to be left for emulation. So 16.67 ms - 4.16 ms = 12.51 ms is the latest point at which we need to start the emulation. Now, frame_delay goes in steps of 1/10th of a frame (with maximum setting 9), so each step up from 0 adds 16.67/10 = 1.67 ms of delay. The largest frame_delay that may be used is thus 12.51/1.67 = 7(.47). So I could use a frame_delay of 7 for outrun on my machine (a 4.6 GHz 3770K); going any higher to 8 or even 9 would most surely lead to some (or a lot of) emulated frames not being finished in time for vblank, and thus skipped frames / loss of emulation accuracy / added input latency.
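The same rule of thumb in code form (a sketch of the calculation above, with the 2x safety factor; feed it whatever unthrottled speed -nothrottle reports on your machine):

Code: [Select]
// Frame delay rule of thumb from the posts above: take the unthrottled speed,
// assume the worst frame costs twice the average, and see how much of the
// refresh period can be given back as frame delay (0-9, in tenths of a frame).
#include <algorithm>
#include <cstdio>

int suggested_frame_delay(double unthrottled_percent, double refresh_hz)
{
    double frame_ms   = 1000.0 / refresh_hz;                       // ~16.67 ms at 60 Hz
    double avg_emu_ms = frame_ms / (unthrottled_percent / 100.0);  // average cost of one emulated frame
    double worst_ms   = 2.0 * avg_emu_ms;                          // safety factor for slow frames
    double free_ms    = frame_ms - worst_ms;                       // time we can delay the frame start

    int fd = (int)(free_ms / (frame_ms / 10.0));                   // whole tenths, rounded down
    return std::max(0, std::min(fd, 9));
}

int main()
{
    // Example from the thread: outrun running at ~800% unthrottled, 60 Hz.
    std::printf("suggested frame_delay: %d\n", suggested_frame_delay(800.0, 60.0));  // 7
}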
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: donluca on March 27, 2018, 08:58:14 am
I'm wondering if something can be implemented straight into GroovyMAME where, the first time you run a game, it does a dry run to see how fast it can go and set frame_delay automatically based on the results. It would be needed just the first time because then the value would be stored in the game's .ini file. That would save a lot of trouble.

I also thought about a script which would run the tests on every ROM and automatically populate the games .ini files with the correct frame_delay.

I'm wondering, though, if frame_delay will become obsolete once the slices thing is made to work properly on all drivers.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: buttersoft on March 27, 2018, 06:40:25 pm
I'm wondering, though, if frame_delay will become obsolete once the slices thing is made to work properly on all drivers.

I think it does make frame delay obsolete, but I also think reworking all the drivers in MAME is going to be tricky. Some sort of option to enable/disable it for different drivers, and focusing on a few bang-for-buck ones like cps1, cps2, neogeo, etc., might be the first step. At the moment it's all still proof of concept though.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on March 28, 2018, 03:58:17 am
I'm wondering if something can be implemented straight into GroovyMAME where, the first time you run a game, it does a dry run to see how fast it can go and set frame_delay automatically based on the results. It would be needed just the first time because then the value would be stored in the game's .ini file. That would save a lot of trouble.

I also thought about a script which would run the tests on every ROM and automatically populate the games .ini files with the correct frame_delay.

I'm wondering, though, if frame_delay will become obsolete once the slices thing is made to work properly on all drivers.

Intealls implemented that feature for his ASIO build, check this post (http://forum.arcadecontrols.com/index.php/topic,142143.msg1471818.html#msg1471818).

There, you have to launch GM manually with the -bench option, and it then gives a suggested value which you can use later. I guess this could be arranged in a script.

My plan was to add an automatic frame delay option based on scanline timings, eventually. This is a very problematic feature because it will often fail to get a stable value for many drivers and it will be difficult for the user to interpret why it's failing.

Another (complementary?) approach would be adding a slider option to the ui, so it could be easily adjusted while in game, and then it would be stored to a cfg file.

Certainly, frame slice makes frame delay obsolete. However, as buttersoft pointed out, it's not clear that the changes required for the individual drivers are going to be possible or accepted. In the coming days I plan to write a post explaining the feature and possibly make a list of drivers that are ready for this.

In the meantime, the idea I'd like to make clear is that frame slice will have exactly the same latency as frame delay has for the great majority of arcade drivers. It is only for a small fraction of drivers (those that originally did beam chasing) that frame slice will provide an actual improvement.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on March 28, 2018, 06:13:19 am
Intealls implemented that feature for his ASIO build, check this post (http://forum.arcadecontrols.com/index.php/topic,142143.msg1471818.html#msg1471818).

...

Another (complementary?) approach would be adding a slider option to the ui, so it could be easily adjusted while in game, and then it would be stored to a cfg file.

My personal GM build already has these options. :)

The slider is infinitely useful. I can post the (small) diff tonight.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on March 28, 2018, 10:29:24 am

My personal GM build already has these options. :)

The slider is infinitely useful. I can post the (small) diff tonight.

That'd be great. Hopefully I can add it for the new release on time.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: donluca on March 28, 2018, 12:12:44 pm
Wait, you're telling me that you can adjust frame delay *while* the game is running and have immediate feedback?

That would be awesome!

Seriously, those are exciting times for MAME enthusiasts, thanks for all your hard work.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on March 28, 2018, 01:13:48 pm
Intealls implemented that feature for his ASIO build, check this post (http://forum.arcadecontrols.com/index.php/topic,142143.msg1471818.html#msg1471818).

...

Another (complementary?) approach would be adding a slider option to the ui, so it could be easily adjusted while in game, and then it would be stored to a cfg file.

My personal GM build already has these options. :)

The slider is infinitely useful. I can post the (small) diff tonight.

That would be great if you do. I've only framedelayed a few games so far but this would make it so easy.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on March 28, 2018, 04:03:14 pm
Here are the patches.

Calamity: Awesome work on the frame slicing! Super interesting stuff, as always.

The fdbench patch just outputs a statistic when run with -bench. Let's say you get 99.5% at fd 7 and 0.5% at fd 0. This probably means the game is safe to run at fd 6/7. But if you get 94.5% at fd 7 and 5% at fd 6, you should probably run the game at fd 5/fd 6. The slider makes it easy to tweak this. I haven't tested whether the value is saved to config - I don't have that enabled.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on March 29, 2018, 12:52:43 pm
removed: long story about not being able to get the diff file to work, and having to change the files manually.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on March 29, 2018, 01:15:42 pm
FYI, both patches are now included in GM 0.196. I had to apply them manually too.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on March 29, 2018, 07:07:17 pm
FYI, both patches are now included in GM 0.196. I had to apply them manually too.

Great!

Thanks for adding these. I tested them with 0.195 before posting; they applied without problems there. I should have tried them with 0.196.

Remember that when using the frame delay benchmark, video and sound are not being output. This means the benchmark only reports the CPU time needed for emulation. So you shouldn't be surprised if the benchmark tells you that fd 9 is perfectly fine to use, but you end up having to use fd 8 for the benchmarked game. Also, for some drivers, speeds vary during gameplay, so when run with -bench 90 for instance, you might only capture a fraction of real-world emulated frame times. What I'm trying to say is: the benchmark is only an indication, not a definitive answer to what setting is safe to use. The true setting would probably only be found by benchmarking a recording of someone finishing the game and hitting all driver code paths. With that said, if you benchmark a few minutes (-bench 240), it mostly seems to give a useful value.

Edit: Just remembered, if you want to include video and sound in the benchmark, use 'mame64 game -nothrottle -nowaitvsync -nosyncrefresh -notriplebuffer -video d3d' and press ESC after a while.

There's a pretty big difference when including video and sound. This is on my main rig, though; the differences are probably smaller on a proper GM CRT Emudriver setup.

Code: [Select]
y:\>mame64 akatana -nothrottle -nowaitvsync -nosyncrefresh -notriplebuffer -video d3d
Video chipset is not compatible.
SwitchRes: could not find a video mode that meets your specs
Frame delay/percentage: 6/0.14% 7/0.07% 8/14.91% 9/84.88%
Average speed: 1380.40% (378 seconds)

y:\>mame64 akatana -bench 240
Video chipset is not compatible.
SwitchRes: could not find a video mode that meets your specs
Frame delay/percentage: 6/0.11% 7/0.22% 8/3.89% 9/95.78%
Average speed: 1633.52% (239 seconds)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Ilitirit on March 30, 2018, 11:34:36 am
Meanwhile...

https://forums.libretro.com/t/input-lag-compensation-to-compensate-for-games-internal-lag/15075
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on March 30, 2018, 03:29:48 pm
Meanwhile...

https://forums.libretro.com/t/input-lag-compensation-to-compensate-for-games-internal-lag/15075

Unless I'm too tired to think straight, that won't help with next-frame response. It would probably even make next-frame response more difficult, since two (or more) frames need to be emulated in the same timeslot (which will lead to less time spent waiting for input). It will only affect internal lag (which is supposed to be there, if emulation accuracy is of concern). See the (crappy) sketch below.

At least with frame delay, and not Calamity's fresh frame slice method.

Edit: I guess I'm too tired to think straight; when refraining from sleep, you should also refrain from posting. It shouldn't make next-frame response more difficult, at least in theory. The sketch is a bad example and probably a stupid way to implement something like this.

Code: [Select]
1 ms slots

 ------------------------------------------------
 |  |  |  |  |  |  |  |  |  |PL|ET|ET|PL|ET|ET|VB|
 ------------------------------------------------

PL = poll
ET = emulation time
VB = VBLANK

Input will be reflected by either the first or the second poll and output to screen.

Edit again: Upon further consideration (and sleep), this method will eat into next-frame response time. Polling the input twice per frame will probably lead to weird artifacts, since it can lead to different outcomes in the game. The only way to get around this is to poll the input once and then emulate after the poll, which eats CPU time. Like so:

Code: [Select]
1 ms slots

 ------------------------------------------------
 |  |  |  |  |  |  |  |  |  |  |PL|E0|E0|E1|E1|VB|
 ------------------------------------------------

PL = poll
E0 = emulation time, frame 0
E1 = emulation time, frame 1
VB = VBLANK

Input needs to be polled earlier, making next-frame response more difficult.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: iVoid on April 01, 2018, 11:12:02 am
Here is a video of the feature in action, by TylerL:
https://www.youtube.com/watch?v=_qys9sdzJKI (https://www.youtube.com/watch?v=_qys9sdzJKI)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on April 01, 2018, 12:29:41 pm
For drivers that actually have next-frame response, that will be more difficult to achieve, since the input always needs to be polled earlier. I really don't think there's any way around this. For very fast drivers, though, it probably won't matter much.

One interesting aspect of an approach like this would be that it might be possible to relax the CPU requirement and still achieve the same (or better) response you would get with a very high frame delay setting, if the game itself does not have next-frame response. That could even make the emulation feel more like the original system than frame delay would be capable of (shaving off the last few milliseconds).
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: iVoid on April 01, 2018, 02:24:47 pm
That is correct. The author himself has stated that this will only be useful for games with internal input lag.

But for those games with internal lag (which are very common) the difference is massive since it cuts entire frames of lag, which can easily offset the gains from frame delay.

Furthermore, unlike frame delay, there is no need for trial and error to determine the correct values since it is very easy to determine how many frames of internal lag a game has by pausing, holding a button, and running the emulator frame by frame.
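
Roughly, the run-ahead idea from the linked thread looks something like the sketch below. This is only my understanding of it, and the FakeCore class is a made-up stand-in, not the actual libretro or MAME API. Note that per displayed frame you emulate 1 + N frames, which matches intealls' point about two or more frames being emulated in the same timeslot.

Code: [Select]
# Rough sketch of run-ahead for N frames of internal lag. "FakeCore" is a
# made-up stand-in for an emulator core, NOT the real libretro/MAME interface.
class FakeCore:
    def __init__(self):
        self.frame = 0
    def save_state(self):
        return self.frame
    def load_state(self, state):
        self.frame = state
    def run_frame(self, inputs):
        # advance one emulated frame; return (video, audio) for that frame
        self.frame += 1
        return f"video@{self.frame}", f"audio@{self.frame}"

def run_ahead_frame(core, inputs, lag_frames):
    # 1) advance the real timeline one frame with the real input;
    #    keep its audio, but don't show its video
    _, audio = core.run_frame(inputs)
    state = core.save_state()
    # 2) run lag_frames further, reusing the current input as a guess, and
    #    present the video of the last one - the frame where the game's
    #    internal lag would finally show the effect of the input
    video = None
    for _ in range(lag_frames):
        video, _ = core.run_frame(inputs)
    # 3) roll back so the real timeline is undisturbed for the next frame
    core.load_state(state)
    return video, audio

if __name__ == "__main__":
    core = FakeCore()
    video, audio = run_ahead_frame(core, {"punch": True}, lag_frames=1)
    print(video, audio, core.frame)   # video@2 audio@1 1 -> shown video is 1 frame ahead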
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on April 01, 2018, 03:13:09 pm
But for those games with internal lag (which are very common) the difference is massive since it cuts entire frames of lag, which can easily offset the gains from frame delay.

For games with internal lag, in practice, it should be (almost) the same gain, with a lower processing time requirement. At least if you want the emulation to be faithful to the original system.

SMW on my (oold) laptop feels really responsive with this. Pretty cool.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: iVoid on April 01, 2018, 04:13:59 pm
But for those games with internal lag (which are very common) the difference is massive since it cuts entire frames of lag, which can easily offset the gains from frame delay.

For games with internal lag, in practice, it should be (almost) the same gain, with a lower processing time requirement. At least if you want the emulation to be faithful to the original system.
Yeah, if the internal lag is completely removed it is possible to get lower latency than real hardware, which isn't exactly faithful to the original. But modern hardware always adds a bit of latency which is not faithful to the original either. At least now we have a way to lower latency to the amount we see fit, and the only way to determine if the total latency is the same as real hardware is to run tests.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on April 01, 2018, 04:27:10 pm
But modern hardware always adds a bit of latency which is not faithful to the original either.

Sure, but not a properly set up GM ;)

With a (proper) 1 kHz USB-polled controller, a CRT and FD 9, you get a (16.67 - 1 - 1.667)/16.67 chance of getting the same response as actual hardware, assuming a ~60 Hz, input-polled-once model, an accurate driver, and excluding any system jitter, etc. With faster CPUs and higher USB poll rates this probability will grow.
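
To spell the numbers out (a rough worked example, same assumptions as above):

Code: [Select]
# Rough worked example of the figure above; same assumptions as in the post.
frame_ms  = 1000 / 60      # one ~60 Hz frame, ~16.67 ms
usb_poll  = 1.0            # worst-case gap of a 1 kHz USB poll, in ms
fd9_slack = frame_ms / 10  # frame delay 9 leaves ~10% of the frame for emulation
print((frame_ms - usb_poll - fd9_slack) / frame_ms)  # ~0.84, i.e. ~84% of presses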
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: iVoid on April 01, 2018, 04:48:09 pm
But modern hardware always adds a bit of latency which is not faithful to the original either.

Sure, but not a properly set up GM ;)

With a (proper) 1 kHz USB-polled controller, a CRT and FD 9, you get a (16.67 - 1 - 1.667)/16.67 chance of getting the same response as actual hardware, assuming a ~60 Hz, input-polled-once model, an accurate driver, and excluding any system jitter, etc. With faster CPUs and higher USB poll rates this probability will grow.
Sure, but even under those conditions the average latency will still be higher than real hardware, even if only slightly; there's no way around that. It's pretty amazing that even with modest hardware, and maybe even an LCD, it might be possible to get the same results on some games! Who knows, further testing will tell :)

One of my favorite arcade games ever, Samurai Shodown 3, has 1 frame of internal input lag and regrettably I don't have a CRT monitor anymore so I'm pretty excited about this :D I can only imagine how responsive the game must be with this feature on a CRT.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: b4nd1t0 on April 03, 2018, 07:53:02 am
Some questions: after having performed the bench and received the percentage values, which one should I consider best? Is my goal to aim at zero? Another question: where is the frame delay value set by the slider recorded? After I quit the game, nothing about the frame delay is stored in the .cfg file.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on April 03, 2018, 08:00:32 am
Some questions: after having performed the bench and received the percentage values, which one should I consider best? Is my goal to aim at zero? Another question: where is the frame delay value set by the slider recorded? After I quit the game, nothing about the frame delay is stored in the .cfg file.

It's not stored; I think this should be added to make the feature useful. As for the value, if you pick the lower one you'll probably be 100% safe; anyway, I'd pick the higher one if its percentage is very high, at the risk of an occasional hiccup.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on April 03, 2018, 11:44:51 am
It's not stored; I think this should be added to make the feature useful. As for the value, if you pick the lower one you'll probably be 100% safe; anyway, I'd pick the higher one if its percentage is very high, at the risk of an occasional hiccup.

An option to store the value without having to use the writeconfig option could be useful. Currently, for some reason, writeconfig doesn't update the value set with the slider. Personally, I don't like writeconfig, which is why I have it disabled (and thus never tested if the value was updated, which I stated in the original post).

Edit: You could use writeconfig, do the benchmark and run the game with the determined value. That seems to store it.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: donluca on April 05, 2018, 12:00:16 pm
That was the idea I had for it.

When you start the game, it checks if a framedelay value is present in the game's configuration file (or if the file is present at all).
If it isn't, it starts a brief test to determine the ideal framedelay value, stores it in the configuration file and starts the game.

To be safe, we could tell the script to store the value found minus 1, just to be 100% sure it won't cause hiccups during gameplay. One can always go into the menu and adjust the value via the slider afterwards.
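
Just to illustrate the flow (a very rough sketch: the paths, names and the mame64 call are made up, and it reuses the same bench output parsing idea sketched earlier in the thread):

Code: [Select]
# Sketch of the first-run idea above. Paths, names and the parsing are only
# illustrative; it assumes the "Frame delay/percentage:" line from the fdbench
# patch and that GroovyMAME picks up a per-game ini from the ini folder.
import os
import re
import subprocess

def launch_with_auto_frame_delay(rom, ini_dir="ini", exe="mame64"):
    ini_path = os.path.join(ini_dir, rom + ".ini")
    if not os.path.exists(ini_path):
        p = subprocess.run([exe, rom, "-bench", "240"],
                           capture_output=True, text=True)
        pairs = re.findall(r"(\d+)/([\d.]+)%", p.stdout + p.stderr)
        if pairs:
            hist = {int(fd): float(pct) for fd, pct in pairs}
            best = max(hist, key=hist.get)          # bin where most frame time landed
            os.makedirs(ini_dir, exist_ok=True)
            with open(ini_path, "w") as f:
                f.write(f"frame_delay {max(best - 1, 0)}\n")   # value minus 1, to be safe
    subprocess.run([exe, rom])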
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: krick on April 21, 2018, 11:12:22 am
Edit: Just remembered, if you want to include video and sound in the benchmark, use 'mame64 game -nothrottle -nowaitvsync -nosyncrefresh -notriplebuffer -video d3d' and press ESC after a while.

It might be useful to create a new "benchfull" switch that includes those options.

Also, do your statistic calculations take into account that the number of seconds reported by bench (at least the output, anyway) is always off by one...

http://forums.bannister.org//ubbthreads.php?ubb=showflat&Number=111303 (http://forums.bannister.org//ubbthreads.php?ubb=showflat&Number=111303)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: intealls on April 21, 2018, 07:06:50 pm
Also, do your statistic calculations take into account that the number of seconds reported by bench (at least the output, anyway) is always off by one...

http://forums.bannister.org//ubbthreads.php?ubb=showflat&Number=111303 (http://forums.bannister.org//ubbthreads.php?ubb=showflat&Number=111303)

The benchmark doesn't care about the number of seconds run at all. It's a 10 bin histogram of emulated frame times.

If you want a better result, you run it longer.

Edit: Correction: it's a 10-bin histogram where each sample is the average frame time over a quarter of an emulated second. If more accuracy is needed (doubtful), this could be changed by altering ATTOSECONDS_PER_SPEED_UPDATE in video.h, which might cause issues.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: faybiens on August 18, 2018, 08:53:45 am
Just read the entire topic and this is very interesting.

I did many simple measurements with an iPhone at 240 fps and an LED connected to a button,
comparing my real Neo Geo hardware and GroovyMAME.

The Neo Geo responds after 3 frames, MAME after 4 frames.
With Shift+P, MAME shows the machine should respond after 2 frames.

I tried Space Invaders, which should respond after 1 frame with Shift+P, i.e. next-frame response;
when pressing a button, MAME responds after 3 frames.

I think my CRT adds a frame; it may have digital processing inside, so I should try with another one.
But there is still one frame that I could not get rid of.

I was wondering if my JammaSD (I set the 1000 Hz polling rate) could add something, or the ATI Radeon HD 7800?
Is there something to do with ATI video cards, or do they add one frame of lag?

 
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on August 18, 2018, 08:59:40 am
Are you using frame delay? GM version? Windows version? Linux?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: faybiens on August 18, 2018, 09:10:01 am
Hello Calamity
GM 0.194, Windows 7, frame delay 9 on Space Invaders, frame delay 6 for the Neo Geo machine.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: faybiens on August 18, 2018, 09:12:19 am
I tried GM 0.197 with HLSL, but same results
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: faybiens on August 18, 2018, 10:37:18 am
I have focused on the possibility of a pre-rendered frame with my ATI Radeon.
I used the RadeonPRO tweak software and could set the driver's frame queue size setting to 0.

It solved the 1 frame I was trying to get rid of.
So ATI owners should do that.

I will try another CRT; mine introduces a frame of lag.
Has someone else experienced that as well?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: Calamity on August 18, 2018, 01:12:48 pm
The d3d9ex build of GM already removes the frame queue, even without frame delay. What kind of CRT are you using? Does it have digital processing?
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: faybiens on August 19, 2018, 05:50:27 am
I have redone a bunch of slow-motion recordings and Shift+P tests, with different .ini parameters.
My CRT does not introduce any lag (I was comparing two different things previously: two different actions on the Neo Geo).

With Shift+P, the Neo Geo reacts at the 3rd frame in MAME.
The real hardware reacts at the 3rd (Puzzle Bobble) for the same action.
GM does it at the 4th; it depends a lot on when the action happens within the frame.
The variation in GM is something like a 3.5 to 4.5 frame reaction.

Space Invaders reacts at the 2nd; it can vary from something like 1.5 to 2.5.
It seems I could get the action happening in the next frame, but the press has to happen at the beginning of a frame.

I had the feeling that the queue setting in RadeonPro could improve things by about a quarter of a frame, but this seems more subjective than objective.
Overall, though, the average seems to indicate that there is 1 frame of lag.

If GM could take off one frame of the real hardware's lag (when it has lag), it could be just like the real hardware.
But I understand that this is not in MAME's philosophy.

Would be happy to test the frame slice feature :-)
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: jimmer on August 19, 2018, 09:25:35 am

See my thread 'Unexplained Lag ...'; Calamity provides a download link and instructions on how to run the frameslice build.

Thanks to this thread I've started using the frame advance button to analyse what's happening a bit better. I'm not sure exactly what's happening when I step through a frame in 4 steps (note: frame_slice 3 = 4 slices), but I'm seeing some things that I'm sure will be an aid to eventually improving the Defender emulation.
Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: faybiens on August 19, 2018, 04:17:53 pm
Thanks for the link, just tried it.

Title: Re: Input Lag - MAME iteration comparisons vs. the real thing?
Post by: faybiens on August 22, 2018, 02:25:47 am
When playing with the .ini, I am noticing that the frame delay feature seems to be switched off.
Even at 9, which would normally slow a game down, it does nothing.

I need to figure out which parameters could influence this; any ideas?