
Author Topic: Input Lag - MAME iteration comparisons vs. the real thing?  (Read 135935 times)


cools

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 645
  • Last login:March 11, 2024, 02:59:06 pm
  • Arcade Otaku Sysadmin
    • Arcade Otaku
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #40 on: August 09, 2013, 06:51:11 am »
I don't know how to fix it, but you're definitely right that with Windows the "host system delay" varies, regardless of how fast your hardware is - I notice it.

It might be something that isn't fixable without changing to a realtime OS where delays are guaranteed.

jdubs

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 61
  • Last login:January 03, 2018, 09:06:27 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #41 on: August 13, 2013, 01:38:29 pm »
Hi Calamity,

What I've found is that GroovyMAME can really make the action happen on the next frame, but this is only true if the input happens somewhere inside the first 1/3 of the previous frame. I'm running GM with -frame_delay 7. This means that the period of time from 1/3 to 7/10 of the frame (green and red lines in the picture) is the estimated lag attributable to the host system. The USB polling rate has been set to 1000 Hz, and GM is using raw input already (JPAC), so this is the bare minimum lag that seems to be possible for my particular system (Core2Duo).

Now that we've almost hit rock bottom on the possible input delay reductions, and have finally got a sense of all the variables involved (and there are many) in getting to the lowest possible latency, I was thinking of a few last straws to grasp at that might lower that average input latency of 0.65 frames even further.

Basically where we are now:
  • The frame_delay feature allows us to reliably reduce traditional emulator delay by moving the frame emulation closer to vblank. A setting of 7 reliably cuts the input delay by roughly 12 ms for a 60 Hz game, leaving about 5 ms of delay.
  • The "host system delay", i.e. the delay in the USB signal traveling through the Windows layers, seems to add about 6 ms.

I have two observations:

Regarding 1:  On my machine, a 4.6 GHz i7 3770K, using MESS drivers that run unthrottled in the range of 2600% (26 times faster than real hardware frame time), it seems as if frame_delay has a limit of 7 before starting to skip frames occasionally (my personal conclusion), adding to the input latency. I find it odd that for this setup and PC hardware, frame_delay isn't able to reliably use a value of 8 or 9, or the valhalla 10 even, given how high the unthrottled speed is.

I can currently think of only one reason, which is the reliability of the Windows wait() function. Apparently this normally defaults to a smallest wait time of around 10 ms, regardless of whether you specify a smaller value. Only by setting specific parameters can the granularity be increased to the lowest possible, which I understand to be 1 ms. Now I did a small test some time ago, and from my findings it looked like Windows mostly provides wait() periods with a granularity of up to 1 ms, but every now and then will throw in a 4 ms wait time. I'm definitely not sure how this works for MAME, but "random" instances where the wait() time extends by 4 ms would most definitely be a cause for the frame_delay feature not working to its fullest extent, because any setting larger than 7 will then occasionally push a frame beyond the vblank, causing a skipped frame and adding 16+ ms to the input delay.
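To see this jitter for yourself, a standalone sketch along these lines (not MAME code) times a requested 1 ms Sleep() against the performance counter; commenting out the timeBeginPeriod(1) call shows the much coarser default granularity:

Code: [Select]
// Minimal sketch (not MAME code): measure how long Sleep(1) actually takes
// once 1 ms timer granularity has been requested. Link against winmm.lib
// for timeBeginPeriod/timeEndPeriod.
#include <windows.h>
#include <mmsystem.h>
#include <stdio.h>

int main()
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);

    timeBeginPeriod(1);                 // request 1 ms timer granularity

    for (int i = 0; i < 20; i++)
    {
        QueryPerformanceCounter(&t0);
        Sleep(1);                       // ask for a 1 ms wait
        QueryPerformanceCounter(&t1);

        double ms = 1000.0 * (t1.QuadPart - t0.QuadPart) / freq.QuadPart;
        printf("requested 1 ms, got %.2f ms\n", ms);   // the occasional outlier shows the jitter
    }

    timeEndPeriod(1);                   // restore the previous granularity
    return 0;
}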

Hopefully other knowledgeable people can chime in, as possibly the above is one of the causes that - if improved upon - could lower the input delay by as much as 4 ms for many drivers when run on a fast PC.

Regarding 2: I'm wondering about the base host delay,  i.e. the delay in the USB signal traveling through the Windows layers, being 6ms and how this works.

In Calamity's update we will have the following loop:
Code: [Select]
a. display frame -> b. run the frame_delay (i.e. wait!) -> c. poll input -> d. emulate frame -> (loop to a. display frame)
Which to me raises the questions:
  • What are the chances that (part of) the "host system delay" is from the point on that the "c.poll input" is done?
  • Does "c.poll input" return OK with 100% certainty before moving to d.emulate_frame in a multi-threading setting?

If there's any possible slight delay from the "c. poll input" step, then (with multithreading enabled and frame emulation starting in parallel) the input event may not be available to the frame emulation in time, adding a full frame of extra input delay in these situations. Even if that only occurs occasionally, possibly depending on the speed of the host PC, it would be very detrimental to the whole purpose of the frame_delay feature.

In case the above might be true for certain situations, what could be a possible solution that would not burden the emulation in other ways? Currently "frame_delay" is simply "waiting" until point X in the frame. Can't we make that wait time more productive, i.e. why not make the frame_delay wait() period a simple loop that does:

Code: [Select]
while time left {
poll input;
}
 
That would make it virtually certain that the very last input event is available to the d. emulate frame part, even if there were a possible "host system delay" in a multithreading setting between c. poll input and d. emulate frame.  Possibly this method could wipe out a large part of the current "host system delay", further reducing the input latency?
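To illustrate the idea, a standalone toy sketch (not MAME code; poll_input() is just a stub standing in for whatever the real input read would be):

Code: [Select]
// Purely illustrative toy of the "poll while waiting" idea - not MAME code.
#include <windows.h>
#include <stdio.h>

static void poll_input(void)
{
    // stub: in the real suggestion this would drain pending input events
}

// Busy-wait until 'deadline' (in QPC ticks), polling input the whole time.
static void frame_delay_wait(LONGLONG deadline)
{
    LARGE_INTEGER now;
    do
    {
        poll_input();                        // keep the latest input fresh
        QueryPerformanceCounter(&now);
    } while (now.QuadPart < deadline);
    // frame emulation would start here with the freshest possible input
}

int main()
{
    LARGE_INTEGER freq, start;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    // wait ~12 ms (roughly the frame_delay 7 point in a 60 Hz frame)
    frame_delay_wait(start.QuadPart + (12 * freq.QuadPart) / 1000);
    printf("reached the emulation start point\n");
    return 0;
}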

I guess I may be very wrong about this, as I have no understanding of how and where the "host system delay" in a Windows PC adds up to the measured 6 ms, and I'm also not knowledgeable about how the input polling event works out in a multi-threading setting.

Hopefully there's something to it (but possibly not :D), and we'll be able to squeeze out some of those remaining 11 ms of input latency...

Great post, Dr. Venom!  I'm looking forward to Calamity's response.

-Jim

machyavel

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 67
  • Last login:December 25, 2016, 10:23:52 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #42 on: August 13, 2013, 02:14:12 pm »
Hi,

Do you people think something like the "Fidelizer" freeware could be of any help in reducing the "host lag"?

http://www.windowsxlive.net/fidelizer/

However it's only for vista and above...

jdubs

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 61
  • Last login:January 03, 2018, 09:06:27 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #43 on: August 14, 2013, 02:48:13 pm »
A follow-up to this as I'm right in the midst of working through this.

I'm actually having my Sega Saturn pad "hacked" with a DB15 cable.  The cable will run into a box where I have two PCBs, one a PS360+ for pretty much all consoles and the second an I-PAC USB keyboard encoder.  There will be two outputs, an RJ45 for the PS360+ and a USB for the I-PAC.  The I-PAC will be for MAME use, specifically using the raw input protocol.

The box will allow me to use a bunch of different controllers that I decide to hack with a DB15 cable.

-Jim


on a sidenote, i've found that using a gamepad instead of a real arcade joystick reduces 'human lag'

on a gamepad which uses a typical dpad, there isn't much travel/time lost between physically moving the dpad from eg. left to right.  with a real arcade joystick obviously the travel between eg. left and right is greater, and the longer the joystick shaft, the worse things get :o (no doubt that's why those sanwa short shaft/super light/high sensitivity ball-top joysticks are popular amongst streetfighter fans)

Yes the joystick hardware configuration can certainly make a difference. I'm not sure whether a good gamepad mechanically can be quicker than a -good- joystick, but alas. Personally I prefer a Gamepad for console (MESS) emulation only, but for Arcade gaming and some homecomputers I highly prefer a joystick. For the latter I'm a big fan of my Suzo Arcade joystick (with 1ms USB adapter) for many shooters as the joystick is -really- tight (mechanically) in its movement. (http://en.wikipedia.org/wiki/The_Arcade_%28joystick%29)

Unfortunately it only supports two fire buttons, so I've been looking for an alternative and purchased the X-Arcade joystick (http://www.xgaming.com/store/arcade-joysticks-and-game-controllers/product/x-arcade-solo-joystick/). But sadly (IMHO) that joystick very much suffers from the point you made: it takes quite large movements to get the microswitches to trigger :(.  There is a way to make them trigger more tightly (as per the manual on the X-Arcade site), but even then it doesn't come close to the Suzo Arcade joystick mentioned earlier.

I'm thinking about replacing only the joystick on the X-Arcade board with the "Suzo System 500 Joystick", mentioned as the "Euro-Stik" on this page on Ultimarc.com: http://www.ultimarc.com/controls.html :

Quote
This is the Suzo System 500 stick. This is one of the most popular sticks in European arcades. It's fair to say that compared to the traditional USA sticks it takes some getting used to, but it has many fans with it's short, well defined throw. It is fully adjustable 4-8 way by rotating the plate (the screws can be left slightly loose) and even has a 2-way mode!
Mounting this stick under a wood panel takes a little more work as it has a raised ridge around the shaft which needs a recess. It's also great for mounting into the top of a panel, covered by an overlay, or on a metal panel.

This seems to be the real arcade version of the earlier-mentioned Suzo Arcade joystick for home use. Hopefully it's as tight in its movement as I hope it to be...

Quote
with a real arcade joystick maybe things get better once you master just moving your wrist instead of your entire arm (which i admit i tend to do :lol)

LOL, I remember doing that too when I got my very first home computer, not only bending my arm but moving my whole body. Especially fun when you saw family members / friends doing the exact same thing, it looked really silly ;D

jdubs

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 61
  • Last login:January 03, 2018, 09:06:27 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #44 on: August 15, 2013, 10:41:40 pm »
Dr. Venom, if we run games at 120hz, shouldn't we then be at 1/4 frame of lag?  That's pretty sweet.

-Jim



Hi Calamity,

Fortunately, if we film a game that has an exact refresh rate of 60 Hz, the raster position is going to be "static" between different captures. This makes the task much easier. I've chosen Terra Cresta, because it's 60 Hz and it's known to have the minimum possible lag (action happens on the next frame).

It's great that you've been doing these additional tests, they are truly valuable.

Quote
What I've found is that GroovyMAME can really make the action happen on the next frame, but this is only true if the input happens somewhere inside the first 1/3 of the previous frame. I'm running GM with -frame_delay 7. This means that the period of time from 1/3 to 7/10 of the frame (green and red lines in the picture) is the estimated lag attributable to the host system. The USB polling rate has been set to 1000 Hz, and GM is using raw input already (JPAC), so this is the bare minimum lag that seems to be possible for my particular system (Core2Duo).

It's especially nice that we can now attach a figure to "host system lag". Basically your test says that the host system lag for your system, while using raw input and 1 ms clocked USB ports, is about 6 ms before the input is available to the application (GM in this case). I had a quiet hope that this would be lower, but given my own tests and experience I do find a "base" host lag of 6 ms to be plausible. It would be interesting to see how this compares to other systems, but I guess that will be difficult to test.

So with a frame_delay of 7 we are at 11 ms (average) input delay for a 60 Hz game. I guess the maximum possible reduction would be the ability to run at a frame_delay of 10, reducing the delay to just the host system delay, or in other words 6 ms. But I wonder if that will ever be feasible given the -variation- in frame emulation time and the fact that the Windows wait command may sometimes result in wait times that aren't exactly 1 ms either.

Ah well, in the end, as it is now, being able to reliably reduce average input delay to half a frame makes me a very happy gamer :) 

For discussion's sake, I disagree on the rotating monitor part :D

1) In many shoot 'em ups your worst enemies are at or coming from the top of the screen (Gradius, R-Type, etc.); you wouldn't want that rotated to the "slow" displaying part of the screen.

2) Given that human reaction time, measured from sensing (with eyes or ears) to muscle action, physiologically takes more than 200 (two hundred) ms on average, it's impossible for a human to react to something -unexpected- happening in the first 1/3rd displayed on screen and move the joystick in that same frame.

I guess arcade games are more about recognizing sprite patterns and anticipating them. Through anticipation and adaptation a large part of the 200 ms+ "reaction time" may be reduced. E.g. if you know the road with all its corners by heart you can drive faster (knowing when to turn the wheel) than someone for whom the road is totally unfamiliar.

Given this adaptation mechanism, "reaction time" becomes quite a complicated thing. The bottom line is that we can still differentiate input delay down to the granularity of a single frame (on average at least), but for the rest... I guess that may be something for the X-Files :)

Calamity

  • Moderator
  • Trade Count: (0)
  • Full Member
  • *****
  • Offline Offline
  • Posts: 7411
  • Last login:March 14, 2024, 05:26:05 am
  • Quote me with care
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #45 on: August 17, 2013, 10:47:17 am »
Hi Dr.Venom,

Thanks a lot for your post and interest in this subject, thanks extended to all the people following this thread. Before going into more explanations, I do believe that the tests I posted about the other day show the actual limits of input responsiveness on the specific hardware (Core2Duo) and OS (XP 64) that were used. There might be a little room for improvement, but I don't think we can magically overcome the current limitations of the hardware/OS. Of course better hardware may perform better.

The good news is that the -frame_delay model has been proven correct. By saying this, I mean that we have shown that it is perfectly possible for average hardware to make input available right for the next frame in the emulation of common systems. Well, this is not exactly the discovery of cold fusion, but it's good to have, finally, some evidence of what we had suggested long ago: that 16.7 ms is plenty of time at the computer scale to allow the required input processing to be done right in time to be theoretically lagless. In this regard, it is not so important that we still have some amount of sub-frame lag (host system lag), as this can be expected to shrink steadily as hardware continues getting faster. IMHO, it's even more important to have defeated the myth that v-synced emulation necessarily adds at least a frame of lag, as that is conceptually wrong. It is the common way of implementing v-sync that causes lag, with the extreme case of hidden flip queues probably bearing much of the responsibility for the black legend of v-sync.

Regarding the reliability of the wait functions, we need to clarify that MAME allows you to enable or disable a Sleep API call inside the throttling loop. This can be done through the -sleep option. For my tests, -sleep was disabled. This means that we're not asking Windows to perform a "wait" where control might only be given back to us after the requested period. When disabling -sleep, MAME just counts the ticks of the CPU clock in a tight loop until we reach the required point. So the fact that we can't reliably apply a -frame_delay factor of 8 or 9, even though the CPU is perfectly capable, means that there's something else taking control from us. I'm almost sure this is due to the OS giving control to some other higher priority threads. In a way, disabling the -sleep option means behaving in an uncivilized manner from the OS's point of view, and it's not strange that the OS stops us for a while when it judges that other threads need their time slice. For this very reason, my tests were done with the -priority option set to 1, which is the highest MAME allows, in an attempt to reduce the chances of being stopped by the OS. However, it's not enough. So we could analyze the source code to see if there's any way to increase the thread priority further (see THREAD_PRIORITY_TIME_CRITICAL, REALTIME_PRIORITY_CLASS) or whether we've already reached the maximum, being aware that stealing all the CPU time from the system may leave it in a sluggish condition that might lead to periodic hiccups (not sure of this).
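For reference, the Win32 calls named above look roughly like this in isolation (a sketch of the experiment, not current MAME/GroovyMAME behaviour):

Code: [Select]
// Sketch of the priority escalation mentioned above (plain Win32 calls).
// This is NOT what MAME/GroovyMAME currently does; REALTIME_PRIORITY_CLASS
// in particular can starve the rest of the system, so treat it as a test.
#include <windows.h>
#include <stdio.h>

int main()
{
    // Process-wide: the most aggressive scheduling class Windows offers.
    // Without sufficient privileges Windows silently falls back to
    // HIGH_PRIORITY_CLASS instead of failing.
    SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS);

    // Per-thread: the highest priority within that class.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);

    printf("priority class now 0x%lx\n", (unsigned long)GetPriorityClass(GetCurrentProcess()));

    // ... the emulation/throttling loop would run here ...
    return 0;
}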

Finally, regarding the suggestion of a continuous input poll while waiting, I think this wouldn't make any difference, as inputs are event driven rather than polled. So think about the inputs as messages that get stored in a mailbox. It doesn't matter if you check the mailbox 100 times in a day or just once before you go to bed, the amount of messages you will pick up during the day is the same.
« Last Edit: August 17, 2013, 10:53:56 am by Calamity »
Important note: posts reporting GM issues without a log will be IGNORED.
Steps to create a log:
 - From command line, run: groovymame.exe -v romname >romname.txt
 - Attach resulting romname.txt file to your post, instead of pasting it.

CRT Emudriver, VMMaker & Arcade OSD downloads, documentation and discussion:  Eiusdemmodi

Dr.Venom

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 270
  • Last login:May 08, 2018, 05:06:54 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #46 on: August 18, 2013, 09:59:00 am »
Hi Calamity,

Thanks for your answer. It's definitely an interesting topic. Given your reply I've been delving deeper into it and may have some exciting news.

For now I'll only focus on the issue below, and get back to the others at a later stage.

Regarding the reliability of the wait functions, we need to clarify that MAME allows you to enable or disable a Sleep API call inside the throttling loop. This can be done through the -sleep option. For my tests, -sleep was disabled. This means that we're not asking Windows to perform a "wait" where control might only be given back to us after the requested period. When disabling -sleep, MAME just counts the ticks of the CPU clock in a tight loop until we reach the required point. So the fact that we can't reliably apply a -frame_delay factor of 8 or 9, even though the CPU is perfectly capable, means that there's something else taking control from us. I'm almost sure this is due to the OS giving control to some other higher priority threads.

To confirm, I've also always done my tests with -sleep 0. Given my earlier tests on the (un)reliability of the wait function, I've been looking more closely at the timer function. MAME/GM use QueryPerformanceCounter (QPC) to count the ticks of the CPU clock in a tight loop. Although it's the highest resolution timer available and as such may seem the best, my previously reported personal tests made me believe it to be somewhat unreliable, showing erratic spikes of 4 ms in a simple 1 ms waiting loop.

My hunch that it's an unreliable timer was further confirmed when I read the following blog:

Beware of QueryPerformanceCounter() : http://www.virtualdub.org/blog/pivot/entry.php?id=106

Based on its findings it concludes: "So, realistically, using QPC() actually exposes you to all of the existing problems of the time stamp counter AND some other bugs." and suggests using timeGetTime() instead as a much more reliable method. The only caveat is that it has a maximum resolution of 1 ms, but that's high enough for our purpose. Possibly the fact that QPC has higher overhead is the cause of some of its issues; I'm not sure.

So the next step was to actually test timeGetTime in a MAME setting, and I'm somewhat excited to report that it has solved the issues with the high values for frame_delay, like 8 or 9. I can now reliably run GM with a frame_delay of 9, without issues!! This basically means that with these high values working properly, we're getting extremely close to realtime behaviour.

Getting MAME to work with the timeGetTime timer was actually surprisingly easy. There's already a timeGetTime routine available as a "backup" timer. The only thing you need to change is the following in src/osd/windows/wintime.c:

//============================================================
//  GLOBAL VARIABLES
//============================================================

static osd_ticks_t ticks_per_second = 0;
static osd_ticks_t suspend_ticks = 0;
//static BOOL using_qpc = TRUE;    // original line, commented out
static BOOL using_qpc = FALSE;     // changed: always fall back to timeGetTime

This will make it use the timeGetTime timer only. Luckily the code for setting this timer to its highest resolution is also in place, but I suggest you add the printf line below to src/osd/windows/winmain.c. This makes it log the resolution it is using, just so that you can verify that it's using the highest possible precision (1 ms):

   // crank up the multimedia timer resolution to its max
   // this gives the system much finer timeslices
   timeresult = timeGetDevCaps(&caps, sizeof(caps));
   if (timeresult == TIMERR_NOERROR)
   {
      timeBeginPeriod(caps.wPeriodMin);
      printf("minimum device resolution is %d millisecond(s)\r\n", caps.wPeriodMin);   // added line
   }

Before cheering though, we need to make sure this really works for frame_delay on other setups as well. So hopefully it'll be confirmed for your Core2Duo setup. If it is, I guess we may start raising the flag, getting so close to realtime now :cheers:
« Last Edit: August 18, 2013, 01:50:37 pm by Dr.Venom »

Dr.Venom

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 270
  • Last login:May 08, 2018, 05:06:54 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #47 on: August 20, 2013, 03:42:56 pm »
Dr. Venom, if we run games at 120hz, shouldn't we then be at 1/4 frame of lag?  That's pretty sweet.

Hi Jim,

It may be a disappointment but I don't think it works that way.

What's important to understand is that the source material (i.e. the emulated arcade / console games) is at 60 Hz. This basically means that trickery is needed to make it run at the correct speed when the display is at 120 Hz.

That trickery basically boils down to frame duplication or black frame insertion. Both are the same in concept; only the latter makes every other frame black, to try to overcome the limitations of LCD technology or prevent some artifacts when running on a 120 Hz CRT screen.

Now when it comes to input latency, there are two things to keep in mind: we have a) "real" input latency, which is the delay between an input signal being given and the input being registered, the frame being rendered and the frame starting to be displayed, and b) display latency, which is the time between the start and end of displaying the frame. Thus for the sake of this explanation, total input latency consists of real input latency (a) plus display latency (b).

Now at 120 Hz, with either frame duplication or black frame insertion, by default the current frame is emulated at the start of the frame. Ideally there is zero time between emulating a frame (incorporating input changes) and starting to display it. At 120 Hz there's still 8 ms between the start of that emulation and the start of displaying that frame. So to get to the optimal situation one would still need to use GM's frame_delay feature to move the frame emulation closer to vblank. This is where there's no gain versus a 60 Hz display: in both cases you need frame_delay to move the frame emulation equally close to vblank. Which means the score for (a) real input latency on 120 Hz and 60 Hz is a tie: 1 - 1.

Then it becomes interesting, as some people claim 120 Hz screens display a frame (start of display to end) in about 8.5 ms whereas at 60 Hz it would take about 17 ms, so you would "gain" 8.5 ms in the latency chain. This would seem the most logical conclusion, wouldn't it? At least that's what the math says if we treat human vision as a computer-controlled camera.

"Unfortunately" it doesn't seem to work that way. The human vision is very much analog in the way it works. From what I read the human eye may be best compared to a low speed video camera, "capturing" about 25 frames color frames per second, where images have a persistence of about 1/25th of a second. That's why we're able to see the world around us as continuous. Now this view contrast quite sharply with the assumption that a human (eye) would be able to register 120 frames per second. It simply can't.

Back to the black frame insertion. This is used as a "patch" to overcome the limitations of current LCD technology and get smooth "CRT like" scrolling on an LCD. Now think about this for a minute. This method actually inserts 60 frames of black per second (out of 120), or in other words half of each second is technically pure black. So light and dark frames are alternately registered by the human eye, where they have a persistence of 1/25th of a second. This is where the black frame insertion leads to the dimmed screen people are talking about, and they have to crank up brightness/contrast to get back to a normal level. So apparently the human eye's low speed camera is picking up on the black inserted frames. Combining this with the fact that the human eye works more like a low speed camera where images have about 1/25th of a second of persistence, I cannot firmly conclude that 120 Hz with black frame insertion will lower the display latency. An undecided tie at best for me: 1 - 1.

So my personal conclusion would be: real input latency (meaning part "a" of the chain) when it comes to GM, i.e. being able to use the frame_delay feature, is the same for 60 Hz and 120 Hz screens. Display latency (part "b") isn't evidently better at 120 Hz, basically because the black frame insertion clearly also leads to these non-information frames being picked up by the human brain's "25 fps camera" (witness the noticeable dimming of the screen), which may just as well lengthen latency instead of shortening it. All in all I cannot conclude that a 120 Hz display will lead to reduced input latency versus a 60 Hz display when it comes to emulating 60 Hz games with GM.
« Last Edit: August 20, 2013, 03:48:09 pm by Dr.Venom »

jdubs

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 61
  • Last login:January 03, 2018, 09:06:27 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #48 on: August 20, 2013, 08:10:03 pm »
Dr. Venom, thank you very much for the very detailed reply / explanation!  What you laid out makes perfect sense.

Oh well....now VERY curious to hear about Calamity's experience with the timer solution that you propose!

-Jim

Dr.Venom

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 270
  • Last login:May 08, 2018, 05:06:54 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #49 on: August 22, 2013, 01:50:28 pm »
Hmmm, I've been doing some more tests with the timing routine, to see if I could get some better facts on the reliability of the QueryPerformanceCounter timer. But for some reason it's giving me pretty solid results now, whichever way I test it. I'm not sure why it was giving me different results last time. I can now also run GM with frame_delay 9 while using QPC, and it's working just as well as with the timeGetTime routine. Given this I'm no longer sure whether timeGetTime is more reliable than QPC, as was also suggested by the quoted blog.

I guess this is actually good news, as in theory QPC should be the more accurate timer. Possibly it would make sense to add the two timers as a configurable option to GM? Worth considering I guess, even though it's not necessary given these latest test results.

In any case, to summarize (possibly of benefit to readers just getting in), I'm glad that it's confirmed that it's possible to reliably run GM with a frame_delay setting of 9, which means near real-time behaviour when used in conjunction with:
  • a CRT screen driven by CRT Emudriver (i.e. accurate screen mode and refresh rate), with -syncrefresh, -throttle and -waitvsync enabled and the -priority option set to 1 in mame/mess.ini
  • a RawInput device that works at 1ms, like e.g. a J-PAC or I-PAC
  • a USB port that is (over)clocked at 1ms
  • 'Aero' disabled when you're on Windows 7
  • and of course a PC that is powerful enough to run the MAME/MESS driver fast enough to reliably achieve a frame_delay value of 9 (or at least a value in the upper part of the 1 - 9 range).
It seems that (for me personally at least) a long quest for the lowest possible input latency in MAME/MESS has come to an end...  Thanks to the superb GM and of course "Groovy" Calamity :)
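Pulled together, the relevant fragment of a mame.ini / mess.ini would look something like this (an illustrative fragment using only the options mentioned above, not a complete config):

Code: [Select]
#
# illustrative mame.ini / mess.ini fragment (options as discussed in this thread)
#
throttle          1
syncrefresh       1
waitvsync         1
priority          1
sleep             0
frame_delay       9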

Beware of QueryPerformanceCounter() : http://www.virtualdub.org/blog/pivot/entry.php?id=106

rCadeGaming

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 1256
  • Last login:December 20, 2023, 09:16:09 pm
  • Just call me Rob!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #50 on: August 22, 2013, 05:39:16 pm »
I've been watching this thread from about the beginning.  VERY exciting stuff.  Calamity and Dr. Venom, I really appreciate all the work you are putting into this, and have been for some time.

I hadn't spoken up yet because I didn't have anything meaningful to contribute, but I think I finally thought of something:

I assume different frame_delay settings may sometimes be required for different games, with some being more demanding than others.  Is it possible to create an "auto" setting?  It could start at 9, then back off incrementally if it detects that it is regularly skipping frames.  This would save the work of carefully determining a frame_delay setting for each game.

Is this a possibility?

jdubs

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 61
  • Last login:January 03, 2018, 09:06:27 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #51 on: August 23, 2013, 03:56:27 pm »
Let's talk about the reliability of emulation at a frame_delay of 9.  Pressing F11 brings up the speed display, which shows how well the computer is keeping up with the emulation (correct me if I'm wrong).  I don't get any "Skips" but the emulation does dip below 100% very frequently on the more challenging (SH-3-based) games.  Is this problematic from an accuracy standpoint?

This is with an i7 3770k.

-Jim 

adder

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 640
  • Last login:February 04, 2021, 10:51:51 am
  • Location: Easy St.
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #52 on: August 23, 2013, 04:07:00 pm »
i'd be interested to know opinions of the reliability of overclocking usb ports to 1ms (1000hz) .. ie. is there any risk to your hardware or performance issues etc?

below:  from Raziel's UsbRate v0.5:


Dr.Venom

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 270
  • Last login:May 08, 2018, 05:06:54 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #53 on: August 23, 2013, 05:31:37 pm »
Hi rCadeGaming,

Thanks for the nice comments.

I assume different frame_delay settings may sometimes be required for different games, with some being more demanding than others.  Is it possible to create an "auto" setting?  It could start at 9, then back off incrementally if it detects that it is regularly skipping frames.  This would save the work of carefully determining a frame_delay setting for each game.

Is this a possibility?

You're certainly right that different games may need different frame_delay settings; it all depends on how demanding each game is.

I'll let Calamity judge whether or not he thinks it feasible to create some sort of auto setting, but to manage expectations, personally I don't think it will be as easy as it may sound.

Let's talk about the reliability of emulation at a frame_delay of 9.  Pressing F11 brings up the speed display, which shows how well the computer is keeping up with the emulation (correct me if I'm wrong).  I don't get any "Skips" but the emulation does dip below 100% very frequently on the more challenging (SH-3-based) games.  Is this problematic from an accuracy standpoint?

This is with an i7 3770k.

Jim, yes, dipping below 100% is problematic from an accuracy standpoint. The game should run at 100% at all times, no dipping allowed except perhaps briefly at startup. Otherwise something is definitely wrong.

Just one important point up front. The frame_delay in the current public GM is sort of broken. Calamity found a way to improve it and has made a patch for that, which will be in the next update if I'm right. In my tests I've been using a manual patch that I applied myself to the source. I'm not sure it'll make your example any different, but just so you know.

As replied to rCadeGaming, different games put different demands on the host hardware, resulting in one game being more demanding than another. What "more demanding" really means is that it needs longer to emulate a frame than a less demanding game does. Of course the longer it takes to emulate a frame, the lower the frame_delay value can be (otherwise you'll push the frame emulation so far that it hasn't finished emulating the current frame before vblank comes), and vice versa: the less demanding the game, the higher the frame_delay can be.

What's important is knowing how to test how demanding a game is. That's actually quite simple: run the game you want to test unthrottled and it will tell you what speed it can achieve. You do this by running it once with the -nothrottle option from the command shell. You also add "-v" so that it will output some stats at exit. After that it's simple math.

So as an example, if I run outrun in MAME ("mame.exe outrun -nothrottle -v"), let it run for some 30 seconds and then quit, on my machine it shows that it's able to run at 792% unthrottled. For simplicity I'll round this to 800%, or said differently, 8 times as fast as the original hardware.

Now outrun originally runs at 60 Hz (and a bit), i.e. 60 fps. Dividing 1/60 gives us 0.016667 seconds per frame, which multiplied by 1000 says each frame takes 16.67 milliseconds. Since MAME runs it 8 times faster on average, a frame in MAME takes on average 16.67/8 = 2.08 milliseconds. I'm stressing the "on average", as emulation is mostly not about averages: some frames may take longer to emulate than others. As a rule of thumb you may multiply the average frame emulation time by 2, i.e. the toughest frames to emulate take twice the average. That brings us to 2 times 2.08 = 4.16 milliseconds that we need, at minimum, left in each frame to emulate it and still be in time for vblank.

So how large can frame_delay then be? Each frame takes 16.67 ms, of which 4.16 ms needs to be left for emulation. So 16.67 ms - 4.16 ms = 12.51 ms is the latest point at which we can start the emulation. Now, frame_delay goes in steps of 1/10th of a frame (with a maximum setting of 9), so each step up from 0 adds 16.67/10 = 1.67 ms. The largest value for frame_delay that may be used is thus 12.51 ms / 1.67 ms = 7(.47). So I could use a frame_delay of 7 for outrun on my machine (a 4.6 GHz 3770K); going any higher, to 8 or even 9, would almost surely lead to some (or many) emulated frames not being finished in time for vblank, and thus skipped frames, loss of emulation accuracy and added input latency.
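The same rule of thumb, written out as a small standalone helper (just an illustration of the calculation above, not GroovyMAME code):

Code: [Select]
// Sketch of the rule-of-thumb calculation above (not GroovyMAME code).
// unthrottled_percent: speed reported by "-nothrottle -v", e.g. 792.0
// refresh_hz:          the game's refresh rate, e.g. 60.0
#include <algorithm>
#include <cstdio>

int max_frame_delay(double unthrottled_percent, double refresh_hz)
{
    double frame_ms   = 1000.0 / refresh_hz;                      // 16.67 ms at 60 Hz
    double avg_emu_ms = frame_ms / (unthrottled_percent / 100.0); // average emulation time per frame
    double worst_ms   = 2.0 * avg_emu_ms;                         // rule of thumb: worst frame = 2x average
    double latest_ms  = frame_ms - worst_ms;                      // latest point we can start emulating
    int    fd         = (int)(latest_ms / (frame_ms / 10.0));     // frame_delay steps of 1/10 frame
    return std::max(0, std::min(9, fd));                          // frame_delay is limited to 0..9
}

int main()
{
    // Outrun example from the post: ~800% unthrottled at 60 Hz -> 7
    printf("outrun: frame_delay %d\n", max_frame_delay(800.0, 60.0));
    return 0;
}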

Of course you can try to play a little with the frame_delay values, but deviating from the calculated value above is more likely to get you in trouble than not. You should also always keep in mind that some drivers / games may be demanding in a way that makes the time to emulate different frames vary all over the place, such that the average speed they run at won't help you.
 
So, as the above example shows, you won't be able to run all drivers at a frame_delay of 9. But at least you now have an idea of how to calculate what a good value can be. Of course trial and error would in the end bring you to roughly the same value. Expect most to be in the range of safe values of, say, 5-7, with only the drivers that run -really- fast being able to reliably use a frame_delay of 8 or 9. And don't forget that some really demanding games / drivers may not even go higher than a value of 1 or 2. In my testing I used, for example, a driver that runs unthrottled at 2600%, or 26 times as fast as the original. Now do the math and you'll see that's a candidate to run at a value of 9 ;)


i'd be interested to know opinions of the reliability of overclocking usb ports to 1ms (1000hz) .. ie. is there any risk to your hardware or performance issues etc?

below:  from Raziel's UsbRate v0.5:

As far as I know there's no real risk to your hardware. If you notice some erratic behaviour you can always uninstall the USB rate overclock and set it back to normal. From what I've read, some hardware may show such erratic behaviour, but I personally never had any issues with it. Do note that I have no experience with the tool you're quoting; I've been using the USB overclock mentioned elsewhere in this thread.

Dr.Venom

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 270
  • Last login:May 08, 2018, 05:06:54 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #54 on: August 25, 2013, 03:13:53 pm »
Jim, did the issue with the SH-3-based games and frame_delay make sense after all, given my explanation about frame_delay in the previous post, or do you think there may be an issue still?


Let's talk about the reliability of emulation at a frame_delay of 9.  Pressing F11 brings up the speed display, which shows how well the computer is keeping up with the emulation (correct me if I'm wrong).  I don't get any "Skips" but the emulation does dip below 100% very frequently on the more challenging (SH-3-based) games.  Is this problematic from an accuracy standpoint?

This is with an i7 3770k.

-Jim

jdubs

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 61
  • Last login:January 03, 2018, 09:06:27 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #55 on: August 26, 2013, 01:43:48 pm »
Dr. Venom, sorry for the delay, but your explanation makes perfect sense and is in line with what I was thinking.  I need to find a "happy medium" setting for all the games I commonly run.

Btw, I'll post some pics of the new stick I finished, soon.  Its got a separate box holding an I-PAC....turned out really nice.

-Jim

Dr.Venom

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 270
  • Last login:May 08, 2018, 05:06:54 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #56 on: August 27, 2013, 03:34:52 pm »
Jim, good to know that it made sense. The "happy medium" is indeed good to have as a general setting. Just in case you didn't use this already: if you're really fussy about getting the maximum frame_delay for some specific games (that are able to run faster than the happy medium), you can also create a separate .ini for those games. More work, but also closer to perfection  :)

Btw, I'll post some pics of the new stick I finished, soon.  Its got a separate box holding an I-PAC....turned out really nice.

Sounds great, will be nice to see what you've come with.

jdubs

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 61
  • Last login:January 03, 2018, 09:06:27 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #57 on: August 27, 2013, 03:57:21 pm »
Here it is, guys.  Radio Shack box with a DB-15 input and two outputs, one a Neutrik RJ45 and the other a Neutrik USB.  Inside are two PCBs.  I have an I-PAC outputting USB for use with GroovyMAME (the I-PAC registers as a keyboard so raw_input is the protocol employed) and a PS360+ for all gaming systems.  The box is basically a jack of all trades.

I had to wire a switch for the ground as the two PCBs weren't playing nice when simply wired in parallel.

The two controllers I have are a pad-hacked Sega Saturn 6-button (I didn't do the pad-hack) and a Namco PS-1 which I modded with a new JLF (used the shorter Namco shaft and also a new 3lb spring) and Sanwa 30mm buttons.  Both controllers output via a DB-15 and feed directly into the above mentioned box.  The USB output from the box goes from the I-PAC to my computer and the RJ-45 output goes to just about any gaming system you want (I use it mostly for xbox 360).

The whole setup took a while but works great.  Here are some pics:











« Last Edit: September 06, 2013, 06:14:58 pm by jdubs »

jdubs

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 61
  • Last login:January 03, 2018, 09:06:27 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #58 on: August 27, 2013, 11:10:40 pm »
Jim, good to know that it made sense. The "happy medium" is indeed good to have as a general setting. Just in case you didn't use this already: if you're really fussy about getting the maximum frame_delay for some specific games (that are able to run faster than the happy medium), you can also create a separate .ini for those games. More work, but also closer to perfection  :)

Btw, I'll post some pics of the new stick I finished, soon.  Its got a separate box holding an I-PAC....turned out really nice.

Sounds great, will be nice to see what you've come with.

Definitely separate .ini files are the ideal!  Will be working towards that...  :)

-Jim

SMMM

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 21
  • Last login:October 10, 2020, 09:19:13 pm
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #59 on: August 30, 2013, 11:50:03 am »
Calamity found a way to improve it and has made a patch for that, which will be in the next update if I'm right.

Does anyone know when this will be?

Dr.Venom

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 270
  • Last login:May 08, 2018, 05:06:54 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #60 on: August 31, 2013, 06:06:02 am »
Here it is, guys.  Radio Shack box with a DB-15 input and two outputs, one a Neutrik RJ45 and the other a Neutrik USB.  Inside are two PCBs.  I have an I-PAC outputting USB for use with GroovyMAME (the I-PAC registers as a keyboard so raw_input is the protocol employed) and a PS360+ for all gaming systems.  The box is basically a jack of all-spades.

Very nice :) Having the Gamepad also working with the I-PAC/RawInput API and the extension to other systems looks great.


jdubs

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 61
  • Last login:January 03, 2018, 09:06:27 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #61 on: September 01, 2013, 10:19:31 am »
Here it is, guys.  Radio Shack box with a DB-15 input and two outputs, one a Neutrik RJ45 and the other a Neutrik USB.  Inside are two PCBs.  I have an I-PAC outputting USB for use with GroovyMAME (the I-PAC registers as a keyboard so raw_input is the protocol employed) and a PS360+ for all gaming systems.  The box is basically a jack of all-spades.

Very nice :) Having the Gamepad also working with the I-PAC/RawInput API and the extension to other systems looks great.

Thanks man!  I was just going to use a PS360+ PCB but the lag advantage of using an I-PAC (and I actually had one laying around) drove me to wire both of them in there.  The Namco stick took forever to wire up.  Had to do a bunch of Dremeling to get the JLF to fit just right.  Turned out pretty sweet, though.

-Jim
« Last Edit: September 01, 2013, 10:22:31 am by jdubs »

Dr.Venom

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 270
  • Last login:May 08, 2018, 05:06:54 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #62 on: September 04, 2013, 08:37:18 am »
Hi,

Long post coming up about the accuracy of the timer used in MAME for Windows, but please bear with me, as I believe this will improve GroovyMAME's timer reliability, thus benefiting throttle/frame_delay/vsync accuracy.

Following up on the earlier discussion about the topic here, I've found additional information showing that in certain situations the QueryPerformanceCounter (QPC) timer method used by MAME and GM can indeed suffer from erratic timing behaviour, thus possibly messing up the effectiveness of the frame_delay feature.

First up is the fact that the hardware implementation of the High Precision Event Timer (HPET), which can serve as the basis for QPC, suffers from a design defect on some chipsets. See the following page by Microsoft, listing a number of chipsets known to have this defect:

Performance counter value may unexpectedly leap forward
http://support.microsoft.com/kb/274323

Next up is the fact that QPC may in some situations deliver unreliable timing when used on AMD dual core or Intel multi core systems running XP or older:

Programs that use the QueryPerformanceCounter function may perform poorly in Windows Server 2000, in Windows Server 2003, and in Windows XP
http://support.microsoft.com/kb/895980

The issues reported on above pages are quite likely also the cause for the findings in this blog page, as posted previously:

Beware of QueryPerformanceCounter():
http://www.virtualdub.org/blog/pivot/entry.php?id=106

It is clear from these links that the QPC timer method isn't robust, and may be degrading the emulation accuracy of quite a few Windows-based MAME arcade systems. Following on from the earlier post, using timeGetTime() as the timing method is expected to lead to (much) more reliable timing for the frame_delay feature.

Then I found new information on the High Precision Event Timer hardware in the following blog:

Using HPET for a high-resolution timer on Windows
http://blog.bfitz.us/?p=848

Because of its importance I'll quote it here:
Quote
Unfortunately, despite the promise of a new regime in 2005, it’s still not automatic; there’s work for you to do.

Even though most motherboards have the HPET timer now, it seems to be disabled by default. There’s an easy way to see if this is true or false – QueryPerformanceCounter will return a value in the 14 million range if HPET is enabled (it’s a 14 MHz timer), and something in the 3 million range if HPET is disabled (the older chip timer).

Now, this is new behavior – QueryPerformanceCounter, some years ago, returned the TSC counter, which is very high-resolution, but has huge swings with power saving modes, and as processors increased in power, power savings turns on all the time. So, Microsoft, with little fanfare, switched QueryPerformanceCounter back to using timers on motherboards. So, if you’re running an older Microsoft OS, you might get a value in the 100 million range if you call QueryPerformanceCounter, and then the following doesn’t apply to you. The bridge was somewhere in the Vista time range, but I’ve seen Vista systems that use TSC for QPC, as well as HPET/RTC for QPC.

void test_time()
{
    LARGE_INTEGER frequency;
    if (!::QueryPerformanceFrequency(&frequency))
    {
        fprintf(stderr, "failed, err=%d\n", ::GetLastError());
        exit(1);
    }
    fprintf(stdout, "freq = %lld\n", frequency.QuadPart);
}

With HPET disabled, I get freq = 3262656 as the output, or 3.26 Mhz. With HPET enabled, I get freq = 14318180 as the output, or 14.3 Mhz. This is on a Windows 7 machine with an Intel i7 975 processor and chipset. The HPET clock listed above will measure intervals with a precision of 70 nanoseconds; while this won’t help time very small sequences of instructions, this will be reasonably precise at the microsecond range.

If your BIOS has HPET enabled, then you can enable HPET in Windows with a bcdedit command, and disable it with a different bcdedit command.

Enable use of HPET

bcdedit /set useplatformclock true

Disable use of HPET

bcdedit /deletevalue useplatformclock

You’ll need to reboot to see changes, because this is a boot-time option (hence the use of bcdedit to change it).

Enabling HPET will change the performance of your system; people tend to inadvertently tune their programs to the specific behavior of a clock. It would be nice if people didn’t do that, but it happens. Anecdotal information says “makes things smoother but slower”, and this would match the idea of applications tuned to a slower clock.

As shown in the blog, it's easy to test whether the HPET is really active by querying QueryPerformanceFrequency. If it returns roughly 14 MHz then it's enabled; if it returns some value in the 3 MHz range then it's disabled. I'm using quite a new mainboard, an Asus P8Z77-V, running Windows 7, and guess what? The HPET is indeed disabled in Windows 7, even though I have set it to enabled in the BIOS.

After using the method reported in the blog to enable the HPET in W7, QPC is now indeed using the HPET's 14 MHz timer. Where earlier I still had my questions about the reliability of QPC versus the timeGetTime method, current tests (with the HPET enabled at 14 MHz) make me think QPC is as reliable as timeGetTime, if not better. All tested at a frame_delay setting of 9.

I've been thinking about how we could use this to improve the accuracy and overall reliability of the timer function in GM for Windows. As a suggestion, we could implement three possible settings for the timer, selectable from the mame/mess config file:

0 = auto
1 = QueryPerformanceCounter
2 = TimeGetTime

The "0" auto setting would be the default, only using QPC when the HPET returns a value in the 14Mhz range. This would be easy to check from the code by doing a query via QueryPerformanceFrequency. If it does not return a value in the 14Mhz range, then the HPET isn't really active, and GM should best default to using TimeGetTime(). The 1 and 2 setting can be used to override the automatic behaviour, if for some reason that would be needed, for example when you have one of the older chipsets with the HPET's hardware design defect.

Calamity

  • Moderator
  • Trade Count: (0)
  • Full Member
  • *****
  • Offline Offline
  • Posts: 7411
  • Last login:March 14, 2024, 05:26:05 am
  • Quote me with care
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #63 on: September 04, 2013, 10:02:44 am »
Hi Dr.Venom,

Thanks a lot for posting about this finding. It will be very easy to add this as a new option to GroovyMAME. We will also add the timer method information to the logs so you can always know which of the two timers is being used.

I did indeed do some testing with the timeGetTime method, and I obtained results similar to what I had obtained before with QPC, although this time I raised frame_delay to 8 for Terra Cresta before recording some videos. A value of 9 is erratic on my system (Core2Duo), but 8 is rock solid for this game. I honestly can't remember whether I was using 7 before while I could have used 8 with QPC too.

Anyway, being able to increase frame_delay from 7 to 8 must have a statistical effect in reducing input lag, by capturing more input events before the host system lag barrier, although my results were similar to the ones I had previously obtained.

I've been thinking of a way to actually measure the host system lag on a sub-frame scale. It would involve writing a specific program for it, based on raw input. A solid colour background would flip to a different colour upon a key press event, allowing a high speed camera (240 fps at least) to capture the tearing position between the two colours; then, based on the moment the LED lights up and the specific period of the video mode used, you can calculate the host system lag with some accuracy (it would be necessary to average several samples). Users could run this program to determine their own system's lag. However, I doubt many people would go through the process of wiring an LED and finding a high speed camera (although these are becoming very common).
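A rough standalone sketch of such a program (a simplified illustration, not the actual test tool): a fullscreen window that flips its background colour on every raw keyboard press, so a high speed camera can catch where in the frame the change lands.

Code: [Select]
// Simplified illustration of the test described above (not the real tool):
// a fullscreen window whose background flips between blue and red on every
// raw keyboard press. ESC quits.
#include <windows.h>

static bool g_flipped = false;

static LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp)
{
    switch (msg)
    {
    case WM_INPUT:
    {
        RAWINPUT ri;
        UINT size = sizeof(ri);
        GetRawInputData((HRAWINPUT)lp, RID_INPUT, &ri, &size, sizeof(RAWINPUTHEADER));
        if (ri.header.dwType == RIM_TYPEKEYBOARD && !(ri.data.keyboard.Flags & RI_KEY_BREAK))
        {
            if (ri.data.keyboard.VKey == VK_ESCAPE) { PostQuitMessage(0); return 0; }
            g_flipped = !g_flipped;               // key press: flip the colour
            InvalidateRect(hwnd, NULL, FALSE);    // repaint as soon as possible
        }
        return 0;
    }
    case WM_PAINT:
    {
        PAINTSTRUCT ps;
        HDC dc = BeginPaint(hwnd, &ps);
        HBRUSH brush = CreateSolidBrush(g_flipped ? RGB(255, 0, 0) : RGB(0, 0, 255));
        FillRect(dc, &ps.rcPaint, brush);
        DeleteObject(brush);
        EndPaint(hwnd, &ps);
        return 0;
    }
    case WM_DESTROY:
        PostQuitMessage(0);
        return 0;
    }
    return DefWindowProcA(hwnd, msg, wp, lp);
}

int WINAPI WinMain(HINSTANCE hInst, HINSTANCE, LPSTR, int)
{
    WNDCLASSA wc = {};
    wc.lpfnWndProc   = WndProc;
    wc.hInstance     = hInst;
    wc.lpszClassName = "lagflip";
    RegisterClassA(&wc);

    HWND hwnd = CreateWindowA("lagflip", "lag test", WS_POPUP | WS_VISIBLE, 0, 0,
                              GetSystemMetrics(SM_CXSCREEN), GetSystemMetrics(SM_CYSCREEN),
                              NULL, NULL, hInst, NULL);

    // Register for raw keyboard input (usage page 0x01, usage 0x06) on this window.
    RAWINPUTDEVICE rid = { 0x01, 0x06, 0, hwnd };
    RegisterRawInputDevices(&rid, 1, sizeof(rid));

    MSG msg;
    while (GetMessageA(&msg, NULL, 0, 0)) { TranslateMessage(&msg); DispatchMessageA(&msg); }
    return 0;
}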

Regarding the possibility of implementing an automatic frame_delay factor, yes, I guess it should be possible, although there is at least one obstacle I can think of. On my systems, I have noticed that, very often, I can't get steady speed percentages with frame_delay on, even when I am positive it's performing perfectly. Usually I see the speed oscillating from 95 to 105%, but the scrolling is totally smooth. This means the speed measurement is wrong, probably due to some side effect of the frame_delay option, and that makes it difficult to use the speed percentage as a basis for deciding things. Indeed, the soundsync feature is currently disabled while frame_delay is used, which may cause sound glitches as some users have reported. This is done because soundsync is based on the speed percentage, and an erratic speed percentage value drives soundsync crazy. GroovyMAME's soundsync feature uses the speed percentage as feedback to apply a factor to the emulation speed, causing both values to converge quite soon in a normal situation. The problem comes when the speed percentage is not reliable to begin with. Hopefully a workaround will be found to solve this problem and eventually lead to the implementation of an automatic frame_delay feature.
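As a toy illustration of that feedback loop and why a noisy speed percentage breaks it (not GroovyMAME's actual soundsync code):

Code: [Select]
// Toy sketch of the kind of feedback described above - not GroovyMAME's
// actual soundsync implementation, just the shape of the problem.
#include <stdio.h>

static double speed_factor = 1.0;   // factor applied to the emulation speed

static void soundsync_update(double measured_speed_percent)
{
    const double gain  = 0.05;                       // small correction per update
    double       error = 100.0 - measured_speed_percent;
    speed_factor += gain * (error / 100.0);          // nudge the factor towards 100%
}

int main()
{
    // The 95-105% swings seen with frame_delay (even when emulation is
    // actually fine) make the factor chase measurement noise instead of
    // real speed, which is why soundsync is disabled with frame_delay on.
    double noisy[6] = { 95.0, 105.0, 96.0, 104.0, 95.0, 105.0 };
    for (int i = 0; i < 6; i++)
    {
        soundsync_update(noisy[i]);
        printf("measured %.0f%% -> factor %.4f\n", noisy[i], speed_factor);
    }
    return 0;
}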
Important note: posts reporting GM issues without a log will be IGNORED.
Steps to create a log:
 - From command line, run: groovymame.exe -v romname >romname.txt
 - Attach resulting romname.txt file to your post, instead of pasting it.

CRT Emudriver, VMMaker & Arcade OSD downloads, documentation and discussion:  Eiusdemmodi

Dr.Venom

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 270
  • Last login:May 08, 2018, 05:06:54 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #64 on: September 04, 2013, 04:32:22 pm »
Hi Calamity,

Thanks a lot for posting about this finding. It will be very easy to add this as a new option to GroovyMAME. We will also add the timer method information to the logs so you can always know which of the two timers is being used.

Great :)

Quote
Anyway, being able to increase frame_delay from 7 to 8 must have a statistical effect in reducing input lag, by capturing more input events before the host system lag barrier, although my results were similar to the ones I had previously obtained.

I think the problem may be that the camera "only" has 240 fps. That basically limits your measurements to 4 ms. I understand that statistically the difference should come through, but I wonder how many frames (with the LED lighting up) would have to be shot to get the statistics to work. Quite a few, probably...

Quote
I've been thinking of a way to actually measure the host system lag, on a sub-frame scale. It would involve writing a specific program for it, based on raw input. A solid colour background would flip to a different colour upon a key press event, allowing a high-speed camera (240 fps at least) to capture the tearing position between the two colours. Then, from the moment the LED lights up, the tearing position and the period of the video mode in use, you can calculate the host system lag with some accuracy (it would be necessary to average several samples). Users could run this program to determine their own system's lag. However, I doubt many people would go through the process of wiring an LED and finding a high-speed camera (although these are becoming very common).

Great idea. It would make it so much easier to recognize where the raster beam is and to get an even more accurate number for the host system delay. I hope that you'll pull this off 8). With regards to other people running their own tests, you're probably right that not many will do so, but having a "simple" and accessible testing method that provides clear results will certainly improve the chances.

Quote
GroovyMAME's soundsync feature uses the speed percentage as feedback to apply a factor to the emulation speed, causing both values to converge quite soon in a normal situation. The problem comes when the speed percentage is not reliable to begin with. Hopefully a workaround will be found to solve this problem and eventually lead to the implementation of an automatic frame_delay feature.

Thanks for pointing that out. Out of interest, could you give a pointer to the bits of code where GM's soundsync feature gets applied (i.e. the speed percentage calculation, the factor calculation and where this gets fed back into the sound emulation), and also where GM's soundsync gets disabled when frame_delay is used? That would be much appreciated, just to be able to run some tests and get a better understanding.

rCadeGaming

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 1256
  • Last login:December 20, 2023, 09:16:09 pm
  • Just call me Rob!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #65 on: September 04, 2013, 05:33:59 pm »
Regarding the possibility of implementing an automatic frame_delay factor, yes, I guess it should be possible, although there is at least one obstacle that I can think of. On my systems I have noticed that, very often, I can't achieve steady speed percentages with frame_delay on, even if I am positive it's performing perfectly. Usually I see the speed oscillating from 95 to 105%, but the scrolling is totally smooth. This means the speed measurement is wrong, probably due to some side effect of the frame_delay option, and it makes it difficult to use the speed percentage as a basis for deciding things. Indeed, currently the soundsync feature is disabled while frame_delay is used, which may cause sound glitches, as some users have reported. This is done because soundsync is based on the speed percentage, and an erratic speed percentage value drives soundsync crazy. GroovyMAME's soundsync feature uses the speed percentage as feedback to apply a factor to the emulation speed, causing both values to converge quite soon in a normal situation. The problem comes when the speed percentage is not reliable to begin with. Hopefully a workaround will be found to solve this problem and eventually lead to the implementation of an automatic frame_delay feature.

That would be awesome.  So, if I understand correctly, solving the conflict between soundsync and frame_delay would make the auto-frame_delay setting much easier to implement?  Is there an alternative method of detecting skipped frames that could be a solution for both features? 

How about enabling autoframeskip and watching if it exceeds 0?  Or is autoframeskip also affected by the erratic speed percentage?

Users could run this program to determine their own system's lag. However, I doubt many people would go through the process of wiring an LED and finding a high-speed camera (although these are becoming very common).

I will soon have a good setup for filming at 240 fps with an LED in series with a button, and four PCs with different OSes and highly varying performance to test.  Please let me know when you're ready for any help with this, or any other lag testing with high-speed video.

-

On a related note, is it possible to achieve minimal input lag without using a keyboard encoder such as an I-PAC?  I had planned to use MC Cthulhus, which are joystick encoders, for both console and PC support in my cabinet.  I could dual-mod an I-PAC in to handle PC support, but for the sake of simplicity I would like to avoid it unless it's necessary for minimal input lag.  The MC Cthulhu has a 1 ms / 1000 Hz firmware, so could using that with overclocked USB ports match the speed some of you are achieving with an I-PAC?  DirectInput should work with joysticks; does it do so in GM, or only with keyboard encoders?
« Last Edit: September 04, 2013, 05:36:59 pm by rCadeGaming »

Dr.Venom

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 270
  • Last login:May 08, 2018, 05:06:54 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #66 on: September 06, 2013, 09:05:13 am »
That would be awesome.  So, if I understand correctly, solving the conflict between soundsync and frame_delay would make the auto-frame_delay setting much easier to implement?

No, it won't make it much easier to implement. See it more as an obstacle that needs to be moved out of the way before any sort of implementation of auto frame_delay can be considered.

Quote
How about enabling autoframeskip and watching if it exceeds 0?

Blasphemy. ;) *Any* method for an automatic frame_delay must not be based on degrading emulation accuracy.

I think the solution will lie in accurately measuring the time it takes MAME to emulate each frame; from that you know how much time is left, and you can use it as a basis for setting a safe auto value for frame_delay. This would make it possible to enable an auto feature without the method itself becoming a cause of missed frames. The most challenging part will be measuring the frame emulation time accurately and accounting for its variability.
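To sketch what I mean (purely illustrative, not MAME code; the 1.5 safety margin and the 120-frame window are arbitrary assumptions): time each emulated frame, keep the worst case over a short window to cover the variability, and derive the highest frame_delay that still leaves enough headroom before vblank.

Code:
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <deque>
#include <thread>

class AutoFrameDelay
{
public:
    explicit AutoFrameDelay(double frame_period_ms) : m_period_ms(frame_period_ms) {}

    template <typename EmulateFrameFn>
    int run_frame(EmulateFrameFn emulate_frame)
    {
        auto t0 = std::chrono::steady_clock::now();
        emulate_frame();                                    // the actual emulation work
        auto t1 = std::chrono::steady_clock::now();

        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        m_samples.push_back(ms);
        if (m_samples.size() > 120)                         // ~2 seconds of history at 60 Hz
            m_samples.pop_front();

        // Use the worst recent frame, not the average: one slow frame that slips past
        // vblank costs a whole refresh, so the estimate has to cover the variability.
        double worst = *std::max_element(m_samples.begin(), m_samples.end());

        const double safety = 1.5;                          // arbitrary safety margin
        double spare = m_period_ms - safety * worst;        // time we can afford to delay
        int frame_delay = (int)(10.0 * spare / m_period_ms);
        return std::clamp(frame_delay, 0, 9);
    }

private:
    double m_period_ms;
    std::deque<double> m_samples;
};

int main()
{
    AutoFrameDelay afd(1000.0 / 60.0);                      // a 60 Hz game
    for (int i = 0; i < 300; ++i)
    {
        int fd = afd.run_frame([] {
            // Stand-in for emulating one frame: roughly 3 ms of work.
            std::this_thread::sleep_for(std::chrono::milliseconds(3));
        });
        if (i % 60 == 59)
            std::printf("suggested frame_delay: %d\n", fd);
    }
    return 0;
}

With a 16.7 ms frame and about 3-4 ms of worst-case emulation time this settles around 6-7, which is in line with the values discussed earlier in the thread.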

Quote
I will soon have a good setup for filming in 240fps with an LED in series with a button, and four PC's with different OS's and highly varying performance to test.  Please let me know when you're ready for any help with this, or any other lag testing with high speed video.

That's great, it will be very interesting to see what results you come up with. If Calamity releases his latency measurement tool at some point down the road, I can see myself wiring up an LED as well. Not sure when (if ever) I'll get one of those high-speed cameras, though.

jdubs

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 61
  • Last login:January 03, 2018, 09:06:27 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #67 on: September 06, 2013, 06:11:57 pm »
On a related note, is it possible to achieve minimal input lag without using a keyboard encoder such as an I-PAC?  I had planned to use MC Cthulhus, which are joystick encoders, for both console and PC support in my cabinet.  I could dual-mod an I-PAC in to handle PC support, but for the sake of simplicity I would like to avoid it unless it's necessary for minimal input lag.  The MC Cthulhu has a 1 ms / 1000 Hz firmware, so could using that with overclocked USB ports match the speed some of you are achieving with an I-PAC?  DirectInput should work with joysticks; does it do so in GM, or only with keyboard encoders?

The raw input API (the fastest option) is only available for keyboards in the current revision of MAME.  Hence why I built what I did above.

-Jim


rCadeGaming

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 1256
  • Last login:December 20, 2023, 09:16:09 pm
  • Just call me Rob!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #68 on: September 06, 2013, 11:52:09 pm »
Anyone know if that is going to change anytime in the somewhat near future?  Looks like I might need an MC Cthulhu, PC Engine PCB, Genesis PCB, Dreamcast PCB, 360 PCB, and I-PAC... per player.   :-[ Well, the I-PAC itself will work with both players at once; better stock up on relays.

Calamity

  • Moderator
  • Trade Count: (0)
  • Full Member
  • *****
  • Offline Offline
  • Posts: 7411
  • Last login:March 14, 2024, 05:26:05 am
  • Quote me with care
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #69 on: September 07, 2013, 08:29:36 am »
Anyone know if that is going to change anytime in somewhat near future?

It is quite possible to implement raw input in MAME for joysticks too; it just needs some work. In any case, I think we could suggest this to the actual MAME devs.
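For what it's worth, the registration side is trivial; the real work is decoding the HID reports that then arrive with WM_INPUT. A fragment to show the idea (just an illustration, not a MAME patch):

Code:
#include <windows.h>

// Register to receive raw input from joystick- and gamepad-class HID devices.
// The hwnd is an existing message window (like the one in the earlier sketch).
bool register_raw_joysticks(HWND hwnd)
{
    RAWINPUTDEVICE rid[2] = {};

    rid[0].usUsagePage = 0x01;              // Generic Desktop page
    rid[0].usUsage     = 0x04;              // Joystick
    rid[0].dwFlags     = RIDEV_INPUTSINK;   // deliver even when the window has no focus
    rid[0].hwndTarget  = hwnd;

    rid[1]             = rid[0];
    rid[1].usUsage     = 0x05;              // Gamepad

    return RegisterRawInputDevices(rid, 2, sizeof(RAWINPUTDEVICE)) == TRUE;
}

// WM_INPUT then arrives with header.dwType == RIM_TYPEHID; the raw report sits in
// data.hid.bRawData and has to be decoded against the device's preparsed HID data
// (HidP_GetUsages / HidP_GetUsageValue) - that's the part that needs some work.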
Important note: posts reporting GM issues without a log will be IGNORED.
Steps to create a log:
 - From command line, run: groovymame.exe -v romname >romname.txt
 - Attach resulting romname.txt file to your post, instead of pasting it.

CRT Emudriver, VMMaker & Arcade OSD downloads, documentation and discussion:  Eiusdemmodi

rCadeGaming

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 1256
  • Last login:December 20, 2023, 09:16:09 pm
  • Just call me Rob!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #70 on: September 07, 2013, 10:16:34 am »
Ok, I'll send a message through MAMEdev.com.  In the meantime, if I do try an I-PAC, does it matter if I use a PS/2 or USB connection?

Dr.Venom

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 270
  • Last login:May 08, 2018, 05:06:54 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #71 on: September 07, 2013, 10:44:36 am »
In the meantime, if I do try an I-PAC, does it matter if I use a PS/2 or USB connection?

USB is the preferred connection. See here:

USB or PS/2 for a keyboard emulator?

http://www.ultimarc.com/usb_vs_ps2.html


rCadeGaming

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 1256
  • Last login:December 20, 2023, 09:16:09 pm
  • Just call me Rob!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #72 on: September 07, 2013, 01:10:29 pm »
That's good, as it's my preferred connection as well, haha. 

This has gotten me thinking that an I-PAC4 might actually be preferable for MAME anyhow, due to some tricks I'm thinking of using with the shift button, as well as some things that can't be done in MAME with a joystick, like selecting a save state slot.

I will still send that message to MAMEdev though.

u-man

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 87
  • Last login:November 29, 2023, 05:57:09 am
  • I want to build my own arcade controls!
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #73 on: September 09, 2013, 12:28:30 pm »
I just wanted to thank Calamity and Dr.Venom for this totally interesting thread. It is somewhat scientific, and I like the approach to how things are explained and done. You both did an awesome job here  :notworthy:

Can't wait to see what comes next here, bringing MAME emulation to a new level.  :applaud:

Keep up the good work.
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd all be running around in darkened rooms, munching magic pills and listening to repetitive electronic music."

kujina

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 135
  • Last login:November 04, 2021, 12:07:40 pm
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #74 on: October 07, 2013, 09:41:03 pm »
As far as the USB polling frequency goes when it comes to a J-PAC or I-PAC: according to Andy Warne, that poll rate is what Windows applies to low-speed USB devices, so it does not apply to the J-PAC because it's a full-speed USB device.

jdubs

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 61
  • Last login:January 03, 2018, 09:06:27 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #75 on: October 07, 2013, 11:21:21 pm »
As far as the USB polling frequency goes when it comes to a J-PAC or I-PAC: according to Andy Warne, that poll rate is what Windows applies to low-speed USB devices, so it does not apply to the J-PAC because it's a full-speed USB device.

Link?

I see this, but it's not consistent with your statement:

http://forum.arcadecontrols.com/index.php/topic,132779.msg1365395.html#msg1365395

-Jim

machyavel

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 67
  • Last login:December 25, 2016, 10:23:52 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #76 on: January 23, 2014, 04:41:30 pm »
(...)
That's actually quite simple: run the game you want to test unthrottled and it will tell you what speed it can achieve. You do this by running it once with the -nothrottle option from the command shell. You also add "-v" so that it will output some stats at exit. After that it's simple math.
(...)

For the record (and if I got it right), the math shrinks to: frame_delay = 10 - (2000 / average speed %)

Edit: actually it's frame_delay = 10 - [(safety factor x 1000) / average speed %]; Dr.Venom chose 2 as the safety factor, hence the formula above.

For example, let's say a game runs unthrottled at 100% on average; with 1 as the safety factor it gives: frame_delay = 10 - (1 x 1000 / 100) = 0.

Now, a safety factor of 1 means no safety at all, so let's keep 2 and make a small chart just to get a rough idea at a glance (a code sketch that reproduces it follows after the chart):

0-222% -> 0
223-249% -> 1
250-285% -> 2
286-333% -> 3
334-399% -> 4
400-499% -> 5
500-666% -> 6
667-999% -> 7
1000-1999% -> 8
2000% and over -> 9
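
And the code sketch reproducing the chart (the safety factor of 2 is just the assumption above; feed it the average speed reported by -nothrottle -v):

Code:
#include <cstdio>

// frame_delay = 10 - (safety * 1000 / average unthrottled speed in %), truncated and clamped.
int suggested_frame_delay(double avg_speed_percent, double safety = 2.0)
{
    int fd = (int)(10.0 - (safety * 1000.0) / avg_speed_percent);
    if (fd < 0) fd = 0;
    if (fd > 9) fd = 9;
    return fd;
}

int main()
{
    const double speeds[] = { 150, 223, 250, 286, 334, 400, 500, 667, 1000, 2000, 2600 };
    for (double s : speeds)
        std::printf("%6.0f%% unthrottled -> frame_delay %d\n", s, suggested_frame_delay(s));
    return 0;
}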
« Last Edit: January 27, 2014, 03:03:12 pm by machyavel »

SMMM

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 21
  • Last login:October 10, 2020, 09:19:13 pm
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #77 on: January 25, 2014, 04:42:04 pm »
(...)
That's actually quite simple: run the game you want to test unthrottled and it will tell you what speed it can achieve. You do this by running it once with the -nothrottle option from the command shell. You also add "-v" so that it will output some stats at exit. After that it's simple math.
(...)

For the record (and if I got it right), the math shrinks to: frame_delay = 10 - (2000 / average speed %)

Can Dr.Venom confirm this?  I'm a little confused by his explanation of how to calculate it, so a simple formula like this would be nice. 

Monkee

  • Trade Count: (0)
  • Full Member
  • ***
  • Offline Offline
  • Posts: 166
  • Last login:March 27, 2018, 09:37:30 am
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #78 on: February 27, 2014, 09:54:28 am »
Really interesting thread, thanks guys for your tests!

One thing I'm not sure I understand, though, is whether we should disable sleep or not in the end?  ???

Calamity

  • Moderator
  • Trade Count: (0)
  • Full Member
  • *****
  • Offline Offline
  • Posts: 7411
  • Last login:March 14, 2024, 05:26:05 am
  • Quote me with care
Re: Input Lag - MAME iteration comparisons vs. the real thing?
« Reply #79 on: March 02, 2014, 06:24:03 pm »
One thing I'm not sure I understand, though, is whether we should disable sleep or not in the end?  ???

I'd say that disabling sleep reduces the chances of input being received late, by not allowing the system to take CPU time away from us so often, but I guess this depends highly on the target system.
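To illustrate the trade-off (a simplified sketch, not MAME's actual throttling code): with sleep enabled the wait yields the CPU and relies on the scheduler waking us up on time, which can be several milliseconds late depending on the timer resolution; with sleep disabled the wait just spins on the clock and never gives the timeslice away, at the cost of burning a core.

Code:
#include <windows.h>

// Wait until a QueryPerformanceCounter deadline, either yielding (sleep) or spinning.
void wait_until(LONGLONG deadline_qpc, bool allow_sleep)
{
    LARGE_INTEGER now, freq;
    QueryPerformanceFrequency(&freq);

    for (;;)
    {
        QueryPerformanceCounter(&now);
        LONGLONG remaining = deadline_qpc - now.QuadPart;
        if (remaining <= 0)
            return;

        double remaining_ms = 1000.0 * remaining / freq.QuadPart;
        if (allow_sleep && remaining_ms > 2.0)
            Sleep(1);   // yield the CPU; may come back noticeably later than asked
        // otherwise: busy-wait - loop back immediately and re-check the clock
    }
}

int main()
{
    LARGE_INTEGER now, freq;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&now);

    // Wait one 60 Hz frame (~16.7 ms): coarse sleeps first, a spin near the deadline.
    wait_until(now.QuadPart + freq.QuadPart / 60, true);
    return 0;
}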

Important note: posts reporting GM issues without a log will be IGNORED.
Steps to create a log:
 - From command line, run: groovymame.exe -v romname >romname.txt
 - Attach resulting romname.txt file to your post, instead of pasting it.

CRT Emudriver, VMMaker & Arcade OSD downloads, documentation and discussion:  Eiusdemmodi