Software Support > GroovyMAME
The input lag issue in the context of emulation [about new -frame_delay option]
Calamity:
--- Quote from: Dr.Venom on November 14, 2012, 05:31:01 pm ---In the Anandtech article "Triple Buffering: Why We Love It", which I'm certain you know, there is some interesting information on the flipqueuesize settings in the ad-hoc 'UPDATE' added at the end of the article, of which I highlighted the parts that I think are related to the things we discussed. Intuitively I'd say this seems quite close to the truth about the matter.
--- End quote ---
I read that article long ago and honestly I'm not sure if that UPDATE was already there, but YES, definitely that's the answer, at least for the part concerning the fake nature of the triple buffering implementation by DirectX. That explanation completely matches my experience with DirectDraw's flipping functions. So the confirmation that DirectX's triple buffer is just a queue is enough to avoid using it.
It's funny how once you get a new direction you find hundreds of Google references to this fact ;D
Actually they seem to have fixed this behaviour in newer versions of DirectX, so:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb172585%28v=vs.85%29.aspx
--- Quote ---
D3DPRESENT_FORCEIMMEDIATE
D3DPRESENT_INTERVAL_IMMEDIATE is enforced on this Present call. This flag can only be specified when using D3DSWAPEFFECT_FLIPEX. Windowed and fullscreen presentation behaviors are the same. This is especially useful for media apps that want to discard frames that have been detected as late and present subsequent frames at composition time. An invalid parameter error will be returned if this flag is improperly specified. When multiple consecutive frames with D3DPRESENT_FORCEIMMEDIATEs are queued, only the last frame is displayed, for both windowed and fullscreen presentation. A sample application that uses D3DPRESENT_FORCEIMMEDIATE and D3DSWAPEFFECT_FLIPEX is the D3D9ExFlipEx sample on the MSDN Code Gallery.
This flag is available in Direct3D 9Ex on Windows 7 or later operating systems.
When using D3DSWAPEFFECT_FLIPEX, each frame presented using D3DPRESENT_INTERVAL_IMMEDIATE or D3DPRESENT_INTERVAL_FORCEIMMEDIATE will override the previous frame's present interval. For example, if you queue the following frames using the following swap effects: frame A (D3DPRESENT_INTERVAL_ONE), frame B(D3DPRESENT_INTERVAL_ONE), frame C(D3DPRESENT_INTERVAL_ONE), frame D(D3DPRESENT_INTERVAL_FORCEIMMEDIATE), frame D will override frame C's present interval. The displayed frames per present interval are frame A, frame B, (frame C overridden by) frame D.
--- End quote ---
Unfortunately we don't have this for Windows XP :angry:
Dr.Venom:
--- Quote from: Calamity on November 15, 2012, 05:04:50 pm ---Yeah that would be quite feasible, through some clever modification of the throttling function. I can imagine it could be cool adding a slider control to adjust how 'late' within the frame period you want the emulation to start, so the user could optimize this feature depending on the game and host cpu. As you pointed out, the time it takes to emulate individual frames of a game is very uneven, and unfortunately you cannot know beforehand how long it will take to emulate a frame, so you need to find the safe point where no retrace is missed. I'm definitely going to try and implement this as soon as I can.
--- End quote ---
That would be very cool :)
I can imagine it would be useful to have a configuration parameter that is not too granular, but provides say a few steps for lowering the "frame delay". Say a setting of 0 would equal emulate+blit in vblank (near holy grail), 1 a quarter frame delay, each next adds a quarter frame delay, with setting 4 equal to "emulate at beginning of frame" (the safest setting, equal to the current implementation).
This would keep things simple, without tempting people to try and optimize for the millisecond. That could possibly result in a false sense of accuracy altogether, because of the nature of the multitasking OS.
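To make the proposed steps concrete, here is a minimal sketch (not GroovyMAME code) of how such a setting could map to a point within the frame period. The function name, the millisecond units and the 16 ms example period are assumptions made purely for illustration.

```python
def quarter_step_start_offset(setting: int, frame_period_ms: float) -> float:
    """Return how long to wait after the start of the frame period before
    starting emulation, for the proposed 0..4 quarter-frame setting.

    setting 4 -> emulate at the beginning of the frame (wait 0)
    setting 0 -> emulate right at vblank (wait the whole period)
    """
    if setting not in range(5):
        raise ValueError("setting must be an integer from 0 to 4")
    # Time between the start of emulation and the next vblank.
    lag_ms = setting * frame_period_ms / 4
    return frame_period_ms - lag_ms


if __name__ == "__main__":
    # Hypothetical 16 ms frame period, just to show the five steps.
    for s in range(5):
        print(s, quarter_step_start_offset(s, 16.0))
```

Note that in this scheme a lower setting means a later start, so a setting of 0 only works if emulating and blitting the frame takes practically no time at all.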
--- Quote ---It's only that I find input lag a very elusive matter, so I'm probably not the best person to test. You may create a very complicated piece of code just to find you can't notice any difference.
--- End quote ---
It can be a very elusive matter. In my experience the bigger differences can be noticed by explicitly testing for them. You'll mostly notice those by going back and forth between new and old in a short period of time. But, IMHO, for noticing the more subtle improvements a different approach is needed. For those you mostly need to play a fast shoot 'em up - that you know by heart - for a while, a few times. Then let it rest. Then go back to the old method, and play for a longer time. Then let it rest. Mostly in the course of a day, or a few days, you'll get a sense of the "swiftness" of new versus old.
Of course, this presupposes that you have accurate material to test with, so using a -wired- joystick/joypad that is itself accurate is essential. Using one of these Chinese joypad adapters (for connecting PS2/SNES joypads etc. to a PC) is a no go, as they all run at 100 Hz or worse, causing a 10 ms delay by themselves (added to the 8 ms in Windows), and being a source of too much "noise" to do proper tests on the software. I myself am using a Suzo "The Arcade" digital joystick, with an adapter that runs at 1000 Hz (1 ms), which negates any (additional) delay from the hardware side.
Second, it presupposes that you have the software environment set up properly, so for example testing in a window on Vista/7 with "Aero" enabled is a no go. Or testing with the flipqueuesize at the video driver / Windows default is a no go. Etc. Once both hardware and software setup are appropriate, and as such the usual sources of lag have been eliminated, only then is it possible to do adequate testing.
I guess in addition to proper and extensive testing, it would help to get some statistics from the software/emulation itself. It would be extremely helpful if it were possible to keep a counter running within the emulation that 1) logs the average time between the start of frame emulation and vblank and 2) logs the number of instances where vblank is missed / a frame has been lost (of course these should be near zero in a proper test). Combining these statistics with the above-mentioned testing methods should give enough accuracy and certainty on whether a new method provides an improvement.
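The two counters suggested above could look roughly like the sketch below. It is only an illustration of the bookkeeping, not anything taken from MAME: the class name, the method names and the idea of passing in timestamps (in milliseconds) are all assumptions.

```python
class FrameStats:
    """Tracks (1) the average time from emulation start to vblank and
    (2) how many frames finished emulating too late for their vblank."""

    def __init__(self) -> None:
        self.on_time_frames = 0
        self.total_lead_ms = 0.0
        self.missed_vblanks = 0

    def record(self, emulate_start_ms: float, emulate_end_ms: float,
               vblank_ms: float) -> None:
        if emulate_end_ms > vblank_ms:
            # The frame was not ready when its vblank arrived.
            self.missed_vblanks += 1
            return
        self.on_time_frames += 1
        self.total_lead_ms += vblank_ms - emulate_start_ms

    def average_lead_ms(self) -> float:
        """Average start-of-emulation-to-vblank time over on-time frames."""
        if self.on_time_frames == 0:
            return 0.0
        return self.total_lead_ms / self.on_time_frames


if __name__ == "__main__":
    stats = FrameStats()
    stats.record(0.0, 5.0, 16.0)    # on time
    stats.record(20.0, 30.0, 32.0)  # on time
    stats.record(40.0, 50.0, 48.0)  # finished after vblank: missed
    print(stats.average_lead_ms(), stats.missed_vblanks)
```

Missed frames are counted separately here and left out of the average, so a single late frame does not distort the lead-time figure.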
--- Quote ---
--- Quote ---It seems to be one of those undocumented areas (once again). Personally I'm not sure whether the minimum value of 2 makes sense. The equivalent of flipqueuesize on the NVidia driver (max frames to render ahead) is known to have officially supported values of 0-8 in the older drivers and 1-4 in the newer drivers.
--- End quote ---
I'm not sure either :) Just thinking of some possibilities. We don't know if that queue is just appended to the one created by the programmer, which would be catastrophic (say I code a triple buffer which ends up being a 3+2 = 5 buffer chain!), or on the other hand they're just forcing a minimum of 2 (double buffering), so 3 would be 3 after all.
--- End quote ---
True. You would imagine it's the latter option. But on the other hand it's strange that a driver limits the value to a minimum of two, while Windows allows for a lower setting. But then again, the way this (the driver) works in Windows XP and Windows 7+ might be quite different altogether. At least, as you pointed out, they're using a different driver model, so that might indeed include a different approach to the whole flipqueuesize thing.
--- Quote from: Calamity on November 15, 2012, 06:17:03 pm ---
--- Quote from: Dr.Venom on November 14, 2012, 05:31:01 pm ---In the Anandtech article "Triple Buffering: Why We Love It" [...]
--- End quote ---
I read that article long ago and honestly I'm not sure if that UPDATE was already there, but YES, definitely that's the answer, at least for the part concerning the fake nature of the triple buffering implementation by DirectX. That explanation completely matches my experience with DirectDraw's flipping functions. So the confirmation that DirectX's triple buffer is just a queue is enough to avoid using it.
It's funny how once you get a new direction you find hundreds of Google references to this fact ;D
--- End quote ---
That sounds familiar ;D
--- Quote ---Actually they seem to have fixed this behaviour in newer versions of DirectX, so:
http://msdn.microsoft.com/en-us/library/windows/desktop/bb172585%28v=vs.85%29.aspx
--- Quote ---D3DPRESENT_FORCEIMMEDIATE[...]
This flag is available in Direct3D 9Ex on Windows 7 or later operating systems.
Unfortunately we don't have this for Windows XP :angry:
--- End quote ---
--- End quote ---
That's indeed unfortunate.
On the other hand, it's fortunate that Microsoft has been addressing and improving these issues in Windows Vista/7/8. In that regard I noticed two other interesting things as well.
In Windows 7 you can call a function SetMaximumFrameLatency which
--- Quote ---Sets the number of frames that the system is allowed to queue for rendering. [...] The maximum number of back buffer frames that a driver can queue. The value defaults to 3, but can range from 1 to 16.
SetMaximumFrameLatency: http://msdn.microsoft.com/en-us/library/windows/desktop/ff471334%28v=vs.85%29.aspx
GetMaximumFrameLatency: http://msdn.microsoft.com/en-us/library/windows/desktop/ff471332(v=vs.85).aspx
--- End quote ---
Notice how the lowest value that can be forced is 1.
Another improvement that has been added to Windows 7 is on the audio front:
--- Quote ---The following features have been improved in Windows 7:
In Windows 7 share mode streams run in low-latency mode. The audio engine runs in pull mode with a significant reduction in latency. This is very useful for communication applications that require low audio stream latency for faster streaming.
http://msdn.microsoft.com/en-us/library/windows/desktop/dd756612%28v=vs.85%29.aspx
--- End quote ---
Even better, there's also the addition of exclusive-mode streaming, which allows for very low latency streams called "Pro-Audio", see : http://msdn.microsoft.com/en-us/library/windows/desktop/dd370844%28v=vs.85%29.aspx
which to me sounds ideal (pun intended ;) ) for the purpose of emulation and getting audio latency as low as possible.
All of the above says to me that Windows 7 is possibly as good (if not better?) an alternative to Windows XP as an emulation platform? The only things you have to be *very* aware of are knowing about Aero and how to disable it when running emulation in a window, lowering the flipqueuesize setting in general (which defaults to 3 in Win7 because of Aero), and - not least - using an emulator that actually makes use of these improved rendering possibilities...
A heated debate I know, but with the above "guidelines" in mind, we should probably be more open minded on Windows 7 as a good platform for emulation? Well I'm already, so I'm biased...
Out of interest, would it be possible to develop CRT emudriver for the Win7+ platform, or are there specific things about WindowsXP that are needed for it?
jimmy2x2x:
:burgerking: Say Black Dynamite
:afro: Hush now burgerking, don't interrupt my kung-fu
:burgerking: Sorry to interrupt your kung-fu, but...
:burgerking: Could Dr.Venom be Dr(i)Ve(r)m(an)
:afro: DYNA-MITE, DYNA-MITE!
Dr.Venom:
--- Quote from: Dr.Venom on November 17, 2012, 06:47:40 am ---
--- Quote from: Calamity on November 15, 2012, 05:04:50 pm ---
--- Quote ---It's only that I find input lag a very elusive matter, so I'm probably not the best person to test. You may create a very complicated piece of code just to find you can't notice any difference.
--- End quote ---
It can be a very elusive matter. In my experience the bigger differences can be noticed by explicitly testing for them. You'll mostly notice those by going back and forth between new and old in a short period of time. But, IMHO, for noticing the more subtle improvements a different approach is needed. For those you mostly need to play a fast shoot 'em up - that you know by heart - for a while, a few times. Then let it rest. Then go back to the old method, and play for a longer time. Then let it rest. Mostly in the course of a day, or a few days, you'll get a sense of the "swiftness" of new versus old.
Of course, this presupposes that you have accurate material to test with, so using a -wired- joystick/joypad that is itself accurate is essential. Using one of these Chinese joypad adapters (for connecting PS2/SNES joypads etc. to a PC) is a no go, as they all run at 100 Hz or worse, causing a 10 ms delay by themselves (added to the 8 ms in Windows), and being a source of too much "noise" to do proper tests on the software. I myself am using a Suzo "The Arcade" digital joystick, with an adapter that runs at 1000 Hz (1 ms), which negates any (additional) delay from the hardware side.
Second, it presupposes that you have the software environment set up properly, so for example testing in a window on Vista/7 with "Aero" enabled is a no go. Or testing with the flipqueuesize at the video driver / Windows default is a no go. Etc. Once both hardware and software setup are appropriate, and as such the usual sources of lag have been eliminated, only then is it possible to do adequate testing.
I guess in addition to proper and extensive testing, it would help to get some statistics from the software/emulation itself. It would be extremely helpful if it were possible to keep a counter running within the emulation that 1) logs the average time between the start of frame emulation and vblank and 2) logs the number of instances where vblank is missed / a frame has been lost (of course these should be near zero in a proper test). Combining these statistics with the above-mentioned testing methods should give enough accuracy and certainty on whether a new method provides an improvement.
--- End quote ---
--- End quote ---
I guess the above quote (from my previous post), to make a long story short, is simply saying that I'd be willing to test any improvements :D
I had another thought on how to test for input latency fully objectively and accurately. Not sure if and how exactly it would work, but the idea would be as follows.
Connect a Photodiode (http://en.wikipedia.org/wiki/Photodiode) to a joystick/joypad button. Preferably via wires, so that the photodiode can be attached directly to the glass of the CRT screen. Additionally it would require a sample test program running in the emulation that flashes a single frame from black to white and back (photodiode converts the light into current, triggering "active" button signal on joystick), allowing the test program to objectively and accurately measure the time between the flipping/blitting command for the single white frame, and the time the input signal is received. Could possibly be an interesting path to research if and when we would want to get to the bottom of input latency, in a scientific way.
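The software side of that measurement could be sketched as below. Everything here is hypothetical: `flash_white_frame` and `button_pressed` stand in for whatever the test program would use to issue the white-frame flip and to read the photodiode-driven button, and the polling loop is only as accurate as the input polling rate allows.

```python
import time
from typing import Callable, Optional


def measure_latency(flash_white_frame: Callable[[], None],
                    button_pressed: Callable[[], bool],
                    timeout_s: float = 1.0,
                    clock: Callable[[], float] = time.perf_counter
                    ) -> Optional[float]:
    """Return the seconds elapsed between issuing the white-frame flip and
    seeing the photodiode "button" go active, or None on timeout."""
    t_flip = clock()
    flash_white_frame()  # hypothetical: queue/flip the single white frame
    while clock() - t_flip < timeout_s:
        if button_pressed():  # hypothetical: poll the joystick button
            return clock() - t_flip
    return None


if __name__ == "__main__":
    # Stand-in rig: the "photodiode" fires about 20 ms after the flash.
    fired_at = {"t": None}

    def fake_flash() -> None:
        fired_at["t"] = time.perf_counter() + 0.020

    def fake_button() -> bool:
        return fired_at["t"] is not None and time.perf_counter() >= fired_at["t"]

    print(measure_latency(fake_flash, fake_button))
```

Repeating the measurement many times and looking at the spread, not just the average, would show how much of the result is noise from the OS and the input path.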
Calamity:
So I've been doing some tests these days, and the new option named -frame_delay is going to be ready for the next release. I've done it so that a frame time is divided into 10 parts, so a frame_delay value of 0 (default) means the emulation starts at the beginning of the frame time, as always. A value of 5 means the emulation is postponed to the middle of the frame, and so on (you have 1 tenth of a frame of granularity). I *think* I've done it right and it has been working for me, however I need to test it more thoroughly. I have to admit that I can't notice any difference myself.
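The arithmetic behind that option can be sketched as follows. This is not the actual GroovyMAME code: the function names are made up, and the 0-9 value range is an assumption (the post only says the frame time is split into 10 parts).

```python
def frame_delay_wait_ms(frame_delay: int, refresh_hz: float) -> float:
    """Milliseconds to wait from the start of the frame period before
    starting emulation, with the frame time split into tenths."""
    if not 0 <= frame_delay <= 9:  # assumed range, see note above
        raise ValueError("frame_delay must be between 0 and 9")
    frame_period_ms = 1000.0 / refresh_hz
    return frame_period_ms * frame_delay / 10


def emulation_budget_ms(frame_delay: int, refresh_hz: float) -> float:
    """Time left in the frame period to emulate and present the frame."""
    return 1000.0 / refresh_hz - frame_delay_wait_ms(frame_delay, refresh_hz)


if __name__ == "__main__":
    # Example with a hypothetical 50 Hz mode (20 ms frame period).
    for value in (0, 5, 9):
        print(value, frame_delay_wait_ms(value, 50.0),
              emulation_budget_ms(value, 50.0))
```

The higher the value, the less time is left to emulate the frame before the retrace, which is why the safe value depends on the game and the host CPU.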
Second, I've removed the third buffer in -triplebuffer, so now it can be used as an asynchronous implementation of double buffering removing the extra frame in the queue.
PS: I'll leave the Win7 vs XP / input latency measurement issues for later posts...