The input lag issue in the context of emulation [about new -frame_delay option]

Dr.Venom:

--- Quote from: Calamity on November 12, 2012, 02:56:14 pm ---
--- Quote from: Dr.Venom on November 11, 2012, 06:32:13 pm ---Ah yes, I forgot to mention that the Radeon Pro utility -only- works with 32-bit applications. It uses some kind of "hook" system, that simply doesn't work for 64 bit applications. It's one of the reasons I'm compiling most of the emulator stuff as 32-bit applications.

Whether or not RP is really applying your settings is shown by the taskbar status icon, like in the image below.
--- End quote ---

I did some research on this "Flip Queue Size" thing. Well, it seems it's controlled by a registry key named FlipQueueSize, so it should work without the need of any utility. This key is read by the ati3duag.dll file. I dug in the disassembly and found this:


--- Code: ---.text:00015ACB                 push    offset aFlipqueuesize ; "FlipQueueSize"
.text:00015AD0                 call    sub_39AA0
.text:00015AD5                 mov     eax, [esi+4]
.text:00015AD8                 cmp     eax, 0Ah
.text:00015ADB                 jbe     short loc_15AE6
.text:00015ADD                 mov     dword ptr [esi+4], 0Ah
.text:00015AE4                 jmp     short loc_15AF2
.text:00015AE6 ; ---------------------------------------------------------------------------
.text:00015AE6
.text:00015AE6 loc_15AE6:                              ; CODE XREF: .text:00015ADBj
.text:00015AE6                 cmp     eax, 2
.text:00015AE9                 jnb     short loc_15AF2
.text:00015AEB                 mov     dword ptr [esi+4], 2
.text:00015AF2
.text:00015AF2 loc_15AF2:                              ; CODE XREF: .text:00015AE4j
.text:00015AF2                                         ; .text:00015AE9j
.text:00015AF2                 mov     eax, [esi+4]

--- End code ---

It's interesting because it shows that the minimum value allowed is 2. BTW this is from Catalyst 9.3
--- End quote ---

For my personal interest and for the sake of science (I'll post back here), I would like to check the above for my Windows 7 Catalyst 12_6 Legacy driver. Would this application http://www.reflector.net/ be suitable for decompiling the mentioned ati3duag.DLL? Or is there maybe another (hopefully free) application that you could recommend? Thanks..

Calamity:

--- Quote from: Dr.Venom on November 14, 2012, 01:56:22 pm ---For my personal interest and for the sake of science (I'll post back here), I would like to check the above for my Windows 7 Catalyst 12_6 Legacy driver. Would this application http://www.reflector.net/ be suitable for decompiling the mentioned ati3duag.DLL? Or is there maybe another (hopefully free) application that you could recommend? Thanks..

--- End quote ---

Hex-rays made IDA 5.0 free for non-commercial use, it's the tool that I use:

http://www.hex-rays.com/products/ida/support/download_freeware.shtml

Keep in mind that a minimum value of 2 possibly makes sense as you need at least 2 elements in order to have a queue.
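
For readers who don't read x86, the range check in the listing quoted above can be sketched roughly like this. It is only an illustration of the compare/branch logic: the function and constant names are made up here, and only the bounds 2 and 0Ah (10) and the unsigned comparisons (jbe/jnb) come from the disassembly.

```python
FLIP_QUEUE_MIN = 2    # from "cmp eax, 2" / "jnb"
FLIP_QUEUE_MAX = 10   # from "cmp eax, 0Ah" / "jbe"


def clamp_flip_queue_size(requested: int) -> int:
    """Hypothetical equivalent of the range check in the Catalyst 9.3 listing."""
    # jbe/jnb are unsigned branches, so treat the value as an unsigned 32-bit integer.
    value = requested & 0xFFFFFFFF
    if value > FLIP_QUEUE_MAX:
        return FLIP_QUEUE_MAX   # mov dword ptr [esi+4], 0Ah
    if value < FLIP_QUEUE_MIN:
        return FLIP_QUEUE_MIN   # mov dword ptr [esi+4], 2
    return value
```

So a stored value of 0 or 1 would come out as 2, and anything above 10 would come out as 10, at least in this driver version.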

Dr.Venom:

--- Quote from: Calamity on November 14, 2012, 02:27:23 pm ---Hex-rays made IDA 5.0 free for non-commercial use, it's the tool that I use:

http://www.hex-rays.com/products/ida/support/download_freeware.shtml
--- End quote ---

Thanks.

I've been looking for ati3duag.dll in the Windows 7 (64-bit) 12_6 Catalyst drivers, but guess what, that file is not part of the driver anymore. ati2edxx.dll and ati2erec.dll are the only ati# dll files in the whole package. I guess some parts must have changed between 9_3 and 12_6. Nonetheless, thanks for pointing me to Hex-rays, I'm sure I'll be making use of it sooner or later.


--- Quote ---Keep in mind that a minimum value of 2 possibly makes sense as you need at least 2 elements in order to have a queue.
--- End quote ---

This seems to be one of those undocumented areas (once again). Personally I'm not sure whether the minimum value of 2 makes sense. The equivalent of flipqueuesize in the NVidia driver (max frames to render ahead) is known to have officially supported values of 0-8 in the older drivers and 1-4 in the newer drivers.

In case you still have the time and energy (after our earlier discussion), I found some interesting bits on this from a trusted source.

The Anandtech article "Triple Buffering: Why We Love It", which I'm sure you know, has some interesting information on the flipqueuesize settings in the 'UPDATE' appended at the end of the article. I highlighted the parts that I think relate to the things we discussed. Intuitively I'd say this seems quite close to the truth of the matter.

http://www.anandtech.com/show/2794/4


--- Quote ---UPDATE: There has been a lot of discussion in the comments of the differences between the page flipping method we are discussing in this article and implementations of a render ahead queue. In render ahead, frames cannot be dropped. This means that when the queue is full, what is displayed can have a lot more lag. Microsoft doesn't implement triple buffering in DirectX, they implement render ahead (from 0 to 8 frames with 3 being the default).

The major difference in the technique we've described here is the ability to drop frames when they are outdated. Render ahead forces older frames to be displayed. Queues can help smoothness and stuttering as a few really quick frames followed by a slow frame end up being evened out and spread over more frames. But the price you pay is in lag (the more frames in the queue, the longer it takes to empty the queue and the older the frames are that are displayed).

In order to maintain smoothness and reduce lag, it is possible to hold on to a limited number of frames in case they are needed but to drop them if they are not (if they get too old). This requires a little more intelligent management of already rendered frames and goes a bit beyond the scope of this article.

Some game developers implement a short render ahead queue and call it triple buffering (because it uses three total buffers). They certainly cannot be faulted for this, as there has been a lot of confusion on the subject and under certain circumstances this setup will perform the same as triple buffering as we have described it (but definitely not when framerate is higher than refresh rate).

Both techniques allow the graphics card to continue doing work while waiting for a vertical refresh when one frame is already completed. When using double buffering (and no render queue), while vertical sync is enabled, after one frame is completed nothing else can be rendered out which can cause stalling and degrade actual performance.

When vsync is not enabled, nothing more than double buffering is needed for performance, but a render queue can still be used to smooth framerate if it requires a few old frames to be kept around. This can keep instantaneous framerate from dipping in some cases, but will (even with double buffering and vsync disabled) add lag and input latency.
--- End quote ---

Their conclusions confirm (IMO) that the flipqueuesize is useful for adding smoothness, but that it does come at the price of added latency. It also confirms that the Microsoft implementation allows a setting of 0-8, which seems at odds with the 9_3 driver enforcing a minimum of 2? (But maybe I'm missing something.)
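
To put a rough number on the latency side, here is a back-of-the-envelope sketch. It assumes vsync is on, the render-ahead queue is full, and exactly one queued frame is shown per refresh. That is a simplification of what the article describes, not a measurement, and the function name is made up.

```python
def worst_case_queue_lag_ms(queued_frames: int, refresh_hz: float) -> float:
    """Extra delay added by frames already waiting in a render-ahead queue.

    Simplified model (an assumption): with vsync, each queued frame occupies
    one full refresh period before a newly rendered frame can be displayed.
    """
    if queued_frames < 0:
        raise ValueError("queued_frames must be >= 0")
    if refresh_hz <= 0:
        raise ValueError("refresh_hz must be > 0")
    return queued_frames * 1000.0 / refresh_hz
```

Under these assumptions the default of 3 queued frames at 60 Hz gives `worst_case_queue_lag_ms(3, 60.0)`, i.e. 50 ms, while a queue of 0 adds nothing.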

Interesting stuff at least, especially when the goal is to achieve an (almost) lagless implementation for emulation.

Calamity:

--- Quote from: Dr.Venom on November 12, 2012, 07:28:22 pm ---A sort of easy "patch" in the meantime, until PCs get more powerful, would be to "burst" emulate each frame starting at the middle of the real frame, instead of at the beginning (as MAME does it now, as I understand). That would lower the required speed to 200% of the original machine's speed, something most modern PCs should be able to handle for many of the emulated systems. But then again, until PCs get much more powerful it's probably safer to start at the beginning of the frame as MAME does now, so that the chance of a frame being skipped by missing vblank is lowest across a wide range of PCs.
--- End quote ---

Yeah, that would be quite feasible through some clever modification of the throttling function. I can imagine it would be cool to add a slider control to adjust how 'late' within the frame period you want the emulation to start, so the user could optimize this feature depending on the game and host CPU. As you pointed out, the time it takes to emulate individual frames of a game is very uneven, and unfortunately you cannot know beforehand how long a frame will take to emulate, so you need to find the safe point where no retrace is missed. I'm definitely going to try to implement this as soon as I can.
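
Just to make the arithmetic concrete, here is a minimal sketch of how such a slider could map to timing. Everything in it is an assumption for illustration: the 0..9 range (tenths of the frame period), the function names, and the idea that the wait is measured from vblank. It is not taken from MAME's throttling code.

```python
def emulation_start_offset(frame_period_s: float, delay_tenths: int) -> float:
    """Seconds to wait after vblank before starting to emulate the next frame.

    delay_tenths is a hypothetical slider value: 0 starts right at vblank,
    5 starts in the middle of the frame period.
    """
    if not 0 <= delay_tenths <= 9:
        raise ValueError("delay_tenths must be in 0..9")
    return frame_period_s * delay_tenths / 10.0


def required_speed_percent(delay_tenths: int) -> float:
    """Unthrottled speed (100 = original machine) needed to still finish
    the frame before the next vblank when starting that late."""
    if not 0 <= delay_tenths <= 9:
        raise ValueError("delay_tenths must be in 0..9")
    return 1000.0 / (10 - delay_tenths)
```

Starting at the middle of the frame (`delay_tenths=5`) gives a required speed of 200%, which matches the figure in the quote. Since emulation time per frame is uneven, the real safe value has to leave some margin on top of that.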

It's just that I find input lag a very elusive matter, so I'm probably not the best person to test it. You may write a very complicated piece of code just to find you can't notice any difference.


--- Quote ---Great that you've been digging deeper into this. Could you post the registry key path in which you find this specific key? (A search in my Win7 64-bit registry didn't reveal the key.)

--- End quote ---

Oh, the value is not supposed to exist unless one of these tweaking apps adds it. It's named FlipQueueSize and should reside in the same key where the driver stores its variables (the same place where we add the modelines).


--- Quote ---That's some very cool digging :) and definitely interesting. Could it be that the RadeonPro tool patches this value on runtime? I remember someone "proving" somewhere that the flipqueuesize got changed adequately by the RadeonPro tool, but I can't remember when/where I read this. I would also be very much interested if you get any additional findings on this matter.

--- End quote ---

I seriously doubt that RadeonPro is patching this at runtime. Maybe it's using a hook to intercept some stuff, but I bet that for the flip queue they just use the registry key. Anyway, I can't say that for sure.

Calamity:

--- Quote from: Dr.Venom on November 14, 2012, 05:31:01 pm ---I've been looking for ati3duag.dll in the Windows 7 (64-bit) 12_6 Catalyst drivers, but guess what, that file is not part of the driver anymore. ati2edxx.dll and ati2erec.dll are the only ati# dll files in the whole package. I guess some parts must have changed between 9_3 and 12_6. Nonetheless, thanks for pointing me to Hex-rays, I'm sure I'll be making use of it sooner or later.

--- End quote ---

Apart from the Catalyst version, W7 uses a new driver model, different from XP's.


--- Quote ---This seems to be one of those undocumented areas (once again). Personally I'm not sure whether the minimum value of 2 makes sense. The equivalent of flipqueuesize in the NVidia driver (max frames to render ahead) is known to have officially supported values of 0-8 in the older drivers and 1-4 in the newer drivers.

--- End quote ---

I'm not sure either :) Just thinking of some possibilities. We don't know whether that queue is appended to the one created by the programmer, which would be catastrophic (say I code a triple buffer which ends up being a 3+2 = 5 buffer chain!), or whether they're just forcing a minimum of 2 (double buffering), so 3 would be 3 after all.
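
The two possibilities, spelled out as a tiny sketch. Both functions are hypothetical models of what the driver might do; nothing here is confirmed by the disassembly beyond the minimum of 2.

```python
DRIVER_MIN = 2  # minimum seen in the Catalyst 9.3 disassembly


def chain_if_appended(app_buffers: int, driver_queue: int = DRIVER_MIN) -> int:
    """Hypothesis A: the driver queue is added on top of the application's buffers."""
    return app_buffers + driver_queue


def chain_if_minimum_enforced(app_buffers: int, driver_min: int = DRIVER_MIN) -> int:
    """Hypothesis B: the driver only enforces a lower bound on the chain length."""
    return max(app_buffers, driver_min)
```

For a triple buffer (3), hypothesis A gives a chain of 5 and hypothesis B leaves it at 3.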
