The input lag issue in the context of emulation [about new -frame_delay option]
Dr.Venom:
--- Quote from: Calamity on November 09, 2012, 01:52:47 pm ---Thank you for your elaborated answer, it's a pleasure to discuss with you.
--- End quote ---
Likewise :)
--- Quote ---Well, this is what we know as double buffering. This is actually what you'd get by compiling the suggested patch. You have two buffers:
- buffer #1: the visible VRAM being transferred to the screen*
- buffer #2: the back buffer where you render to.
--- Quote ---What's also becoming clear from this model is that either form of double or triple buffering will simply break the "holy grail", as it will always cause - at least - the problem of one frame of additional "input lag" (actually the video is delayed, but it is perceived as input lag)
--- End quote ---
Indeed, but it's not the fact of having 2 or more buffers that adds a frame of lag, as one would think; it's the very concept of "frame-based" emulation that causes this. The reason is that transferring the contents of the VRAM to the screen (*) is itself a process that takes time (about 17 ms) as the raster travels down the screen. So once you "flip in vblank" you need to wait some time to see the whole frame displayed, and in the meantime there's a new frame being cooked that won't contain your reactions to what's happening on the screen.
--- End quote ---
Thanks for explaining, I see your point. Given your explanation, aren't we actually talking about a -two- frame delay with this method? Because frame emulation is not directly attached to the copy/flip in vblank (it's done well before that) and only takes a fraction of the frame (the rest is mostly waiting for vblank), you get approximately one frame of delay. Add to this the one frame of delay inherent in the whole concept of "frame based" emulation, and don't we end up with two frames of delay?
So to visualize (example case where there's an input event mid frame):
--- Code: ---0ms ----------------display------------15.4ms -vblank- 16.7ms||0ms ----------------display----------15.4ms -vblank- 16.7ms||0ms ----------------display----------15.4ms -vblank- 16.7ms||
||...emulate | -------> wait sync -----> blit ----> emulate..||...emulate | -------> wait sync -----> blit ----> emulate..||...emulate | -------> wait sync -----> blit ----> emulate..||
||<--------------- (x) user input ------------------------->||<----------------- (x) not shown ------------------------>||<----------------- (x) = shown! ------------------------> ||
--- End code ---
--- Quote ---But even if we used a single buffer, which is certainly possible on a fast present-day computer, so that we would render everything directly into the visible VRAM during the VBLANK time without previous buffering, we would be running into the same 1-frame-of-lag issue(*), as long as our emulator design is frame-based.
--- End quote ---
(*)I guess this is not entirely the same 1-frame-of-lag issue as the double buffer (there are two there), but isn't this exclusively the minimum delay (1-frame) that is achievable when dealing with frame based emulation?
So to visualize the input delay in the case of "emulate+blit in vblank" :
--- Code: ---0ms ----------------display------------15.4ms -vblank- 16.7ms||0ms ----------------display------------15.4ms -vblank- 16.7ms||
||--------------------------------------| emulate + blit.....||----------------------------------------| emulate + blit.....||
||<--------------- (x) user input ------------------------->||<----------------- (x) = shown! --------------------------->||
--- End code ---
Isn't this then the one and only "holy grail" when talking about frame based emulation? If so, would it be an idea to add this as a, say, "accurate" = yes/no option to GroovyMAME? That would be wonderful! :) I guess it would need to be a separate option, because the emulation + blit/copy probably can't be done within the vertical blank for every MAME/MESS system. But imagine if it worked for the "old boys" like Genesis/SNES/MSX2/Colecovision/C64, etc. :)
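To make the two timelines above concrete, here is a minimal Python sketch. It uses the 16.7 ms frame and 15.4 ms vblank start from the diagrams, and it assumes that input is sampled exactly once per emulated frame at the moment emulation starts, and that a blitted frame becomes visible from the start of the next refresh. The function names are made up for illustration; this is not MAME code.

```python
import math

FRAME_MS = 16.7    # one refresh period, as in the diagrams above
VBLANK_MS = 15.4   # offset within a refresh where vblank begins

def first_visible_refresh(input_ms, sample_offset_ms):
    """Index of the first refresh that shows a reaction to the input.

    sample_offset_ms is where, inside each refresh, the emulator samples
    input and starts emulating a frame (an assumption of this sketch).
    The resulting frame is blitted in that refresh's vblank and scanned
    out during the following refresh.
    """
    # First refresh k whose sample point is at or after the input event.
    k = max(0, math.ceil((input_ms - sample_offset_ms) / FRAME_MS))
    return k + 1

def emulate_early(input_ms):
    # Emulate at the start of the refresh, then wait for vblank and blit.
    return first_visible_refresh(input_ms, 0.0)

def emulate_in_vblank(input_ms):
    # Emulate and blit inside vblank, right before scan-out.
    return first_visible_refresh(input_ms, VBLANK_MS)

mid_frame = 8.0  # input arrives in the middle of refresh 0
print(emulate_early(mid_frame), emulate_in_vblank(mid_frame))
```

For an input in the middle of refresh 0, the first model only shows the reaction in refresh 2, while the "emulate + blit in vblank" model shows it in refresh 1, which is the one-frame difference the diagrams illustrate.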
--- Quote ---On a different plane of things, we have to consider how the input is polled. In an event driven OS like Windows we don't poll input continuously. The system will send us a message when some new input happens, these messages will get buffered and we usually read them once per frame. Now, this model should be good enough, leaving apart the built-in system input lag that in theory should be possible to get reduced to a minimum as hardware improves.
--- End quote ---
Thanks, that makes it clear. There is however one thing I'm wondering about. I've read all these stories about the HID USB polling rate in Windows being 125 Hz, i.e. polling about every 8 ms, and people trying to overclock the USB ports (through USBPORT/HIDUSB patches) to 250/500/1000 Hz. I'm not sure what to believe of all this, or whether it's true for all versions of Windows. But if there's some truth to it, it would mean that input changes are only signalled about 2 times per frame. If so, it also raises the question of whether something could be done about it from a software developer's point of view.
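As a quick sanity check on that arithmetic, here is a small Python sketch. It assumes a 60 Hz display and simply divides the polling rate by the refresh rate; the 125 Hz default and the overclocked rates are the figures from the stories mentioned above, which I have not verified.

```python
REFRESH_HZ = 60.0  # assumed display refresh rate

def polls_per_frame(poll_hz, refresh_hz=REFRESH_HZ):
    """Average number of USB polls that fall inside one video frame."""
    return poll_hz / refresh_hz

def worst_case_age_ms(poll_hz):
    """Longest time an input change can wait for the next poll."""
    return 1000.0 / poll_hz

for hz in (125, 250, 500, 1000):
    print(f"{hz:>4} Hz: {polls_per_frame(hz):5.2f} polls/frame, "
          f"up to {worst_case_age_ms(hz):.0f} ms stale")
```

At 125 Hz this gives roughly two polls per 60 Hz frame, so in the worst case an input change is already 8 ms old before the OS even sees it.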
--- Quote ---But due to the design of MAME, when vsync is enabled we can get some extra lag as the input remains locked during the wait for vsync. This is represented in the following scheme, compared to the GM case where this problem is solved:
--- Code: ---Vanilla MAME + vsync:
0ms --------------------------------------------------------15.4ms --- vertical blank -- 16.7ms
||...emulate frame (render to buffer) --> wait for sync ----> blit --> emulate next---...... ||
||<---------- input enabled ----------> <----- input locked ---------> <--- input enabled ---->
GroovyMAME + vsync + multithreading:
0ms --------------------------------------------------------15.4ms --- vertical blank -- 16.7ms
||...emulate frame (render to buffer) --> wait for sync ----> blit --> emulate next---...... ||
||<---------------------------------- input enabled ------------------------------------------>
--- End code ---
--- End quote ---
Very cool :)
--- Quote ---So this is the point where emulator writers tell you that these are the limits of emulation. But I do believe that the "holy grail" of emulation is actually feasible in practice, understanding it as a piece of software that works as an *exact* substitute for the emulated hardware in terms of response. It's only that, IMHO, the frame based concept would need to be replaced by a scanline based model, where only the next scanline is buffered and we use hsync instead of vsync for synchronizing.
Considering that emulator writers use flat panels, such an emulator is not likely going to see the light :)
--- End quote ---
I wholeheartedly agree, this would indeed be perfection. Re the emulator writers, I guess we need to donate them some CRT "3D" panels :)
--- Quote ---Now, as MAME is designed to use the CPU clock for accurate timing of emulation, it needs to be decoupled from the screen refresh, but this leads to horrible tearing. So someone thought it would be a good idea to use the triple buffering model, and actually it would be, if it wasn't for the fact that DX's flip functions don't work as advertised, i.e. creating a second back buffer doesn't result in asynchronous flipping (notice I mean asynchronous to the game loop; the flip is always synced to the vertical retrace).
This results in MAME's -triplebuffer option anchoring the game loop to the video card's refresh when this is lower than the desired speed, so the benefits of triple buffering don't apply here and we have only a sophisticated version of double buffering.
--- End quote ---
Thanks for explaining, and providing some more insight into how these things actually work.
Lastly, with regard to potential causes of display lag: you may or may not be familiar with these, but since we're on the topic I thought I'd flag them.
First, the display model used in Windows Vista and 7. The Desktop Compositor Engine on which the Aero interface is built has its own vertical synchronization routines, which are known to possibly interfere with an emulator's vertical sync routines. I've encountered this with bsnes and some other emulators, where the (smooth) scrolling would show a hiccup every now and then. After disabling desktop composition for such a program it ran flawlessly. So just in case you encounter weird things when testing GM stuff on W7...
The second issue may have an even greater effect. It's about the video drivers in Windows (from NVidia/AMD), which seem to have a will of their own when it comes to buffering. The culprit is the so-called "flip queue size" (ATI/AMD) or "Maximum Pre-rendered Frames" (Nvidia) setting in these drivers, which normally defaults to three. In my experience this value can cause serious additional lag, especially when the emulator intends to use the minimum amount of buffering.
The flip queue size for ATI/AMD cannot be configured through the Catalyst Control Center (why oh why?). Luckily a solution exists in the form of the RadeonPro tool (http://www.radeonpro.info/en-US/), where you can change the flip queue size per application to any value from 5 down to 0. I'm not an NVidia user, but apparently the Maximum Pre-rendered Frames can be set through the video control panel. I read that in the newer drivers the setting of 0 has been removed, and the lowest is 1. In my experience using a setting of 0 (versus the default of 3) can make a world of difference in most emulators.
I originally got triggered on this subject by the PC-Engine "Ootake" emulator author, the original topic can be found at the Ootake page here: http://www.ouma.jp/ootake/delay-win7vista.html. It's about lowering causes of input delay in Vista and 7, but halfway down the page it also mentions that the flip queue size settings have an effect in WindowsXP too (given modern enough PC).
Calamity:
--- Quote from: Dr.Venom on November 10, 2012, 09:22:50 am ---Thanks for explaining, I see your point. Given your explanation, aren't we actually talking about a -two- frame delay with this method? Because frame emulation is not directly attached to the copy/flip in vblank (it's done well before that) and only takes a fraction of the frame (the rest is mostly waiting for vblank), you get approximately one frame of delay. Add to this the one frame of delay inherent in the whole concept of "frame based" emulation, and don't we end up with two frames of delay?
--- End quote ---
Indeed. I meant that this model only represents 1 additional frame of delay with respect to the original hardware, which probably already worked with 1 frame of delay in most situations, as long as it was designed to poll input once per frame during vblank.
--- Quote ---So to visualize (example case where there's an input event mid frame):
--- Code: ---0ms ----------------display------------15.4ms -vblank- 16.7ms||0ms ----------------display----------15.4ms -vblank- 16.7ms||0ms ----------------display----------15.4ms -vblank- 16.7ms||
||...emulate | -------> wait sync -----> blit ----> emulate..||...emulate | -------> wait sync -----> blit ----> emulate..||...emulate | -------> wait sync -----> blit ----> emulate..||
||<--------------- (x) user input ------------------------->||<----------------- (x) not shown ------------------------>||<----------------- (x) = shown! ------------------------> ||
--- End code ---
--- End quote ---
Yes, this is exactly what's going on, provided the OS is fast enough to notify us of the input within the current frame time.
--- Quote ---(*)I guess this is not entirely the same 1-frame-of-lag issue as the double buffer (there are two there), but isn't this exclusively the minimum delay (1-frame) that is achievable when dealing with frame based emulation?
--- End quote ---
Exactly, but what I had in mind when I wrote "single buffering" is not what you drew on your scheme below. By "fast computer" I meant fast enough for *compositing* the frame directly in vram during vblank, but I was not considering the whole emulation of it. Just wanted to prove that we have the same concept here even if no intermediate buffer exists.
But of course this:
--- Quote ---So to visualize the input delay in the case of "emulate+blit in vblank" :
--- Code: ---0ms ----------------display------------15.4ms -vblank- 16.7ms||0ms ----------------display------------15.4ms -vblank- 16.7ms||
||--------------------------------------| emulate + blit.....||----------------------------------------| emulate + blit.....||
||<--------------- (x) user input ------------------------->||<----------------- (x) = shown! --------------------------->||
--- End code ---
--- End quote ---
... is a completely different animal, and I agree with you it would be *nearly* the holy grail of emulation. This is probably the best we can get on a Windows-like OS as hardware gets faster. And probably could be considered perfect emulation for many systems.
However, many old systems were capable of running code during hblank, this was often used for changing video settings in order to create interesting effects, but probably some games could have also polled inputs during this period. We would need to check case by case and I don't know the details, but it's obvious that with the above scheme we would be missing this sub-frame precision (if that matters, that's another story).
I believe this scheme is not too difficult to achieve, though it would need some non-trivial reorganization of MAME rendering. However, I guess the CPU requirements would be very high:
16.67 ms / 1.33 ms = approx. 12.5, and 12.5 x 100 = 1250%
... so in order to have fluent emulation of a 60 Hz game you'd need MAME to be able to emulate it at 1250% at least.
For truly perfect emulation we would need to emulate and render line by line, synchronizing to hblank instead of vblank. This avoids the need to pre-render a whole frame and allows us to read input at any point in the frame. It is feasible in principle, as video hblank triggers interrupts much like vblank, but unfortunately under Windows we don't have reliable access to this information, as far as I know. It's possible to read the current scanline, so something could be done, but I doubt it would be accurate enough. The pros: as we'd be spreading the emulation time over the whole frame time, a very modest PC would do. The cons: the emulators would possibly need a very complete rewrite. With your idea, on the other hand, the same basic emulation code will serve.
--- Quote ---Thanks, that makes it clear. There is however one thing I'm wondering about. I've read all these stories about the HID USB polling rate in Windows being 125 Hz, i.e. polling about every 8 ms, and people trying to overclock the USB ports (through USBPORT/HIDUSB patches) to 250/500/1000 Hz. I'm not sure what to believe of all this, or whether it's true for all versions of Windows. But if there's some truth to it, it would mean that input changes are only signalled about 2 times per frame. If so, it also raises the question of whether something could be done about it from a software developer's point of view.
--- End quote ---
I'm sorry I don't have much information about this. I've read about this too here and there, but have never got into tweaking USB inputs. As far as I understand it, if we could poll the hardware *directly* once per frame in sync with vblank it should be enough, but I guess that even if we use DirectInput to poll the keyboard state this matrix will only get updated at the usb polling rate which is independent from us, so yes, in theory increasing the polling rate will improve our chances that the information returned by DirectInput is up-to-date when we read it.
--- Quote ---First, the display model used in Windows Vista and 7. The Desktop Compositor Engine on which the Aero interface is built has its own vertical synchronization routines, which are known to possibly interfere with an emulator's vertical sync routines. I've encountered this with bsnes and some other emulators, where the (smooth) scrolling would show a hiccup every now and then. After disabling desktop composition for such a program it ran flawlessly. So just in case you encounter weird things when testing GM stuff on W7...
--- End quote ---
Yeah, I had read about the Aero thing. Well, actually this buffering *should* be disabled while in full screen mode; if that's not the case then W7 should definitely not be an option for emulation. Anyway, BSNES does not work in full screen mode, it just runs at your desktop resolution if I remember right, so that could be the reason.
--- Quote ---The second issue may have an even greater effect. It's about the video drivers in Windows (from NVidia/AMD), which seem to have a will of their own when it comes to buffering. The culprit is the so-called "flip queue size" (ATI/AMD) or "Maximum Pre-rendered Frames" (Nvidia) setting in these drivers, which normally defaults to three. In my experience this value can cause serious additional lag, especially when the emulator intends to use the minimum amount of buffering.
--- End quote ---
Well this is something new to me, and it sounds like it could be a possible reason why triplebuffering, which uses flipping, has such a bad reputation. I doubt I've experienced that with XP + Catalyst but will definitely investigate it.
--- Quote ---I originally got triggered on this subject by the PC-Engine "Ootake" emulator author, the original topic can be found at the Ootake page here: http://www.ouma.jp/ootake/delay-win7vista.html. It's about lowering causes of input delay in Vista and 7, but halfway down the page it also mentions that the flip queue size settings have an effect in WindowsXP too (given modern enough PC).
--- End quote ---
Yeah I had read that article where the author explains the mechanism he uses for reducing input lag in his emulator, very inspiring! A good friend pointed it to me long ago.
Dr.Venom:
--- Quote ---But of course this:
--- Quote ---So to visualize the input delay in the case of "emulate+blit in vblank" :
--- Code: ---0ms ----------------display------------15.4ms -vblank- 16.7ms||0ms ----------------display------------15.4ms -vblank- 16.7ms||
||--------------------------------------| emulate + blit.....||----------------------------------------| emulate + blit.....||
||<--------------- (x) user input ------------------------->||<----------------- (x) = shown! --------------------------->||
--- End code ---
--- End quote ---
... is a completely different animal, and I agree with you it would be *nearly* the holy grail of emulation. This is probably the best we can get on a Windows-like OS as hardware gets faster. And probably could be considered perfect emulation for many systems.
--- End quote ---
That is great, and holds some promise at least for the future of frame based emulation.
--- Quote ---However, many old systems were capable of running code during hblank, this was often used for changing video settings in order to create interesting effects, but probably some games could have also polled inputs during this period. We would need to check case by case and I don't know the details, but it's obvious that with the above scheme we would be missing this sub-frame precision (if that matters, that's another story).
I believe this scheme is not too difficult to achieve, though it would need some non-trivial reorganization of MAME rendering. However, I guess the CPU requirements would be very high:
16.67 ms / 1.33 ms = approx. 12.5, and 12.5 x 100 = 1250%
... so in order to have fluent emulation of a 60 Hz game you'd need MAME to be able to emulate it at 1250% at least.
--- End quote ---
I don't fully understand your calculation, would be great if you could elaborate a bit on this..
--- Quote ---For truly perfect emulation we would need to emulate and render line by line, synchronizing to hblank instead of vblank. This avoids the need to pre-render a whole frame and allows us to read input at any point in the frame. It is feasible in principle, as video hblank triggers interrupts much like vblank, but unfortunately under Windows we don't have reliable access to this information, as far as I know. It's possible to read the current scanline, so something could be done, but I doubt it would be accurate enough. The pros: as we'd be spreading the emulation time over the whole frame time, a very modest PC would do. The cons: the emulators would possibly need a very complete rewrite. With your idea, on the other hand, the same basic emulation code will serve.
--- End quote ---
I definitely agree with your reasoning for using hblank for truly perfect emulation, and with the pros and cons you mention for the various methods. While contemplating what you wrote, another idea popped into my mind that sort of combines the things we spoke about. I'm thinking of a method that *could* be a close approximation of line by line sync, while possibly only requiring "modest" changes to the emulator core.
Biggest question would be if it is currently possible to PAUSE/START the MAME emulation core at will -multiple times- during a frame with only a -modest- change to the code?
If that's possible, then the following model should be possible (in theory for now at least):
* Chop the frame emulation of the core in N chunks of visible lines (by using "pause/start"), with each N chunk consisting of [1/N * total visible scanlines]
* spread the N chunks over the realworld frame time by using PAUSE/WAIT/START after emulating each chunk
* in parallel use D3DRASTER_STATUS to read where the real monitor scanline is approximately, and continually make sure to blit ahead (with some margin) the next chunk
So suppose we're running an NTSC screen with 240 visible lines and 262 total lines, we're chopping the frame in *3* chunks (80 lines each) + vsync, then it would look something like this:
--- Code: ---real line nr. -> 240---------------------------------------------262/0--------------------------------------------------80-------------------------------------------------160-------------------------------------------------240(wrap)
real display (>front buf)-> |-----------------REAL VBLANK---------------------| display of chunk *1* (lines 0-80)---------------->| display of chunk *2* (lines 81-160)--------------->| display of chunk *3* (lines 161-240)------------->||
emu core (>back buf) -> | emu chunk *1* (lines 0-80) -> pause+blit+wait-->| emu chunk *2* (lines 81-160) -> pause+blit+wait-->| emu chunk *3* (lines 161-240)-> pause+blit+wait--->| ---------> wrap when next frame------------------>||
D3DRASTER_STATUS -> | poll/wait for real display line 0 ------------->| poll/wait for real display line 80--------------->| poll/wait for real display line 160--------------->| poll/wait for real display line 240 (vblank)----->||
input polling -> |-------------------input enabled---------------->| --------------------input enabled---------------->| --------------------input enabled----------------->|--------------------input enabled----------------->||
--- End code ---
Advantages:
- The maximum amount of lag versus a real system would be in the order of magnitude of *only* 1/3 of a frame!
- Basic emulation code would serve, -if- it's possible to PAUSE/START the main emulation thread? (Only modest adjustment to code base?)
- Spreads emulation time over the real frame time: a relatively modest PC could do?
- Could be an approximation for sub frame precision?
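The chunking idea above can be sketched roughly as follows. This is only a scheduling sketch in Python, not emulator code: it assumes the 240 visible / 262 total line NTSC example, uses 0-based non-overlapping line ranges (0-79, 80-159, 160-239), and the function names are hypothetical. In a real implementation the raster position would have to come from something like D3DRASTER_STATUS.

```python
VISIBLE_LINES = 240  # NTSC example from the post
TOTAL_LINES = 262

def chunk_schedule(n_chunks, visible=VISIBLE_LINES, total=TOTAL_LINES):
    """Return (first_line, last_line, emulate_from_line) for each chunk.

    Chunk i covers visible lines [first_line, last_line]. It has to be
    emulated and blitted before the real raster reaches first_line, so
    emulation starts when the raster enters the previous chunk (or the
    vertical blank, for the first chunk).
    """
    size = visible // n_chunks
    schedule = []
    for i in range(n_chunks):
        first = i * size
        # The last chunk absorbs any remainder lines.
        last = visible - 1 if i == n_chunks - 1 else first + size - 1
        # Raster line at which work on this chunk may begin.
        emulate_from = visible if i == 0 else first - size
        schedule.append((first, last, emulate_from))
    return schedule

def max_lag_fraction(n_chunks):
    """Rough upper bound on added lag, as a fraction of one frame."""
    return 1.0 / n_chunks

for first, last, start in chunk_schedule(3):
    print(f"lines {first:3}-{last:3}: emulate once raster reaches line {start}")
```

One thing the schedule makes visible: with these numbers the first chunk has to be emulated during the 22 lines of vertical blank, while the other chunks each get 80 lines' worth of time, so the chunks would not have equal time budgets unless the split is adjusted.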
I guess the above is sort of my final (high level, I admit) thought on getting to perfection within the frame based emulation core, given the information that has come forward from our (very nice and useful) discussions. Hopefully the above comes across and actually makes some sense from a real coding perspective. And if so, it might be improved further? (I guess that would be a yes :) )
--- Quote ---I'm sorry I don't have much information about this. I've read about this too here and there, but have never got into tweaking USB inputs. As far as I understand it, if we could poll the hardware *directly* once per frame in sync with vblank it should be enough, but I guess that even if we use DirectInput to poll the keyboard state this matrix will only get updated at the usb polling rate which is independent from us, so yes, in theory increasing the polling rate will improve our chances that the information returned by DirectInput is up-to-date when we read it.
--- End quote ---
In theory it should work indeed, but unfortunately I'm not finding any "hard" evidence on the topic. I wish there were some official Microsoft spec sheets on how these things are actually implemented, so that we would know what the default rates are in the different versions of Windows, and whether/how they apply to different HIDs, like mouse, keyboard and joypad/sticks. Unfortunately this seems rather hard to come by.
--- Quote ---Yeah, I had read about the Aero thing. Well, actually this buffering *should* be disabled while in full screen mode; if that's not the case then W7 should definitely not be an option for emulation. Anyway, BSNES does not work in full screen mode, it just runs at your desktop resolution if I remember right, so that could be the reason.
--- End quote ---
Yes, you're quite right, bsnes runs in a full screen window, so that's why it's affected by the DWM (Desktop Window Manager). It indeed shouldn't be the case with real fullscreen applications.
--- Quote ---
--- Quote ---The second issue may have an even greater effect. It's about the video drivers in Windows (from NVidia/AMD), which seem to have a will of their own when it comes to buffering. The culprit is the so-called "flip queue size" (ATI/AMD) or "Maximum Pre-rendered Frames" (Nvidia) setting in these drivers, which normally defaults to three. In my experience this value can cause serious additional lag, especially when the emulator intends to use the minimum amount of buffering.
--- End quote ---
Well this is something new to me, and it sounds like it could be a possible reason why triplebuffering, which uses flipping, has such a bad reputation. I doubt I've experienced that with XP + Catalyst but will definitely investigate it.
--- End quote ---
Ah yes, I forgot to mention that the Radeon Pro utility -only- works with 32-bit applications. It uses some kind of "hook" system, that simply doesn't work for 64 bit applications. It's one of the reasons I'm compiling most of the emulator stuff as 32-bit applications.
Whether or not RP is really applying your settings is shown by the taskbar status icon, like in the image below.
Lastly, regarding compiling GroovyMAME/UME with the double buffer patch you mentioned earlier. I tried compiling the 146 + u releases again, this time after installing an older MinGW-MAME distribution, but it still gives me compilation errors :(. I found your suggestion in another forum post to use Compile MAME 64 v1.22. I've been searching my head off for this util, but I could only find a v1.23 version, which (unfortunately) is already updated for the new toolchain. Do you have any other tips regarding this, or could the v1.22 version be put up shortly somewhere? Otherwise I guess I'll have to wait until the new patch comes out. (Which isn't that big a problem, but I thought I'd just ask.)
Calamity:
--- Quote from: Dr.Venom on November 11, 2012, 06:32:13 pm ---I don't fully understand your calculation, would be great if you could elaborate a bit on this..
--- End quote ---
Well, it's just a rough calculation. Based on an average VBLANK duration of around 1.33 ms for a 15 kHz CRT, in order to fit one complete emulation cycle in the VBLANK period you need the emulator to run at least:
16.67 / 1.33 = approx. 12.5 times faster than the original machine did (x 100 = 1250% as MAME expresses it).
So unless a certain game runs at 1250% unthrottled on your hardware, it won't be possible to emulate it inside VBLANK.
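Here is the same rough calculation as a small Python sketch. The 16.67 ms frame time and 1.33 ms VBLANK are the estimates above; the function names are just for illustration, and blit time and OS jitter are ignored.

```python
def required_speed_percent(frame_ms, vblank_ms):
    """Unthrottled speed (in MAME's percent terms) needed to emulate
    one full frame inside the vertical blank."""
    return frame_ms / vblank_ms * 100.0

def fits_in_vblank(unthrottled_percent, frame_ms=16.67, vblank_ms=1.33):
    """True if a game measured at unthrottled_percent could be emulated
    entirely within vblank, ignoring blit time and OS jitter."""
    return unthrottled_percent >= required_speed_percent(frame_ms, vblank_ms)

print(round(required_speed_percent(16.67, 1.33)))
print(fits_in_vblank(1300.0), fits_in_vblank(800.0))
```

Without the rounding to 12.5 the result comes out a little above 1250%, which doesn't change the conclusion: only games that run an order of magnitude faster than real time would qualify.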
--- Quote from: Dr.Venom on November 11, 2012, 06:32:13 pm ---Biggest question would be if it is currently possible to PAUSE/START the MAME emulation core at will -multiple times- during a frame with only a -modest- change to the code?
--- End quote ---
That's the point. I'm afraid I have no clue how the actual architecture of MAME works on the emulator side, or even whether there's a general way of doing things common to all systems. However, I would be deeply surprised if this PAUSE/START thing could be implemented through modest changes.
--- Quote ---
--- Code: ---real line nr. -> 240---------------------------------------------262/0--------------------------------------------------80-------------------------------------------------160-------------------------------------------------240(wrap)
real display (>front buf)-> |-----------------REAL VBLANK---------------------| display of chunk *1* (lines 0-80)---------------->| display of chunk *2* (lines 81-160)--------------->| display of chunk *3* (lines 161-240)------------->||
emu core (>back buf) -> | emu chunk *1* (lines 0-80) -> pause+blit+wait-->| emu chunk *2* (lines 81-160) -> pause+blit+wait-->| emu chunk *3* (lines 161-240)-> pause+blit+wait--->| ---------> wrap when next frame------------------>||
D3DRASTER_STATUS -> | poll/wait for real display line 0 ------------->| poll/wait for real display line 80--------------->| poll/wait for real display line 160--------------->| poll/wait for real display line 240 (vblank)----->||
input polling -> |-------------------input enabled---------------->| --------------------input enabled---------------->| --------------------input enabled----------------->|--------------------input enabled----------------->||
--- End code ---
--- End quote ---
This would definitely be awesome if it could be achieved. It reminds me of the method implemented by Ootake's author, however I think he didn't link the emulation of the different chunks to the actual scanlines. In any case it's a good thing if it serves to raise interest and awareness about this stuff among emulator writers.
--- Quote ---In theory it should work indeed, but unfortunately I'm not finding any "hard" evidence on the topic. I wish there were some official Microsoft spec sheets on how these things are actually implemented, so that we would know what the default rates are in the different versions of Windows, and whether/how they apply to different HIDs, like mouse, keyboard and joypad/sticks. Unfortunately this seems rather hard to come by.
--- End quote ---
Of course, for the above scheme to be worth the pain it should be paired with almost real-time reporting of input events. Don't expect many spec sheets; as with custom video modes, this stuff is just beyond what's considered orthodox PC usage. What amazes me is that there's not much official concern that I know of, given that gamers are one of the main targets of the PC industry.
--- Quote ---Ah yes, I forgot to mention that the Radeon Pro utility -only- works with 32-bit applications. It uses some kind of "hook" system, that simply doesn't work for 64 bit applications. It's one of the reasons I'm compiling most of the emulator stuff as 32-bit applications.
Whether or not RP is really applying your settings is shown by the taskbar status icon, like in the image below.
--- End quote ---
I did some research on this "Flip Queue Size" thing. Well, it seems it's controlled by a registry key named FlipQueueSize, so it should work without the need of any utility. This key is read by the ati3duag.dll file. I dug in the disassembly and found this:
--- Code: ---.text:00015ACB push offset aFlipqueuesize ; "FlipQueueSize"
.text:00015AD0 call sub_39AA0
.text:00015AD5 mov eax, [esi+4]
.text:00015AD8 cmp eax, 0Ah
.text:00015ADB jbe short loc_15AE6
.text:00015ADD mov dword ptr [esi+4], 0Ah
.text:00015AE4 jmp short loc_15AF2
.text:00015AE6 ; ---------------------------------------------------------------------------
.text:00015AE6
.text:00015AE6 loc_15AE6: ; CODE XREF: .text:00015ADBj
.text:00015AE6 cmp eax, 2
.text:00015AE9 jnb short loc_15AF2
.text:00015AEB mov dword ptr [esi+4], 2
.text:00015AF2
.text:00015AF2 loc_15AF2: ; CODE XREF: .text:00015AE4j
.text:00015AF2 ; .text:00015AE9j
.text:00015AF2 mov eax, [esi+4]
--- End code ---
It's interesting because it shows that the minimum value allowed is 2. BTW this is from Catalyst 9.3
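Read as pseudocode, the listing above appears to clamp the value it reads to the range 2 to 10 (0Ah), using unsigned comparisons (jbe / jnb). A minimal Python sketch of that logic follows. The function and constant names are made up for illustration; only the two bounds come from the disassembly, and this mirrors the listing rather than any documented driver behaviour.

```python
# Hypothetical names; only the bounds 2 and 0x0A are taken from the listing.
FLIP_QUEUE_MIN = 2
FLIP_QUEUE_MAX = 0x0A  # 10


def clamp_flip_queue_size(raw_value: int) -> int:
    """Mirror the cmp/jbe/jnb sequence: clamp an unsigned 32-bit value to [2, 10]."""
    value = raw_value & 0xFFFFFFFF  # jbe / jnb are unsigned comparisons
    if value > FLIP_QUEUE_MAX:      # cmp eax, 0Ah ; jbe not taken
        return FLIP_QUEUE_MAX
    if value < FLIP_QUEUE_MIN:      # cmp eax, 2 ; jnb not taken
        return FLIP_QUEUE_MIN
    return value


for requested in (0, 1, 2, 3, 10, 11):
    print(requested, "->", clamp_flip_queue_size(requested))
```

If this reading is right, a requested value of 0 or 1 would silently end up as 2, and anything above 10 would be capped at 10.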
--- Quote ---Now I've been searching my head off for this util, but I could only find a v1.23 version, which (unfortunately) is already updated for the new toolchain. Do you have any other tips regarding this, or could the v1.22 version be put up shortly somewhere? Otherwise I guess I'll have to wait until the new patch comes out. (Which isn't that big a problem, but I thought I'd just ask.)
--- End quote ---
Better wait for the new patch, which will work with the new toolchain. Hopefully I'll have some time to put everything together soon.
Dr.Venom:
--- Quote from: Calamity on November 12, 2012, 02:56:14 pm ---Well, it's just a rough calculation. Based on an average VBLANK duration of around 1.33 ms for a 15 kHz CRT, in order to fit one complete emulation cycle in the VBLANK period you need the emulator to run at least at:
16.67 / 1.33 ≈ 12.5 times faster than the original machine did (x 100 = 1250% as MAME expresses it).
So unless a certain game runs at 1250% unthrottled on your hardware it won't be possible to get emulated inside VBLANK.
--- End quote ---
Ah OK, that makes sense. 12.5 times the original machine's speed is quite beefy. It also wouldn't surprise me if the time MAME takes to emulate individual frames varies around an average. In other words, some frames will probably need less than 12.5 times the original speed and others more (beefing up the required specs even further to keep it running at full frame rate).
A fairly easy "patch" in the meantime, until PCs get more powerful, would be to "burst" emulate each frame starting at the middle of the real frame, instead of at the beginning (as MAME does now, as I understand it). That would lower the required speed to 200% of the original machine's speed, something most modern PCs should be able to handle for many of the emulated systems. But then again, until PCs get much more powerful it's probably safer to start at the beginning of the frame as MAME does now, so that the chance of skipping a frame by missing vblank is lowest across a wide range of PCs.
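To put numbers on this, here is a small sketch of the arithmetic: the required unthrottled speed is just the frame period divided by the time window available for emulating one frame. The 16.67 ms and 1.33 ms figures are the ones from the quote; the function name is made up, and treating "start at mid frame" as a window of exactly half a frame is my own simplification.

```python
def required_speed_percent(frame_ms: float, window_ms: float) -> float:
    """Unthrottled speed (in MAME-style percent) needed to emulate one
    frame of frame_ms inside a real-time window of window_ms."""
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return 100.0 * frame_ms / window_ms


FRAME_MS = 16.67   # one frame at roughly 60 Hz (figure from the quote)
VBLANK_MS = 1.33   # average VBLANK on a 15 kHz CRT (figure from the quote)

# Whole frame emulated inside VBLANK
print(round(required_speed_percent(FRAME_MS, VBLANK_MS)), "%")
# Emulation started at the middle of the real frame (assumed half-frame window)
print(round(required_speed_percent(FRAME_MS, FRAME_MS / 2)), "%")
```

These are averages only; a frame that takes longer than average to emulate would need a proportionally larger margin.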
--- Quote ---
--- Quote from: Dr.Venom on November 11, 2012, 06:32:13 pm ---Biggest question would be if it is currently possible to PAUSE/START the MAME emulation core at will -multiple times- during a frame with only a -modest- change to the code?
--- End quote ---
That's the point. I'm afraid I have no clue how the actual architecture of MAME works on the emulator side, or even whether there's a general way of doing things common to all systems. However, I would be deeply surprised if this PAUSE/START thing could be implemented through modest changes.
--- End quote ---
OK. Well, maybe someday we'll get some more information on the actual architecture of the emulator side. It would be interesting to know whether or not it could be implemented with moderate changes. Unfortunately I'm only starting out at programming, learning the basics, so for now (and probably for the foreseeable future, given the learning curve) I can't be of much help in this regard.
--- Quote ---
--- Quote ---
--- Code: ---real line nr. -> 240---------------------------------------------262/0--------------------------------------------------80-------------------------------------------------160-------------------------------------------------240(wrap)
real display (>front buf)-> |-----------------REAL VBLANK---------------------| display of chunk *1* (lines 0-80)---------------->| display of chunk *2* (lines 81-160)--------------->| display of chunk *3* (lines 161-240)------------->||
emu core (>back buf) -> | emu chunk *1* (lines 0-80) -> pause+blit+wait-->| emu chunk *2* (lines 81-160) -> pause+blit+wait-->| emu chunk *3* (lines 161-240)-> pause+blit+wait--->| ---------> wrap when next frame------------------>||
D3DRASTER_STATUS -> | poll/wait for real display line 0 ------------->| poll/wait for real display line 80--------------->| poll/wait for real display line 160--------------->| poll/wait for real display line 240 (vblank)----->||
input polling -> |-------------------input enabled---------------->| --------------------input enabled---------------->| --------------------input enabled----------------->|--------------------input enabled----------------->||
--- End code ---
--- End quote ---
This would definitely be awesome if it could be achieved. It reminds me of the method implemented by Ootake's author, however I think he didn't link the emulation of the different chunks to the actual scanlines. In any case it's a good thing if it serves to raise interest and awareness about this stuff among emulator writers.
--- End quote ---
Yes, definitely.
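For anyone who wants to play with the idea, the timing diagram quoted above can be written down as a simple schedule: each chunk of emulated lines is paired with the real raster line the loop would poll for after blitting that chunk. This is only a sketch of the bookkeeping, assuming 240 visible lines split into 3 chunks as in the diagram. The names are made up, and it does not touch MAME or Direct3D at all.

```python
from typing import List, NamedTuple


class ChunkStep(NamedTuple):
    emu_first_line: int  # first emulated line of this chunk
    emu_last_line: int   # last emulated line of this chunk
    wait_for_line: int   # real raster line to poll for after the blit


def chunk_schedule(visible_lines: int = 240, chunks: int = 3) -> List[ChunkStep]:
    """Split a frame into chunks and pair each chunk with the real raster
    line at which its display is due to start."""
    if chunks < 1 or visible_lines < chunks:
        raise ValueError("need at least one line per chunk")
    steps = []
    first = 0
    for k in range(1, chunks + 1):
        last = k * visible_lines // chunks             # 80, 160, 240 by default
        wait_line = (k - 1) * visible_lines // chunks  # 0, 80, 160 by default
        steps.append(ChunkStep(first, last, wait_line))
        first = last + 1
    return steps


for step in chunk_schedule():
    print(f"emulate lines {step.emu_first_line}-{step.emu_last_line}, "
          f"blit, then wait for real line {step.wait_for_line}")
```

After the last chunk the loop would wait for line 240 (the start of vblank) and wrap around to the next frame, as in the diagram.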
--- Quote ---
--- Quote ---In theory it should work indeed, but unfortunately I'm not finding any "hard" evidence on the topic. I wish there would be some official Microsoft spec sheets on how these things are actually implemented. So that we would know what the default rates are in the different versions of windows, and whether/how they apply to different HID's, like mouse, keyboard and joypad/sticks. Unfortunately this seems rather hard to come by.
--- End quote ---
Of course, for the above scheme to be worth the pain, it should be paired with almost real-time reporting of input events. Don't expect many spec sheets; as with custom video modes, this stuff is just beyond what's considered orthodox PC usage. What amazes me is that there's not much official concern that I know of, given that gamers are one of the main targets of the PC industry.
--- End quote ---
Indeed, getting to almost real-time video display/emulation would be much less effective if the input polling side did not keep up. There does seem to be some concern/demand from the FPS community about the USB polling rate, though, with some manufacturers bringing out dedicated mice with dedicated drivers in which the polling rate can be set, like this one from Corsair (http://www.corsair.com/vengeance-m60-performance-fps-laser-gaming-mouse.html). It claims selectable polling rates of 1000Hz, 500Hz, 250Hz, or 125Hz (response times of 1ms, 2ms, 4ms or 8ms). Unfortunately, I haven't seen this kind of gamer-dedicated hardware for joypads and joysticks.
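To relate those polling rates to the frame time, here is a quick sketch that converts a rate in Hz to its polling interval and expresses it as a share of one frame. The 60 Hz frame rate is my assumption, and the interval is only the worst-case wait until the next poll; it ignores any other delay in the OS or driver.

```python
def polling_interval_ms(rate_hz: float) -> float:
    """Time between two polls, in milliseconds."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 / rate_hz


FRAME_MS = 1000.0 / 60.0  # assumed 60 Hz display

for rate in (1000, 500, 250, 125):
    interval = polling_interval_ms(rate)
    share = 100.0 * interval / FRAME_MS
    print(f"{rate} Hz: up to {interval:.0f} ms until the next poll "
          f"({share:.0f}% of a frame)")
```

At 125Hz the worst case is close to half a frame, which would eat a good part of whatever is gained on the video side.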
--- Quote ---
--- Quote ---Ah yes, I forgot to mention that the Radeon Pro utility -only- works with 32-bit applications. It uses some kind of "hook" system, that simply doesn't work for 64 bit applications. It's one of the reasons I'm compiling most of the emulator stuff as 32-bit applications.
Whether or not RP is really applying your settings is shown by the taskbar status icon, like in the image below.
--- End quote ---
I did some research on this "Flip Queue Size" thing. Well, it seems it's controlled by a registry key named FlipQueueSize, so it should work without the need of any utility.
--- End quote ---
Great that you've been digging deeper into this. Could you post the registry path where you found this key? (A search of my Win7 64-bit registry didn't reveal it.)
--- Quote ---This key is read by the ati3duag.dll file. I dug in the disassembly and found this:
--- Code: ---.text:00015ACB push offset aFlipqueuesize ; "FlipQueueSize"
.text:00015AD0 call sub_39AA0
.text:00015AD5 mov eax, [esi+4]
.text:00015AD8 cmp eax, 0Ah
.text:00015ADB jbe short loc_15AE6
.text:00015ADD mov dword ptr [esi+4], 0Ah
.text:00015AE4 jmp short loc_15AF2
.text:00015AE6 ; ---------------------------------------------------------------------------
.text:00015AE6
.text:00015AE6 loc_15AE6: ; CODE XREF: .text:00015ADBj
.text:00015AE6 cmp eax, 2
.text:00015AE9 jnb short loc_15AF2
.text:00015AEB mov dword ptr [esi+4], 2
.text:00015AF2
.text:00015AF2 loc_15AF2: ; CODE XREF: .text:00015AE4j
.text:00015AF2 ; .text:00015AE9j
.text:00015AF2 mov eax, [esi+4]
--- End code ---
It's interesting because it shows that the minimum value allowed is 2. BTW this is from Catalyst 9.3
--- End quote ---
That's some very cool digging :) and definitely interesting. Could it be that the RadeonPro tool patches this value at runtime? I remember someone "proving" somewhere that the FlipQueueSize was changed correctly by the RadeonPro tool, but I can't remember when/where I read this. I would also be very interested in any additional findings you get on this matter.
--- Quote ---
--- Quote ---Now I've been searching my head off for this util, but I could only find a v1.23 version, which (unfortunately) is already updated for the new toolchain. Do you have any other tips regarding this, or could the v1.22 version be put up shortly somewhere? Otherwise I guess I'll have to wait until the new patch comes out. (Which isn't that big a problem, but I thought I'd just ask.)
--- End quote ---
Better wait for the new patch, which will work with the new toolchain. Hopefully I'll have some time to put everything together soon.
--- End quote ---
Great, I will.
In the meantime I was wondering if there's a possibility that I could "lift out" only the 'changeres' functionality from your patch, and apply that to the official target. Purely for my personal use, to test how it would work with the rendering of the main build (which works for me with ddraw and syncrefresh). My first attempt worked in the sense that it did change the visible resolutions from within the game on the fly, but it doesn't call the accompanying realtime screenswitch. Could you possibly give me a pointer on what to look for regarding this?