The input lag issue in the context of emulation [about new -frame_delay option]


Calamity:

--- Quote from: Dr.Venom on November 07, 2012, 06:47:25 pm ---Yes, I would also rather use -modelines, but unfortunately I'm on Windows 7 64-bit and haven't been able to get that option running. Since I gathered that it also isn't supposed to work without CRT_Emudriver, I've focused on getting GroovyMAME to run with the Soft15Khz modelines. Which is great btw, apart from the mentioned issue with syncrefresh.
--- End quote ---

You should have started with this ;)


--- Quote ---I'd perform the following tests before going on:

- Try switching to video d3d just in case

This works (correct speed), but unfortunately it has a short sound glitch (pitch shift) upon screen switches, plus very small graphic glitches. (These don't happen in ddraw.)
--- End quote ---

The sound glitch upon mode change is something that you have to live with. On the other hand, I'm interested in those small graphic glitches: what are they exactly?

That said, unfortunately DirectDraw seems to be extremely buggy in Windows 7. Even in Vista I've seen very odd things when testing GM + ddraw on people's laptops: frames chopped in the middle and the like.

Do you happen to have your desktop mode set as interlaced? There's a well-known bug affecting W7 and ddraw when switching from progressive to interlaced and vice versa.

I don't know the underlying reasons, but it seems that DirectDraw is emulated to some degree under W7/Vista, so the only interface you can trust to work as advertised is Direct3D.

I'm even considering switching back to d3d as the default video setup in GM for the new version for compatibility reasons, once the classic drawbacks of using d3d have been solved by means of patches (-cleanstretch, etc.).


--- Quote ---I've run Arcade_OSD and done the speed measurements per screenmode (5/coin). The results are:

320x224 -> 60.002 Hz
320x240 -> 60.001 Hz
640x224 -> 60.005 Hz
640x240 -> 60.006 Hz

Also, with every screen mode, when doing the speed test the rastered background scrolls very smoothly (nice feature btw :) ). I guess this confirms that the screen modes are ok?

--- End quote ---

Yes, this confirms the screen modes are ok. I'm thinking of a possible test, if you have time and energy. Arcade_OSD uses ddraw's flip function in order to synchronize, whereas GM uses ddraw's waitvsync, which is a different although related DirectX feature. You can force GM to use ddraw's flip function by enabling -triplebuffer, so something you could test would be this:

groovymame game -video ddraw -triplebuffer -nothrottle -nomt

I don't believe it would make any difference, but just for the sake of science...
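
In case it helps, the two synchronization paths being compared boil down to these two DirectDraw calls. This is just a sketch, assuming an IDirectDraw7 object and a flipping primary surface have already been created, with error handling omitted:

--- Code: ---
#include <ddraw.h>

/* GM's normal sync: block until the vertical blank begins; the
   caller then blits the back buffer into the visible surface. */
void wait_vblank(LPDIRECTDRAW7 ddraw)
{
	IDirectDraw7_WaitForVerticalBlank(ddraw, DDWAITVB_BLOCKBEGIN, NULL);
}

/* Arcade_OSD's (and -triplebuffer's) sync: swap the surfaces in the
   flip chain; the swap itself happens at the vertical retrace. */
void flip_once(LPDIRECTDRAWSURFACE7 primary)
{
	IDirectDrawSurface7_Flip(primary, NULL, DDFLIP_WAIT);
}
--- End code ---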

Dr.Venom:

--- Quote from: Calamity on November 08, 2012, 11:47:07 am ---You should have started with this ;)
--- End quote ---

No, no, you should have started asking that ;)


--- Quote ---The sound glitch upon mode change is something that you have to live with. On the other hand, I'm interested in those small graphic glitches: what are they exactly?
--- End quote ---

Upon the switch it briefly shows the desktop wallpaper (this is normal), then it switches to the game visuals (also normal), but then, while already showing the game visuals, it very briefly shows the desktop wallpaper again for an instant (a frame or two). That last part is what I was/am perceiving as a "glitch" (mainly because ddraw switches cleanly: game visuals -> desktop briefly -> game visuals).


--- Quote ---That said, unfortunately DirectDraw seems to be extremely buggy in Windows 7. Even in Vista I've seen very odd things when testing GM + ddraw on people's laptops: frames chopped in the middle and the like.

Do you happen to have your desktop mode set as interlaced? There's a well-known bug affecting W7 and ddraw when switching from progressive to interlaced and vice versa.
--- End quote ---

I don't have my desktop set to interlaced mode, but I'm aware of this bug; I encountered it when running PSX emulation, where at one point it just shows a black screen. I was involved with WinUAE testing some time ago, and from my experience there, the problem mainly shows when the desktop is set to interlaced and the program tries to open a progressive mode. The other way around shouldn't be a problem, and once the program is running it also doesn't have an issue switching between the two.


--- Quote ---I'm even considering switching back to d3d as the default video setup in GM for the new version for compatibility reasons, once the classic drawbacks of using d3d have been solved by means of patches (-cleanstretch, etc.).
--- End quote ---

Yeah, that automatic resizing is definitely one of the drawbacks of using d3d for pixel-perfect emulation. GM's -cleanstretch does seem to go a long way towards getting 1:1 pixel mapping though, which is a good thing.


--- Quote ---Yes, this confirms the screen modes are ok. I'm thinking of a possible test, if you have time and energy. Arcade_OSD uses ddraw's flip function in order to synchronize, whereas GM uses ddraw's waitvsync, which is a different although related DirectX feature. You can force GM to use ddraw's flip function by enabling -triplebuffer, so something you could test would be this:

groovymame game -video ddraw -triplebuffer -nothrottle -nomt

I don't believe it would make any difference, but just for the sake of science...
--- End quote ---

Forcing the above makes it run perfectly at the correct speed! (And the switching is clean without artifacts.)

But... IMO, there's a large drawback in the use of -triplebuffer, as it introduces quite a large amount of "input lag". So while it looks good, the playability of fast shoot 'em ups goes down the drain. IMHO this is sadly overlooked by many people, but it becomes painfully obvious when comparing it side by side with real hardware.

Is there any chance of getting the flip function to work correctly without -triplebuffer in GM? That would be perfect :) Possibly you're already familiar with it, but you can ask a Direct3D device whether it is currently in VBlank via the D3DRASTER_STATUS structure:

http://msdn.microsoft.com/en-us/library/windows/desktop/bb172596%28v=vs.85%29.aspx

Maybe that could open up the possibility of using the flip method with (no-buffer) vblank timing?
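
For illustration, a polling loop around that structure could look roughly like this (a sketch only, assuming an already-created IDirect3DDevice9; the helper name is made up):

--- Code: ---
#include <windows.h>
#include <d3d9.h>

/* Spin until the raster enters the vertical blank. While the raster
   is visible, rs.ScanLine holds the approximate current scanline, so
   the caller knows at any moment where the beam is. */
void poll_until_vblank(IDirect3DDevice9 *dev)
{
	D3DRASTER_STATUS rs;
	do
	{
		IDirect3DDevice9_GetRasterStatus(dev, 0, &rs);
		Sleep(0); /* yield, so we don't monopolize a core */
	}
	while (!rs.InVBlank);
}
--- End code ---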

Calamity:

--- Quote from: Dr.Venom on November 08, 2012, 02:16:34 pm ---Forcing the above makes it run perfectly at the correct speed! (And the switching is clean without artifacts.)

--- End quote ---

Good to know, fairly interesting.


--- Quote ---But... IMO, there's a large drawback in the use of -triplebuffer, as it introduces quite a large amount of "input lag". So while it looks good, the playability of fast shoot 'em ups goes down the drain. IMHO this is sadly overlooked by many people, but it becomes painfully obvious when comparing it side by side with real hardware.
--- End quote ---

Yeah, and that's a paradox, because the whole idea behind triple buffering is to reduce the input lag associated with double buffering to a minimum while preventing tearing.

It's not the concept of triple buffering that's wrong IMHO, but the implementation of DirectX's flip function, which doesn't return immediately as advertised, thus preventing truly asynchronous (lagless) rendering.

We have implemented an asynchronous triple buffer in GM that works when -multithreading is enabled: it moves the rendering code into a third execution thread, thus bypassing the flip wait bottleneck, and in theory it should be lagless. Don't expect smooth scrolling, obviously.
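
As an illustration of the idea only (this is not the actual GM code), it can be sketched as two loops sharing three buffers, where emulate_one_frame() and present_frame() are hypothetical placeholders for the real renderer and flip call:

--- Code: ---
#include <windows.h>

void emulate_one_frame(void *buf); /* hypothetical: renders one frame */
void present_frame(void *buf);     /* hypothetical: blocks on the flip */

static BYTE frame[3][320 * 240 * 2]; /* three frame buffers */
static int latest = -1;              /* newest complete frame */
static int showing = -1;             /* frame the flip thread holds */
static CRITICAL_SECTION cs;

/* The flip thread always presents the most recent complete frame,
   waiting for the retrace as long as it needs to. */
DWORD WINAPI flip_thread(LPVOID arg)
{
	(void)arg;
	for (;;)
	{
		EnterCriticalSection(&cs);
		showing = latest;
		LeaveCriticalSection(&cs);
		if (showing >= 0)
			present_frame(frame[showing]);
	}
}

/* The emulation thread never waits for the flip: with three buffers
   there is always one that is neither displayed nor pending. */
void emulation_loop(void)
{
	int buf;
	InitializeCriticalSection(&cs);
	CreateThread(NULL, 0, flip_thread, NULL, 0, NULL);
	for (;;)
	{
		EnterCriticalSection(&cs);
		for (buf = 0; buf == latest || buf == showing; buf++)
			;
		LeaveCriticalSection(&cs);
		emulate_one_frame(frame[buf]);
		EnterCriticalSection(&cs);
		latest = buf;
		LeaveCriticalSection(&cs);
	}
}
--- End code ---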

However we can't be sure, even in this case, that DirectX behaves correctly when dealing with more than two buffers. We would expect DirectX to always flip to the most recently rendered frame, but I suspect it could just be queuing frames through a damned flip chain, which would explain some of the extra lag noticed by people.

Additionally, there's another source of lag in mainline MAME that gets exposed when using -triplebuffer, especially with -multithreading: when the video card's refresh and the game refresh are different enough, it's truly dramatic. This happens because the input is received through the window thread, but that thread is locked during a flip operation (for the reason explained above). As the main emulation thread runs freely, this often results in several consecutive frames being virtually deaf to the input messages.

Many 60 Hz vertical games are forced to run rotated on horizontal monitors at frequencies of 50-53 Hz or so in order to allow 256-288 lines at 15 kHz. This is the perfect test case, and I wonder if most horror tales about -triplebuffer don't come from this fact. (This is also fixed by GM.)

If it wasn't clear: of course *normally* you don't need -triplebuffer, -syncrefresh (vsync) is enough. We only need triple buffering when the video card and game speeds are too different, so we can't synchronize without affecting speed but still want to get rid of tearing.

Anyway, there's a lot of confusion, because most of the bad press about vsync/triple buffering comes from articles written for the PC 3D gaming scenario, where they want their game loops to run at as many fps as possible regardless of the video card's refresh. Our case is totally different, because in emulation we want the loop and the screen to update at the same pace.


--- Quote ---Is there any chance of getting the flip function to work correctly without -triplebuffer in GM? That would be perfect :) Possibly you're already familiar with it, but you can ask a Direct3D device whether it is currently in VBlank via the D3DRASTER_STATUS structure:

http://msdn.microsoft.com/en-us/library/windows/desktop/bb172596%28v=vs.85%29.aspx

Maybe that could open up the possibility of using the flip method with (no-buffer) vblank timing?

--- End quote ---

Well, that is a very easy patch to implement, if you have the time to compile and test. This change in ddraw.c will revert the -triplebuffer behaviour to a classic double buffer:


--- Code: ---
// for triple-buffered full screen mode, allocate flipping surfaces
if (window->fullscreen && video_config.triplebuf)
{
	dd->primarydesc.dwFlags |= DDSD_BACKBUFFERCOUNT;
	dd->primarydesc.ddsCaps.dwCaps |= DDSCAPS_FLIP | DDSCAPS_COMPLEX;
	//dd->primarydesc.dwBackBufferCount = 2;
	dd->primarydesc.dwBackBufferCount = 1;
}

--- End code ---
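
(If I read the DirectDraw docs correctly, with a single back buffer the flip chain degenerates into a plain front/back pair, so Flip() can't return until the previous flip has completed: that's classic double buffering.)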

Actually, what I had in mind for the future(?) was to get rid of flipping altogether, in order to manually implement a -triplebuffer model that actually works as the theory says, bypassing the whole DirectX black box except for the waitvsync function. But unfortunately today I learned that can't be trusted exclusively :)

Another option is to create a manual loop to poll the vsync status, as you say. That's a good possibility now that we already have a separate thread for that.

Dr.Venom:

--- Quote from: Calamity on November 08, 2012, 03:25:05 pm ---Yeah, and that's a paradox, because the whole idea behind triple buffering is to reduce the input lag associated with double buffering to a minimum while preventing tearing.
--- End quote ---

True, and when it works as expected it would probably still lower the latency.

But that said, personally I'm not a big fan of using either double or triple buffering when it comes to emulation. To me, buffering technologies were invented to facilitate "modern day" computing. E.g.
- Watching a movie on a computer where background programs can interrupt the flow --> buffering comes to the rescue
- Playing a game with gfx settings that are too demanding for the hardware --> buffering comes to the rescue
- etc...

Don't get me wrong, these are all very much valid and useful applications of buffering. The problem is that the buffering model and the way 80s and 90s arcade and home consoles work are too far apart. As a generalization, the old hardware is simply a "no buffer" design: video RAM is prepped during the vertical blank and then displayed, sound runs directly as it is generated each cycle, and the state of the input (joystick/keyboard) is available to the system in real time. (At least that is how I understand it works, correct me if I'm wrong...)

To come anywhere near this design with emulation, you have to have video, sound and input polling all running within one frame. This can be done *only* with a single-buffer design that flips within the same frame, has a sound buffer of less than one frame, and polls input as often as possible. Looking at the emulation of a single frame (assume a 50 Hz refresh), in 20 milliseconds of real-world time it would need to do:


--- Code: ---
0ms ---------------------------------------------------17ms-- vertical blank -- 20ms
|| emulate frame (render to buffer) --> wait for sync ----> flip in vblank ----- ||
||<--------------------------- poll input continuously --------------------------->

--- End code ---

I guess we can call this the "holy grail" of emulation. To me this should be achievable, given enough computing power on the user's end and a good software implementation of the emulation.

What's also becoming clear from this model is that any form of double or triple buffering will simply break the "holy grail", as it will always cause, at least, one frame of additional "input lag" (actually the video is delayed, but it is perceived as input lag).

Given this, it seems there's probably room for two (configurable) emulation/display update methods:

1) The "holy grail" (users hardware is powerful enough to render full frame rate and has a matching display refresh)
2) triple buffering  (users hardware is not powerful enough to render full frame rate and/or doesn't have a matching display refresh)

Given what you said earlier (quote below), I guess we're on the same page regarding this already :)


--- Quote ---If it wasn't clear: of course *normally* you don't need -triplebuffer, -syncrefresh (vsync) is enough. We only need triple buffering when the video card and game speeds are too different, so we can't synchronize without affecting speed but still want to get rid of tearing.
--- End quote ---


--- Quote ---Additionally, there's another source of lag in mainline MAME that gets exposed when using -triplebuffer, especially with -multithreading: when the video card's refresh and the game refresh are different enough, it's truly dramatic. This happens because the input is received through the window thread, but that thread is locked during a flip operation (for the reason explained above). As the main emulation thread runs freely, this often results in several consecutive frames being virtually deaf to the input messages.

Many 60 Hz vertical games are forced to run rotated on horizontal monitors at frequencies of 50-53 Hz or so in order to allow 256-288 lines at 15 kHz. This is the perfect test case, and I wonder if most horror tales about -triplebuffer don't come from this fact. (This is also fixed by GM.)

--- End quote ---

Thanks for explaining. Years ago, before I got into the whole Soft15Khz/CRT/modeline tweaking, I had an LCD monitor at a fixed refresh and had the described dramatic experience too many times with MAME (whatever config I tried), which made me abandon it altogether for many years. Luckily I got back into it now with GroovyMAME :)

I'm not sure how it works, but your comment might also explain a quote from the official MAME documentation re -triplebuffer that I still don't fully understand. It's found in newvideo.txt in the docs folder (http://mamedev.org/source/docs/newvideo.txt.html), under the description for the "Category 1" user:


--- Quote ---To avoid tearing artifacts, I recommend using the -triplebuffer option as well. Just make sure your monitor's refresh rate is higher than the game you are running.
--- End quote ---

The only thing I can think of is that running at a lower monitor refresh will make MAME render and drop frames (to adjust to the lower speed), which is worse than just skipping ahead (having the "benefit" of not rendering the frame)?

--- Quote ---Well, that is a very easy patch to implement, if you have the time to compile and test. This change in ddraw.c will revert the -triplebuffer behaviour to a classic double buffer:
--- End quote ---

I tried compiling it, but unfortunately I get an error at the end, which seems to have to do with the fact that my MinGW installation is already updated for the new compile chain (I've been compiling the mainline 0.147 versions successfully).


--- Quote ---Actually, what I had in mind for the future(?) was to get rid of flipping altogether, in order to manually implement a -triplebuffer model that actually works as the theory says, bypassing the whole DirectX black box except for the waitvsync function. But unfortunately today I learned that can't be trusted exclusively :)
--- End quote ---

:)


--- Quote ---Another option is to create a manual loop to poll the vsync status, as you say. That's a good possibility now that we already have a separate thread for that.
--- End quote ---

It would at least be worth exploring, I guess. One of the advantages is that you keep full control of "the box" and at any time you know what's going on (the function returns the rough scanline number when it's not in vblank). So you could do nice things like still flipping a frame if it only missed vblank by a small fraction. You would also be able to quickly gauge the real refresh of the video card, which opens the door to syncing the sound in line with the refresh rate.
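
For example, the refresh measurement could be sketched like this (again assuming an already-created IDirect3DDevice9; illustrative only):

--- Code: ---
#include <windows.h>
#include <d3d9.h>

/* Estimate the real refresh rate by timing n vblank-to-vblank
   intervals with the high resolution counter. */
double measure_refresh(IDirect3DDevice9 *dev, int n)
{
	LARGE_INTEGER freq, t0, t1;
	D3DRASTER_STATUS rs;
	int i;

	QueryPerformanceFrequency(&freq);
	for (i = 0; i <= n; i++)
	{
		/* leave the current vblank, then wait for the next one, so
		   each iteration spans exactly one refresh period */
		do IDirect3DDevice9_GetRasterStatus(dev, 0, &rs); while (rs.InVBlank);
		do IDirect3DDevice9_GetRasterStatus(dev, 0, &rs); while (!rs.InVBlank);
		if (i == 0)
			QueryPerformanceCounter(&t0);
	}
	QueryPerformanceCounter(&t1);
	return (double)n * freq.QuadPart / (double)(t1.QuadPart - t0.QuadPart);
}
--- End code ---

On the modes measured above, something like measure_refresh(dev, 120) should come back very close to 60.00.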

I would expect it to take some time and testing to get implemented correctly though. If you choose to experiment with it and at any time want me to do some testing, just let me know.

Calamity:
Thank you for your detailed answer, it's a pleasure to discuss with you.

We're basically talking about the same thing here. It's just a matter of terms, and I think I can hopefully help clarify. This scheme of yours:


--- Quote from: Dr.Venom on November 09, 2012, 08:44:49 am ---
--- Code: ---
0ms ---------------------------------------------------17ms-- vertical blank -- 20ms
|| emulate frame (render to buffer) --> wait for sync ----> flip in vblank ----- ||
||<--------------------------- poll input continuously --------------------------->

--- End code ---

I guess we can call this the "holy grail" of emulation. To me this should be achievable, given enough computing power on the user's end and a good software implementation of the emulation.

--- End quote ---

Well, this is what we know as double buffering. This is actually what you'd get by compiling the suggested patch. You have two buffers:

- buffer #1: the visible VRAM being transferred to the screen*
- buffer #2: the back buffer where you render to.


--- Quote ---What's also becoming clear from this model is that any form of double or triple buffering will simply break the "holy grail", as it will always cause, at least, one frame of additional "input lag" (actually the video is delayed, but it is perceived as input lag).
--- End quote ---

Indeed, but it's not the fact of having two or more buffers that adds a frame of lag, as one would think; it's the very concept of "frame-based" emulation that causes this. The reason is that transferring the contents of the VRAM to the screen (*) is a process that consumes time too (17 ms), as the raster travels down the screen. So once you "flip in vblank" you need to wait some time to see the whole frame displayed, but in the meanwhile there's a new frame being cooked that won't contain your reactions to what's happening on the screen.

By using the option -syncrefresh in MAME you get a slightly different implementation of double buffering: instead of "flipping" (which consists of a low-level change of the visible VRAM offset without involving memory transfers), what we do is a plain copy of our back buffer into the visible VRAM ("blitting"), just being careful to do it during VBLANK. Obviously this approach consumes more resources, but I tend to prefer it to the flipping black box.

But even if we used a single buffer, which is certainly possible on a fast computer nowadays, so that we rendered everything directly into the visible VRAM during the VBLANK time without previous buffering, we would be running into the same one-frame-of-lag issue, as long as our emulator design is frame-based.

On a different plane of things, we have to consider how the input is polled. In an event-driven OS like Windows we don't poll input continuously. The system sends us a message when some new input happens; these messages get buffered and we usually read them once per frame. Now, this model should be good enough, setting aside the built-in system input lag, which in theory should get reduced to a minimum as hardware improves.
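
For illustration, the usual once-per-frame message pump looks roughly like this (a minimal Win32 sketch):

--- Code: ---
#include <windows.h>

/* Drain whatever input messages the OS has queued since the last
   frame. If the thread owning the window is blocked inside a flip
   or vsync wait, nothing runs this loop and the messages just pile
   up -- which is exactly the MAME problem described next. */
void poll_input_once_per_frame(void)
{
	MSG msg;
	while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))
	{
		TranslateMessage(&msg);
		DispatchMessage(&msg); /* the WndProc records key/button state */
	}
}
--- End code ---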

But due to the design of MAME, when vsync is enabled we can get some extra lag, as the input remains locked during the wait for vsync. This is represented in the following scheme, compared to the GM case where this problem is solved:


--- Code: ---
Vanilla MAME + vsync:

0ms --------------------------------------------------------15.4ms --- vertical blank -- 16.7ms
||...emulate frame (render to buffer) --> wait for sync ----> blit --> emulate next---...... ||
||<---------- input enabled ----------> <----- input locked ---------> <--- input enabled ---->

GroovyMAME + vsync + multithreading:

0ms --------------------------------------------------------15.4ms --- vertical blank -- 16.7ms
||...emulate frame (render to buffer) --> wait for sync ----> blit --> emulate next---...... ||
||<---------------------------------- input enabled ------------------------------------------>

--- End code ---

Notice that the scale is not correct; in a normal situation the wait for vsync will take up most of the frame time, especially on a fast computer.

This is the point where emulator writers tell you that these are the limits of emulation. But I do believe that the "holy grail" of emulation is actually feasible in practice, understanding it as a piece of software that works as an *exact* substitute for the emulated hardware in terms of response. It's only that, IMHO, the frame-based concept would need to be replaced by a scanline-based model, where only the next scanline is buffered and we use hsync instead of vsync for synchronizing.
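
Just as a thought experiment, such a scanline-based loop might look something like this, where GetRasterStatus is the only real API and the two helpers are hypothetical:

--- Code: ---
#include <windows.h>
#include <d3d9.h>

void emulate_scanline(int line, void *buf);   /* hypothetical */
void write_line_to_vram(int line, void *buf); /* hypothetical */

/* Chase the beam: keep the emulation exactly one scanline ahead of
   the raster, so at most one line is ever buffered. This ignores
   many practical details (scheduling jitter, hblank granularity). */
void run_frame_scanline_based(IDirect3DDevice9 *dev, int height)
{
	BYTE linebuf[1024];
	D3DRASTER_STATUS rs;
	int line;

	for (line = 0; line < height; line++)
	{
		emulate_scanline(line, linebuf);
		/* don't write line N until the raster reaches line N-1 */
		do IDirect3DDevice9_GetRasterStatus(dev, 0, &rs);
		while (!rs.InVBlank && rs.ScanLine + 1 < (UINT)line);
		write_line_to_vram(line, linebuf);
	}
}
--- End code ---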

Considering that emulator writers use flat panels, such an emulator is not likely going to see the light :)


--- Quote ---Thanks for explaining. Years ago, before I got into the whole Soft15Khz/CRT/modeline tweaking, I had an LCD monitor at a fixed refresh and had the described dramatic experience too many times with MAME (whatever config I tried), which made me abandon it altogether for many years. Luckily I got back into it now with GroovyMAME :)
--- End quote ---

It's good to hear that.


--- Quote ---I'm not sure how it works, but your comment might also explain a quote from the official MAME documentation re -triplebuffer that I still don't fully understand. It's found in newvideo.txt in the docs folder (http://mamedev.org/source/docs/newvideo.txt.html), under the description for the "Category 1" user:


--- Quote ---To avoid tearing artifacts, I recommend using the -triplebuffer option as well. Just make sure your monitor's refresh rate is higher than the game you are running.
--- End quote ---

The only thing I can think of is that running at a lower monitor refresh will make MAME render and drop frames (to adjust to the lower speed), which is worse than just skipping ahead (having the "benefit" of not rendering the frame)?
--- End quote ---

:)

Yeah, that's a good point.

The word "triple" in -triplebuffer is misleading as it suggests an additional degree of buffering when that's not the concept. It took me some time to visualize this. But we must see triple buffering just as an asynchronous version of double buffering.

The double buffering model anchors the game loop to the refresh rate of the video card. I believe that PC game developers wanted to free themselves from the tyranny of refresh rates, so they invented triple buffering. We can visualize it as two separate loops running in parallel: the game loop and the flip loop. The game loop can run at any absurd speed, sending new frames to the flip loop, which will obviously need to drop some of them depending on the video card's refresh, but in theory will always draw the most recent one at the time the VBLANK happens.

Now, as MAME is designed to use the CPU clock for accurate timing of the emulation, it needs to be decoupled from the screen refresh, but this leads to horrible tearing. So someone thought it would be a good idea to use the triple buffering model, and actually it is, if it weren't for the fact that DX's flip functions don't work as advertised, i.e. creating a second back buffer doesn't result in asynchronous flipping (notice I mean asynchronous to the game loop; the flip is always synced to the vertical retrace).

This results in MAME's -triplebuffer option anchoring the game loop to the video card's refresh when the latter is lower than the desired speed, so the benefits of triple buffering don't apply here and we only have a sophisticated version of double buffering.
