Thank you for your detailed answer, it's a pleasure to discuss this with you.
We're basically talking about the same thing here. It's just a matter of terminology, and I hope I can help clarify it. This scheme of yours:
0ms ---------------------------------------------------17ms-- vertical blank -- 20ms
|| emulate frame (render to buffer) --> wait for sync ----> flip in vblank ----- ||
||<--------------------------- poll input continuously --------------------------->
I guess we can call this the "holy grail" for emulation. To me this should be possible to achieve given enough computing power on the user's end, and a good software implementation of the emulation.
Well, this is what we know as double buffering. This is actually what you'd get by compiling the suggested patch. You have two buffers:
- buffer #1: the visible VRAM being transferred to the screen*
- buffer #2: the back buffer where you render to.
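If it helps to picture it, here is a rough C++ sketch of that double-buffered loop. The wait_for_vblank() and flip_visible_buffer() functions are made-up stand-ins for whatever the driver actually exposes; only the shape of the loop matters:

#include <array>
#include <cstdint>

using Frame = std::array<std::uint32_t, 320 * 240>;   // one emulated frame's pixels

Frame buffers[2];   // buffer #1: visible VRAM, buffer #2: back buffer
int back = 1;       // index of the buffer we render into

// Hypothetical hooks, named only for this sketch -- not a real API:
void emulate_one_frame(Frame&) { /* run the core for one frame and draw into the buffer */ }
void wait_for_vblank()         { /* block until the vertical blank starts (driver specific) */ }
void flip_visible_buffer(int)  { /* point the scan-out at the chosen buffer, no memory copy */ }

void frame_loop()
{
    for (;;) {
        emulate_one_frame(buffers[back]);   // render the next frame off-screen
        wait_for_vblank();                  // sit idle until the retrace
        flip_visible_buffer(back);          // the back buffer becomes the visible one
        back ^= 1;                          // and the old front buffer becomes the new back buffer
    }
}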
What's also becoming clear from this model is that either form of double or triple buffering will simply break the "holy grail", as it will always cause - at least - the problem of one frame of additional "input lag" (actually the video is delayed, but it is perceived as input lag)
Indeed, but it's not having 2 or more buffers that adds a frame of lag, as one might think; it's the very concept of "frame-based" emulation that causes it. The reason is that transferring the contents of the VRAM to the screen (*) is a
process that consumes time too (17 ms), as the raster travels down the screen, so once you "flip in vblank" you still have to wait some time before the whole frame is displayed, and in the meanwhile there's a new frame being cooked that won't contain your reactions to what's happening on the screen.
By using the -syncrefresh option in MAME you get a slightly different implementation of double buffering: instead of "flipping" (which is a low-level change of the visible VRAM offset that involves no memory transfer), what we do is a plain copy of our back buffer into the visible VRAM ("blitting"), just being careful to do it during VBLANK. Obviously this approach consumes more resources, but I tend to prefer it to the flipping black box.
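A sketch of that -syncrefresh variant, again with a made-up wait_for_vblank() stub; the only difference from the flip version above is that the visible VRAM is overwritten with a plain copy:

#include <algorithm>
#include <array>
#include <cstdint>

using Frame = std::array<std::uint32_t, 320 * 240>;

Frame visible_vram;   // buffer #1: the VRAM the monitor scans out
Frame back_buffer;    // buffer #2: where the emulated frame is rendered

void emulate_one_frame(Frame&) { /* run the core for one frame */ }
void wait_for_vblank()         { /* block until the vertical blank (driver specific) */ }

void syncrefresh_loop()
{
    for (;;) {
        emulate_one_frame(back_buffer);               // render off-screen as before
        wait_for_vblank();                            // wait for the blanking window
        std::copy(back_buffer.begin(), back_buffer.end(),
                  visible_vram.begin());              // plain memory copy ("blit"), no flip
    }
}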
But even if we used a single buffer, which is certainly possible on a fast modern computer, so that we rendered everything directly into the visible VRAM during the VBLANK time without any previous buffering, we would still run into the same 1-frame-of-lag issue, as long as our emulator design is frame-based.
On a different plane, we have to consider how the input is polled. In an event-driven OS like Windows we don't poll input continuously: the system sends us a message when some new input happens, these messages get buffered, and we usually read them once per frame. This model should be good enough, leaving aside the built-in system input lag, which in theory should be reducible to a minimum as hardware improves.
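Just to illustrate that buffered, once-per-frame model (this isn't MAME's actual input code, which goes through its OSD layer and is more involved), on Windows it looks roughly like this:

#include <windows.h>

// Drain whatever the OS has buffered since the last frame; typically
// called once per emulated frame, just before the inputs are latched.
void drain_input_messages(HWND hwnd, bool key_state[256])
{
    MSG msg;
    while (PeekMessage(&msg, hwnd, 0, 0, PM_REMOVE)) {
        switch (msg.message) {
        case WM_KEYDOWN: key_state[msg.wParam & 0xFF] = true;  break;
        case WM_KEYUP:   key_state[msg.wParam & 0xFF] = false; break;
        default:
            TranslateMessage(&msg);
            DispatchMessage(&msg);
            break;
        }
    }
}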
But due to the design of MAME, when vsync is enabled we can get some extra lag, as input remains locked during the wait for vsync. This is represented in the following scheme, compared to the GroovyMAME case where this problem is solved:
Vanilla MAME + vsync:
0ms --------------------------------------------------------15.4ms --- vertical blank -- 16.7ms
||...emulate frame (render to buffer) --> wait for sync ----> blit --> emulate next---...... ||
||<---------- input enabled ----------> <----- input locked ---------> <--- input enabled ---->
GroovyMAME + vsync + multithreading:
0ms --------------------------------------------------------15.4ms --- vertical blank -- 16.7ms
||...emulate frame (render to buffer) --> wait for sync ----> blit --> emulate next---...... ||
||<---------------------------------- input enabled ------------------------------------------>
Notice that the scale is not correct: in a normal situation the wait for vsync takes most of the frame time, especially on a fast computer.
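To give an idea of how the GroovyMAME timeline above can be obtained, here's a rough multithreaded sketch (my own simplification, not the actual GroovyMAME code; the commented calls are hypothetical): the emulation thread hands each finished frame to a video thread, and only the video thread ever waits for the retrace, so input can keep being read the whole time:

#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

struct Frame { /* pixel data */ };              // placeholder frame type

std::mutex              frame_mutex;
std::condition_variable frame_ready;
Frame                   pending_frame;          // latest finished frame, waiting to be shown
bool                    has_pending = false;
std::atomic<bool>       running{true};

void video_thread()         // owns the "wait for sync --> blit" part of the diagram
{
    while (running) {
        Frame local;
        {
            std::unique_lock<std::mutex> lock(frame_mutex);
            frame_ready.wait(lock, [] { return has_pending || !running; });
            if (!running) break;
            local = pending_frame;              // grab the most recent completed frame
            has_pending = false;
        }
        // wait_for_vblank();                   // hypothetical: block until the retrace
        // blit(local);                         // hypothetical: copy it to the visible VRAM
    }
}

void emulation_thread()     // never blocks on the video card, so input stays enabled
{
    while (running) {
        Frame finished{};
        // poll_input();                        // hypothetical: read input any time we like
        // emulate_one_frame(finished);         // hypothetical: render into 'finished'
        {
            std::lock_guard<std::mutex> lock(frame_mutex);
            pending_frame = finished;           // hand the frame over to the video thread
            has_pending = true;
        }
        frame_ready.notify_one();
    }
}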
So this is the point where emulator writers tell you that these are the limits of emulation. But I do believe that the "holy grail" of emulation is actually feasible in practice, understanding it as a piece of software that works as an *exact* substitute for the emulated hardware in terms of response. It's only that, IMHO, the frame-based concept would need to be replaced by a scanline-based model, where only the next scanline is buffered and we synchronize on hsync instead of vsync.
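Purely as a thought experiment (none of these hooks exist anywhere, the names are made up), the inner loop of such a scanline-based emulator would look something like this:

#include <array>
#include <cstdint>

constexpr int kWidth = 320;
constexpr int kLines = 240;
using Scanline = std::array<std::uint32_t, kWidth>;

// Hypothetical hooks, named only for this sketch:
void emulate_until_line(int, Scanline&)       { /* run the core just far enough to produce one line */ }
void wait_for_hsync()                         { /* block until the beam reaches the next line */ }
void write_line_to_vram(int, const Scanline&) { /* copy one line into the visible VRAM */ }

void scanline_loop()
{
    Scanline next;
    for (;;) {
        for (int line = 0; line < kLines; ++line) {
            emulate_until_line(line, next);   // only one scanline is ever buffered
            wait_for_hsync();                 // synchronize on hsync instead of vsync
            write_line_to_vram(line, next);   // worst-case latency is now one line, not one frame
        }
    }
}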
Considering that emulator writers use flat panels, such an emulator is not likely to see the light of day.
Thanks for explaining. Years ago, before I got into the whole Soft15Khz/CRT/modeline tweaking, I had an LCD monitor at a fixed refresh rate and had the described dramatic experience too many times with MAME (whatever config I tried), which made me abandon it altogether for many years. Luckily I got back into it now with GroovyMAME.
It's good to hear that.
I'm not sure how it works, but your comment might also explain a quote from the official MAME documentation regarding -triplebuffer that I still don't understand fully. It's found in newvideo.txt in the docs folder (http://mamedev.org/source/docs/newvideo.txt.html) under the description for the "Category 1" user:
To avoid tearing artifacts, I recommend using the -triplebuffer option as well. Just make sure your monitor's refresh rate is higher than the game you are running.
The only thing I can think of is that running at a lower monitor refresh will make MAME render and drop frames (to adjust to the lower speed), which is worse than just skipping ahead (which has the "benefit" of not rendering the frame)?
Yeah, that's a good point.
The word "triple" in -triplebuffer is misleading as it suggests an additional degree of buffering when that's not the concept. It took me some time to visualize this. But we must see triple buffering just as an asynchronous version of double buffering.
The double buffering model anchors the game loop to the refresh rate of the video card. I believe PC game developers wanted to free themselves from the tyranny of refresh rates, so they invented triple buffering. We can visualize it as two separate loops running in parallel: the game loop and the flip loop. The game loop can run at any absurd speed, sending new frames to the flip loop, which will obviously need to drop some of them depending on the video card's refresh, but in theory will always draw the most recent one at the time the VBLANK happens.
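One way to picture those two decoupled loops (all names are made up, and this is not how DirectX actually exposes it, just the concept):

#include <atomic>

struct Frame { /* pixel data */ };   // placeholder
Frame buffers[3];                    // render target, queued frame, frame on screen

std::atomic<int> latest_complete{-1};   // newest finished frame, ready to be flipped
std::atomic<int> on_screen{-1};         // frame currently being scanned out

void game_loop()                     // runs at whatever speed the emulation wants
{
    int render = 0;
    for (;;) {
        // emulate_one_frame(buffers[render]);          // hypothetical
        latest_complete.store(render, std::memory_order_release);
        do {                                            // pick a buffer that is neither queued
            render = (render + 1) % 3;                  // nor currently on screen; with three
        } while (render == on_screen.load() ||          // buffers one is always free, so we
                 render == latest_complete.load());     // never wait for the video card
    }
}

void flip_loop()                     // runs once per video card refresh
{
    for (;;) {
        // wait_for_vblank();                           // hypothetical
        int show = latest_complete.load(std::memory_order_acquire);
        if (show >= 0 && show != on_screen.load()) {
            // flip_visible_buffer(show);               // hypothetical: show the newest frame;
            on_screen.store(show);                      // anything older is simply dropped
        }
    }
}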
Now, as MAME is designed to use the CPU clock for accurate timing of the emulation, it needs to be decoupled from the screen refresh, but this leads to horrible tearing, so someone thought it would be a good idea to use the triple buffering model. And it actually is a good idea, were it not for the fact that DirectX's flip functions don't work as advertised, i.e. creating a second back buffer doesn't result in asynchronous flipping (notice I mean asynchronous to the game loop; the flip is always synced to the vertical retrace).
This results in MAME's -triplebuffer option anchoring the game loop to the video card's refresh when that refresh is lower than the desired speed, so the benefits of triple buffering don't apply here and we end up with just a sophisticated version of double buffering.