Author Topic: Greatly Improved Black Frame Insertion on 240Hz Monitors (& "Temporal HLSL") (Read 21850 times)

mdrejhon · « **Reply #40 on:** September 26, 2020, 10:09:16 pm »

Quote from: Calamity on September 25, 2020, 06:35:05 am

I need to re-add BFI, the feature got missed accidentally in GM releases after June. I'll read your suggestion calmly. Anyway, multithreaded rendering has always been problematic. I haven't seen a single implementation that doesn't crash under certain stress conditions. We already have a "blitting" thread in GM for the triplebuffer implementation. The roadmap we have goes in the direction of implementing a cross-platform software vsync "interrupt" library, using threading to keep track of vsync while keeping rendering in the main thread, similar to what we discussed on your site. Not sure how BFI and your other suggestions will fit in this scheme.

It’s not multithreaded rendering, and actually similar to your triple buffer workflow, I think
Technically, what I am suggesting is not multithreaded rendering; see the comments at https://github.com/libretro/RetroArch/pull/11342

It’s like your blitter thread: the main thread renders, and the other thread does Present(). That thread only does timing and presenting. So the only things it really does are conditional blitting, waiting for a timer, busywaiting, and presenting to the screen. No rendering per se, technically.

Basically, you want to extend your triple-buffering-only “blitting thread” to all sync technologies, not just triple buffering. It would do its own software emulation of waitable swapchains for lowest latency (but may still use actual waitable swapchains at the output level, if it’s actually VSYNC ON).

Full frame workflows
It would provide the framepacing for triple buffering, G-SYNC, and VSYNC OFF, making sure that emulator frames stay framepaced at the emulator Hz.
1. Rendering thread blits frame to presenting thread
2. Presenting thread will busywait until correct time to present relative to last presentation
This will actually work universally, even for VSYNC ON. It will signal the rendering thread when it’s presented (if needed for the emulator thread to continue, since we’re emulating VSYNC ON blocking behavior in software for all sync technologies). If it’s VSYNC ON at the output, the frame presentation thread is just a passthrough.
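
As a rough illustration of step 2, here is a minimal Python sketch of the pacing decision. The names next_present_time and busywait_until are made up for this example, and a real presenter thread would call the graphics API's Present() after the wait; this only shows the timing arithmetic, under the assumption that a late frame is presented immediately rather than trying to catch up.

```python
import time


def next_present_time(last_present: float, emu_hz: float, now: float) -> float:
    """Earliest time (in seconds) at which the next frame may be presented.

    The target is one emulator refresh period after the previous present.
    If that moment has already passed, present right away (assumption).
    """
    period = 1.0 / emu_hz
    target = last_present + period
    return max(target, now)


def busywait_until(target: float, clock=time.perf_counter) -> float:
    """Spin on the clock until it reaches target; return the time observed."""
    now = clock()
    while now < target:
        now = clock()
    return now
```

A presenter loop would call busywait_until(next_present_time(...)) and then present. The clock parameter is only there so the wait can be tested with a fake clock.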

Beamraced workflows
Every raster, you would blit one “scanline slice” at a time from the rendering thread to the presenting thread. It could be one row of pixels (if 1:1 mapped) or multiple rows of pixels (if CRT filtered). Don’t worry about curved CRT filters and missing data; you’d just let the user adjust jittermargin (the beam race margin) accordingly to prevent glitching. This is just a per-emulator-pixel-row frameslice blit, and would not be the same frameslice size as in the presenting thread. The presenting thread would decide when it’s time to present a frameslice (i.e. blocks of how many scanlines).
1. Rendering thread would call a scanline blitter (perhaps a PresentScanLineFrameSlice() or whatever blitter wrapper name you use) to blit the scanline from the emulator to the presenter thread framebuffer. Busywaiting can occur in the wrapper (to pace the calling emulator execution): a scanline-blocking version of the frame-blocking behavior of VSYNC ON. Maybe this is like a “HSYNC ON”, lol (horizontal sync).
2. Presenter thread will decide whether enough scanlines have been added to start a frameslice present. If so, it will present the frameslice to the display (with all appropriate busywaiting logic, which can be decoupled from the busywaiting in #1; that presents advantages, like scanline-level busywaiting in the blit wrapper but frameslice-level busywaiting in the present thread). The rest of the emulator wouldn’t need to know how big or small the configured frameslicing is, or even whether it is frontbuffer execution (NVIDIA VRWorks API to make frontbuffer the display buffer = perfect for single-scanline beam racing). In fact, the frameslice size could differ between the rendering thread and the present thread: the amount blitted between threads need not be the same as the amount actually presented to the screen. It is basically a jittery rolling queue of scanlines that can be chunked inwards and chunked outwards in independent chunk sizes (i.e. single scanline blitted in, but frameslices blitted out), or can stay synchronous (same frameslice in, same frameslice out), or not (different size frameslice in, different size frameslice out, due to weird shapes of CRT filters), with the present thread maintaining the jittermargin as needed.
3. There’d be a final Present() in the main emulator module which probably does nothing except make sure the sync is still aligned (but might busywait if the destination display has scanned far ahead of the emulator).
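
The "rolling queue of scanlines" in step 2 can be sketched as below. This is only an illustration of chunking in and chunking out at independent sizes; ScanlineQueue and its methods are hypothetical names, rows are stand-in values rather than real pixel data, and jittermargin handling and thread locking are left out.

```python
from collections import deque


class ScanlineQueue:
    """Rows arrive in any chunk size and leave in fixed-size frameslices."""

    def __init__(self, slice_rows: int):
        self.slice_rows = slice_rows  # rows per presented frameslice
        self.pending = deque()        # rows blitted in but not yet presented

    def blit_in(self, rows):
        """Called on behalf of the rendering thread with one or more rows."""
        self.pending.extend(rows)

    def take_slice(self):
        """Return one frameslice worth of rows, or None if too few are queued."""
        if len(self.pending) < self.slice_rows:
            return None
        return [self.pending.popleft() for _ in range(self.slice_rows)]
```

With slice_rows=4, blitting one row and then two rows yields no slice yet; after two more rows, take_slice() returns the first four rows and leaves the fifth queued for the next slice.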

BFI Workflow
1. The rendering thread would pre-generate the series of black framebuffers and pass them to the present thread
2. The present thread will accurately sequence the black frames with proper timing precision.

So you see, it’s exactly your blitter thread, except extended to cover all use cases (including beam racing). No rendering done in the present thread.
- It allows high precision VRR
- It allows high precision beam racing
- It allows high precision BFI
- It allows future rolling-scan software BFI (that can also be simultaneously beam-raced)
- It continues to allow high precision triple buffering

You’d use the appropriate thread safety practices to make sure that the framebuffers at the present thread level aren’t accessed simultaneously. So during a blit operation, you’d lock the present thread’s framebuffer being blitted to. And during a present operation, you’d lock the framebuffer too before presenting it and then unlock the framebuffer. That way, you get complete buffer thread-safety, and ZERO RENDERING in the frame presentation thread, while achieving the hit-many-birds-at-once goals.
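
A minimal sketch of that locking discipline, written in Python purely for illustration. SharedFramebuffer is a hypothetical name, the "pixels" are plain bytes, and present_fn stands in for whatever the real Present() call would be.

```python
import threading


class SharedFramebuffer:
    """One framebuffer guarded by a lock: blit and present never overlap."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pixels = None
        self._frame_number = -1

    def blit(self, pixels, frame_number):
        """Rendering thread: copy a finished frame in while holding the lock."""
        with self._lock:
            self._pixels = bytes(pixels)  # copy, so the renderer can reuse its buffer
            self._frame_number = frame_number

    def present(self, present_fn):
        """Presenter thread: hand the pixels to present_fn while holding the lock.

        Returns the frame number presented, or None if nothing was blitted yet.
        """
        with self._lock:
            if self._pixels is None:
                return None
            present_fn(self._pixels)
            return self._frame_number
```

Holding the lock across the whole present keeps things simple; a real implementation might swap buffer pointers under the lock instead, to keep the critical section short.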

And magically, this makes a lot of behaviours combinable, such as inputdelays combined with BFI, or doing BFI on a non-blocking sync (BFI onto VRR), and you’d be able to program new sync technologies not yet invented without needing to modify the rest of the emulator. That is because the blit thread is just a VSYNC ON emulator regardless of what the output is doing.

Possible Architecture / Concept / Idea
What I suggest is that you have blitter wrappers, BlitScanLineFrameSlice() and a BlitFullFrame(). Let’s say you already implemented BlitFullFrame() for your triplebuffering implementation (I am not sure what naming convention you actually used).

Blitting the scanline
BlitScanLineFrameSlice(fullbuffer, emupixelrow) would potentially blit a frameslice that is (1/emu-vert-rez)th of the actual vertical resolution, corresponding to the emulator pixel row, to the other thread (presenting thread). This call would block for (1/emu-vert-rez)th of one emulator refresh period since the last raster. This would only be called for beamraceable emulators, even if beamracing is not currently enabled. The scanline blitted doesn’t have to be the same scanline as what the emulator rendered, just approximately the same territory; we don’t need an exact 1:1 match since it’s near the end of the jittermargin territory, though it could be a perfect 1:1. If the emulator framebuffer and output framebuffer are the same resolution, with CRT filter disabled, then it’d be a one-pixel-row frameslice corresponding to the most recently rendered emulator pixel row, which would be pretty literal to its name. But what matters is that it’s blitting frameslices in ultrafine granularity that are tinier than the actual frameslicing on the output end.

Blitting the full framebuffer
BlitFullFrame() would potentially blit the full framebuffer to the other thread (presenting thread). This would be called every time the emulator module finished rendering a frame.

You’d call both BlitScanLineFrameSlice() and BlitFullFrame() regardless of current sync tech / beamrace setting.

Behavior at the render thread level
- BlitScanLineFrameSlice() would be a no-operation (Return immediately) if beam racing is disabled or undesired (e.g. RetroArch-style RunAhead workflows).
- BlitFullFrame() would be a no-operation (except potential timing-alignment busywaiting) if beamracing is enabled.
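
The two bullet points above boil down to a pair of wrappers where exactly one does real work, depending on the beamrace setting. A hedged sketch follows; the Python names mirror the suggested BlitScanLineFrameSlice() / BlitFullFrame() wrappers but are otherwise invented, the list named sent is a stand-in for handing data to the presenter thread, and the timing-alignment busywait is omitted.

```python
class BlitWrappers:
    """The emulator always calls both wrappers; the setting decides which one acts."""

    def __init__(self, beamracing_enabled: bool):
        self.beamracing_enabled = beamracing_enabled
        self.sent = []  # stand-in for data handed to the presenter thread

    def blit_scanline_frameslice(self, row_index, row_pixels) -> bool:
        """No-op unless beam racing is enabled. Returns True if data was sent."""
        if not self.beamracing_enabled:
            return False
        self.sent.append(("slice", row_index, row_pixels))
        return True

    def blit_full_frame(self, frame_pixels) -> bool:
        """No-op when beam racing is enabled, since the slices already went out."""
        if self.beamracing_enabled:
            return False
        self.sent.append(("frame", frame_pixels))
        return True
```

The emulator module never has to check the setting itself, which is the point of hiding the decision inside the wrappers.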

Behavior at the frame-flipper thread level (the present thread)
For the workflow you already do today for triple buffering, the newly added BlitScanLineFrameSlice() is ignored (no delay, no data blitted inwards), while the existing BlitFullFrame() does a full blit, much like you already do today with triple buffering. What happens in the thread is now extended to also include all sync technologies AND BFI AND beam racing, not just triple buffering.

Example situation of the presenter thread (frame flipper):

1. VSYNC ON + nonbeamraced: Behaves as passthrough VSYNC ON. BlitFullFrame becomes a synonym for a waitable swapchain Present(). Then do a busywait if Present() unblocked less than one emulator refresh period since the last Present(). (This occurs if output Hz is higher than emulator Hz, so it looks good for a 60fps emulator at 120Hz.) Return immediately if VSYNC ON blocking behavior was predictable.

2. VSYNC ON + BFI + nonbeamraced: Same algorithm as #1 (VSYNC ON + nonbeamraced), including the busywait, with one exception: the busywait is at output Hz granularity instead of emulator Hz granularity, so the code is identical with only minor modifications. Cycle the whole sequence of prerendered black frames this way.
BFI antiflicker logic for emulator-running-faster-than-output-Hz-multiple: If the emulator runs fast (or the output is blocking on VSYNC ON at too low a refresh rate), the number of framebuffers queued will grow (basically too many framebuffers blitted from the rendering thread to the frame flipper thread). If we build up enough BFI framebuffers for more than one emulator refresh cycle (example: if doing 180Hz and 3-frame-sequence BFI, then an overflow condition is 6 buffers queued in the frame-presenter thread), then throw away unwanted BFI buffers and only cycle the newest emulator refresh cycle’s BFI sequence (e.g. keeping only 3). Result: we’ve dropped an emulator frame’s BFI-sequence-of-frames without creating interrupting flicker.

3. Unsynced (triple buffer / VRR / DWM / VSYNC OFF): Execute exactly the same algorithm as #1. It already has a conditional busywait, so it automatically works correctly, kind of like today’s GroovyMAME during triple buffering / VRR. So the existing algorithm is universal for all unsynced technologies. De facto, timingwise, it’d behave the same as your existing threaded triple buffering algorithm.

4. Unsynced + BFI + nonbeamraced: Execute exactly the same algorithm as #1, except with busywaits at the custom software-defined Hz. For VRR you can do any software-defined Hz within the VRR range (120 or 180 or 240 for 240Hz VRR). I think GroovyMAME already works with VRR simply by using its triple buffering algorithm; this is just a different workflow to achieve a BFI-compatible result.

Notice 1/2/3/4 are essentially the same algorithm, essentially your existing triple buffering algorithm (slightly modified to be compatible with all sync technologies).
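
The BFI antiflicker overflow rule from case 2 can be sketched as follows. This is an illustration only: drop_stale_bfi is a made-up name, and each queue entry is assumed to be an (emulator frame number, BFI index) pair, oldest first.

```python
def drop_stale_bfi(queue, sequence_len):
    """Keep only the newest emulator frame's BFI sequence once two full
    sequences have piled up; otherwise leave the queue untouched.

    queue: list of (emu_frame_number, bfi_index) tuples, oldest first.
    sequence_len: frames per BFI sequence (3 for 180Hz BFI of a 60Hz emulator).
    """
    if len(queue) < 2 * sequence_len:
        return queue
    newest = queue[-1][0]
    return [entry for entry in queue if entry[0] == newest]
```

For 180Hz BFI with a 3-frame sequence, six queued buffers for emulator frames 10 and 11 collapse to the three buffers of frame 11, so a whole sequence is dropped instead of a single buffer and the bright/dark cadence is preserved.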

Now, that makes it much easier to add future beamracing workflows:

A. Beamraced: Thanks to BlitScanLineFrameSlice() from the rendering thread, the frame flipper thread’s output framebuffer is already built up to almost the current emulator scanline territory. You present the frameslice as you already do in your GroovyMAME patch, including the small raster-based busywaiting you already do. The frame flipper thread will do the raster busywaiting, while the rendering thread will spin on a raster mutex maintained by the frame flipper thread (respecting configurable jitter margin). Essentially hardware-based beamracing (emuraster=realraster).

B. Rolling-scan BFI: Electron gun emulator. BlitScanLineFrameSlice() will have built up the framebuffer, and could then render a “rolling bar” into the output framebuffer (in the rendering thread within the wrapper, before blitting) and finally blit that to the presenter thread. At 360Hz, that is six rolling-bar positions, like six different 1/6th screenfuls (with bleed overlaps for alphablends to prevent that “stationary tearline artifact” problem). So most BlitScanLineFrameSlice() calls would return immediately, with about 6 of the calls (evenly spaced 1/6th of a screenful apart in raster) rendering the framebuffer containing the rolling bar and passing it to the presenter thread on the spot. For 360Hz, it’d be full-refresh-cycle beamracing at the destination, 1/360sec behind the emulator refresh. No modification is needed to existing emulator modules that are already calling the blitter wrappers; all of these modifications are within the blit wrapper and the presenter thread. Essentially software-based beamracing (output Hz granularity, don’t care about actual hardware raster position).
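
To make the "six rolling-bar positions at 360Hz" arithmetic concrete, here is a small sketch that computes which rows each hardware refresh would light up. rolling_bar_bands and overlap_rows are hypothetical names, the overlap is a plain row count rather than a real alphablend, and the output Hz is assumed to be an exact multiple of the emulator Hz.

```python
def rolling_bar_bands(height, output_hz, emu_hz, overlap_rows=0):
    """Return (top, bottom) row ranges, one per hardware refresh cycle.

    Each band covers 1/steps of the screen, widened by overlap_rows on both
    sides (clamped to the screen) to leave room for blended edges.
    """
    steps = output_hz // emu_hz
    bands = []
    for i in range(steps):
        top = i * height // steps
        bottom = (i + 1) * height // steps
        bands.append((max(0, top - overlap_rows), min(height, bottom + overlap_rows)))
    return bands
```

For a 1080-row output at 360Hz with a 60Hz emulator this gives six bands of 180 rows each; with overlap_rows=10 the second band becomes rows 170 to 370.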

Hopefully this is a catch-all architecture that is a minor modification of your triple buffer workflow

There’s only one rendering thread. The other thread’s job is only to flip framebuffers. This would be a universal workflow that works with all sync technologies. CRT filtering will continue to stay in the rendering thread, as today.

I think it is your existing triple buffering workflow, with minor modifications to also be compatible with:
- VSYNC ON
- VSYNC OFF
- VRR (FreeSync, G-SYNC)
- BFI on VSYNC ON
- BFI on VRR
- Beamrace enhanced (hardware beamracing, software beam racing)
- Future rolling BFI / electron gun emulators
- Future sync technologies not yet invented

If I understand what you already implemented (correct me if I am wrong), then hopefully you can wrap your head around how conveniently futureproof your triple buffer algorithm is: with minor modifications to accommodate all sync technologies, plus an added scanline blitter hook (in addition to your existing frame blitter hook), it would be compatible with all future workflows.

By default, your emulator would then work correctly with most sync technologies out of the box. Launching the emulator into VSYNC ON or VRR or triple buffering would work correctly without mandatory user configuring. So would DWM + VSYNC OFF or Enhanced Sync or Fast Sync. Then by specifying a custom framerate cap (for VRR, that’s a software-based refresh rate) such as “180”, it would correctly do BFI on VRR. The cap could also be calculated from the BFI sequence size (e.g. 2,3,4) you specify; it would assume that refresh rate and framepace at that cap successfully regardless of VSYNC ON, VSYNC OFF, VRR, or triple buffering. It’d only erratically flicker if the output was fixed-Hz and not divisible by emulator Hz, but it’d not flicker in all other situations (even triple buffering would look fine, as long as it’s framepacing extremely accurately and the output Hz is an exact multiple of emulator Hz).

So basically a lot of automagic compatibility, simply by using your triplebuffer workflow for everything (even including VSYNC ON, even including BFI) by default.

Osirus23 · « **Reply #41 on:** September 28, 2020, 02:08:22 pm »

Quote from: mdrejhon on May 29, 2020, 09:07:21 pm

Someday, I hope retina-resolution direct-view MicroLED screens 1000Hz should make it very easy to "Temporal HLSL" emulate most CRT tubes, except for the actual curvedness.

That might even be re-creatable with the flexible display panels they are coming out with now.

donluca · « **Reply #42 on:** September 29, 2020, 04:21:39 pm »

We waited years just to have them making monitors curved the wrong way. /s

mdrejhon · « **Reply #43 on:** September 29, 2020, 05:49:00 pm »

Quote from: Calamity on September 25, 2020, 06:35:05 am

Hi Mark,

I need to re-add BFI, the feature got missed accidentally in GM releases after June. I'll read your suggestion calmly. Anyway, multithreaded rendering has always been problematic. I haven't seen a single implementation that doesn't crash under certain stress conditions. We already have a "blitting" thread in GM for the triplebuffer implementation. The roadmap we have goes in the direction of implementing a cross-platform software vsync "interrupt" library, using threading to keep track of vsync while keeping rendering in the main thread, similar to what we discussed on your site. Not sure how BFI and your other suggestions will fit in this scheme.

UPDATE:

I have resummarized the frame-presenter thread idea in a much, much simpler way here.

Much more unified. Much easier to read. Far less confusing than my obviously-so-famous walls of text -- apologies -- but it's now been written into a compact easier read:

[Feature Request] Futureproof RetroArch with precision frame pacing presenter thread

The algorithm is generic and already a best practice for all emulators on non-60Hz displays; this simply extends the algorithm to be universal (even for VSYNC ON 60Hz double-buffered).

Theoretically, you can ignore my walls of text here and just read that thread instead.

Please confirm if this is what you are already intending to do with a vsync library approach? If so, is there a related github item? Your terminology & my terminology are different, but the principles are sound.

If I am thinking correctly:
- VRR is simply your existing planned vsync library approach of a software-based framepacing.
- Rendering thread just pre-renders the whole sequence of BFI frames, and passes them all at once to your vsync library. [a minor enhancement to vsync library]
- BFI and VRR+BFI is simply a minor modification of your existing planned vsync library approach accepting multiple framebuffers per emulator Hz, and software-framepacing at finer granularity.
- The magic is that the software-framepacing (you already use for triple buffering) only needs minor modification to work with all sync technologies (including VSYNC ON) so that everything is pipelined through a unified frame-presentation algorithm that allows you to use one algorithm for all sync technologies.

In other words... BFI is simply pre-rendering a sequence of frames relating to one emulator refresh, and passing them all to the presenting thread (your vsync library). Basically, your vsync library receives 3 frames for 1 emulator refresh, if doing 180Hz BFI. And so on. Each blitted frame would have its own metadata (emulator frame number, BFI sequence number, and in future, raster data) so that an entire BFI sequence can be dropped inside your vsync library when needing to drop one emulator frame, or the most recent complete BFI sequence can be automatically repeated if the next incoming emulator frame is blitted to the vsync library later than expected. That can maintain consistent anti-flicker behavior when emulator framerate is low/high, or paused, or in Fast Forward, and make it easier to combine features (BFI+VRR, BFI+inputdelay), etc. You'd still get the same identical looking stutters as non-BFI operation, but BFI stops being erratic. Your VSYNC library could potentially handle hardware-based heartbeats (e.g. VSYNC ON, max Hz, etc) and software-based heartbeats (e.g. VRR), including situations where the output Hz is a multiple of input Hz -- and correctly convert it to the required heartbeat that the renderer is expecting (like BFI divisors). And future improvements to your vsync library would accept frameslice blits between your render thread/vsync threads.
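
A hedged sketch of the per-frame metadata and the repeat-on-late rule described above. BlittedFrame, make_bfi_sequence and choose_sequence are hypothetical names, and the ordering "one visible frame followed by black frames" is an assumption made for the example; raster data is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlittedFrame:
    emu_frame: int   # which emulator refresh this buffer belongs to
    bfi_index: int   # position inside that refresh's BFI sequence
    is_black: bool   # True for the inserted black frames


def make_bfi_sequence(emu_frame, sequence_len):
    """Build one emulator refresh worth of frames: visible first, then black
    (assumed ordering)."""
    return [BlittedFrame(emu_frame, i, is_black=(i > 0)) for i in range(sequence_len)]


def choose_sequence(current, incoming):
    """Play the incoming sequence if it arrived in time, else repeat the
    most recent complete one so the bright/dark cadence never breaks."""
    return incoming if incoming is not None else current
```

For 180Hz BFI, make_bfi_sequence(5, 3) yields three frames tagged with emulator frame 5. If frame 6 has not arrived when the sequence ends, choose_sequence returns the frame 5 sequence again, which shows up as an ordinary stutter rather than a flicker.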

This might conceptually be only a very slight modification to your threaded vsync library approach. Simply implement your vsync library with the mindset that calls to the vsync library might blit multiple framebuffers per emulator Hz to your presenting thread -- whether it's full framebuffers for a BFI workflow to your vsync library (suddenly all at once for one emulator Hz) -- or frameslice-at-a-time for a beamraced workflow to your vsync library (gradually during an emulator scanout) -- essentially futureproofing your vsync library to all possible workflows. Being a Present()-to-Photons expert myself, I'd be happy to make sure your universal vsync library doesn't accidentally architect itself into a corner.

It's very easily amenable to an iterative development approach, you might not need to do BFI right away, but you'd add provisions to do BFI in this "correct" recommended futureproof workflow for user-friendliness (BFI automatically working correctly with no configuring), elimination of most erratic flickers (improved precision), and easy automatically VRR-compatible BFI (simply by virtue of this architecture).

Ignore the other walls of text here if you haven't read them yet, check [Feature Request] Futureproof RetroArch with precision frame pacing presenter thread and I'd love to hear your comments.

Calamity · « **Reply #44 on:** September 30, 2020, 08:15:36 am »

Hi Mark,

I'm sorry I have very little time to write, chronically. As I said, Doozer and I are working on a software raster interrupt emulation library similar to what we discussed in your forum some time ago. Lacking a better name we call it "emusync". It's currently just a prototype/proof of concept and we haven't touched it for more than 1 year. It's not on github yet but if there's interest we can upload it. We don't know yet whether to integrate it in our other library (Switchres) or keep it as a separate (but related) project.

The intended approach is different to what you suggest in that we want to use multithreading to poll vsync rather than for rendering. This way, raster "interrupts" (vblank or hblank) can be exposed to the main thread as timing events you can easily synchronize your rendering or game logic to. Keeping a dedicated thread to tracking vblank ensures you can keep a perfect frame count and know when a vblank was missed. This will allow us to unify our frame delay, vsync offset and frame slice techniques under the same umbrella, and make it available to all rendering backends.
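
The "know when a vblank was missed" part can be illustrated with a few lines of arithmetic. This is only a sketch under stated assumptions: emusync is not published, so nothing here reflects its actual code; missed_vblanks is a made-up name, and the timestamps are assumed to come from whatever vblank polling mechanism the platform offers.

```python
def missed_vblanks(prev_ts, curr_ts, refresh_hz):
    """Number of vblanks that went by unobserved between two observed ones.

    prev_ts and curr_ts are the timestamps (in seconds) of two consecutively
    observed vblanks. If they are one refresh period apart, nothing was missed.
    """
    period = 1.0 / refresh_hz
    elapsed_periods = round((curr_ts - prev_ts) / period)
    return max(0, elapsed_periods - 1)
```

A polling thread could add missed_vblanks(...) + 1 to its frame counter on every observed vblank, so the count stays correct even when the thread was descheduled across a refresh or two.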

The threaded triplebuffer implementation we're using is something I'd like to get rid of since MAME is moving to BGFX eventually, which is already multithreaded. My reason to use threaded rendering in the first place was to allow asynchronous rendering on a video api that didn't allow it at the time (D3D9). The typical use case is when emulated refresh is different to monitor's refresh and you want to keep original game speed and still get no tearing. This requires a swap chain that allows dropping frames and that wasn't the case of D3D9.

I had a lot of problems making threaded triplebuffering somewhat stable. I currently use tricks like switching to single threading for a few frames when a context change is detected (alt+enter, video mode switch, etc.) and even that is not 100% safe. In my experience, video APIs hate multithreading. In the worst case they'll refuse to work at all (OpenGL). In the best case, you have a time bomb ready to blow upon something like a video mode change.

mdrejhon · « **Reply #45 on:** September 30, 2020, 03:21:19 pm »

Quote from: Calamity on September 30, 2020, 08:15:36 am

I'm sorry I have very little time to write, chronically.

Understood, real life comes first -- whether it is job, family, priority issues, etc.

Quote from: Calamity on September 30, 2020, 08:15:36 am

As I said, Doozer and I are working on a software raster interrupt emulation library similar to what we discussed in your forum some time ago. Lacking a better name we call it "emusync". It's currently just a prototype/proof of concept and we haven't touched it for more than 1 year. It's not on github yet but if there's interest we can upload it. We don't know yet whether to integrate it in our other library (Switchres) or keep it as a separate (but related) project.

It's a great approach. My thread suggestion was also intended to double as taking on "emusync" tasks too (though that wasn't the initial intent).

Quote from: Calamity on September 30, 2020, 08:15:36 am

The intended approach is different to what you suggest in that we want to use multithreading to poll vsync rather than for rendering.

Clarification: presenting, not rendering.

In my original idea, the emulator thread would do all rendering (and even pre-generating the sequence of black frames). And the presenting thread would do the presenting timing & emusync responsibilities.

Now that I understand better, I have a new suggestion that merges your idea and mine into a potentially simpler design.

Two Ways To Beam Race

1. Hardware-based beam racing: (sync emu raster to real raster): This is the lagless vsync experiment that you successfully did, synchronizing to the real hardware's raster, also written in this Blur Busters article you already know.

2. Software-based beam racing: (don't care about real hardware raster): Upcoming 360Hz monitors let you do 6 hardware refresh cycles per emulated 60Hz refresh cycle. So the first hardware refresh cycle gets a framebuffer with the top 1/6th updated, the next hardware refresh cycle gets a framebuffer with the next 1/6th updated, and so on. Non-BFI and BFI modes of operation are possible (blurred/alphablended frameslice edges to prevent tearing artifacts). This includes the implementations considered a CRT electron gun emulator (rolling-scan BFI, "temporal HLSL", BFIv3).

Note: As you remember, WinUAE supports hardware beamracing onto higher-Hz and VRR displays. It is achieved by surge-executing 1/60sec bursts of emulator to keep up with the very fast raster scanout on higher Hz displays. So for a 240Hz monitor, you've got 1/240sec surge-executes of emulator in sync with a 240Hz refresh cycle, followed by 3/240sec of idling to the next hardware refresh cycle that timing-aligns with the emulator refresh cycle.
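
The surge/idle split in that note is simple arithmetic, sketched here with exact fractions. This is not WinUAE's code; surge_schedule is a made-up name, and the sketch assumes the output Hz is an exact multiple of the emulator Hz.

```python
from fractions import Fraction


def surge_schedule(emu_hz, output_hz):
    """Return (surge_seconds, idle_seconds) per emulated refresh.

    The emulator runs one emulated refresh during a single fast hardware
    scanout, then idles for the rest of the emulated refresh period.
    """
    ratio = Fraction(output_hz, emu_hz)
    if ratio.denominator != 1:
        raise ValueError("output Hz must be an exact multiple of emulator Hz")
    surge = Fraction(1, output_hz)
    idle = Fraction(1, emu_hz) - surge
    return surge, idle
```

For a 60Hz emulator on a 240Hz monitor this returns 1/240 sec of surge execution and 3/240 sec of idling, matching the figures in the note.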

So if emusync plans to later support both (1) and (2), one can treat the rest as special use cases in a thought exercise (emu heartbeat & hardware heartbeat independent of each other):

Example "Emusync" Heartbeat Speeds

Classical VSYNC 60 Hz Operation: fastest speed emu hsync for 1/60sec; 1 hardware vsync per emu vsync
Hardware beamrace 60Hz emu onto hardware 60Hz: 1x speed emu hsync for 1/60sec; 1 hardware vsync per emu vsync
Hardware beamrace 60Hz emu onto 120Hz: 2x speed emu hsync for 1/60sec; 1 hardware vsync per emu vsync
Hardware beamrace 60Hz emu onto 240Hz: 4x speed emu hsync for 1/60sec; 1 hardware vsync per emu vsync
Software beamrace 60Hz emu onto 240Hz: 1x speed emu hsync for 1/60sec; 4 hardware vsync per emu vsync
Software beamrace 60Hz emu onto 360Hz: 1x speed emu hsync for 1/60sec; 6 hardware vsync per emu vsync
Full screen global BFI 60Hz emu onto 120Hz: fastest emu hsync for 1/60sec; 2 hardware vsync per emu vsync
Full screen global BFI 60Hz emu onto 180Hz: fastest emu hsync for 1/60sec; 3 hardware vsync per emu vsync
Full screen global BFI 60Hz emu onto 240Hz: fastest emu hsync for 1/60sec; 4 hardware vsync per emu vsync
(VRR) Classical 60 fps Operation on >60Hz+ VRR/triplebuffer/etc: fastest speed emu hsync for 1/60sec; 1 simulated "hardware" vsync per emu vsync
(VRR) Hardware beamrace 60Hz emu onto 120fps on >120Hz+ VRR: 2x speed emu hsync for 1/60sec; 1 simulated "hardware" vsync per emu vsync
(VRR) Hardware beamrace 60Hz emu onto 240fps on >240Hz+ VRR: 4x speed emu hsync for 1/60sec; 1 simulated "hardware" vsync per emu vsync
(VRR) Software beamrace 60Hz emu onto 180fps on >180Hz+ VRR: 1x speed emu hsync for 1/60sec; 3 simulated "hardware" vsync per emu vsync
(VRR) Software beamrace 60Hz emu onto 240fps on >240Hz+ VRR: 1x speed emu hsync for 1/60sec; 4 simulated "hardware" vsync per emu vsync
(VRR) Software beamrace 60Hz emu onto 300fps on >300Hz+ VRR: 1x speed emu hsync for 1/60sec; 5 simulated "hardware" vsync per emu vsync
(VRR) Full screen global BFI 60Hz emu onto 180fps on >180Hz+ VRR: fastest emu hsync for 1/60sec; 3 simulated "hardware" vsync per emu vsync
(VRR) Full screen global BFI 60Hz emu onto 240fps on >240Hz+ VRR: fastest emu hsync for 1/60sec; 4 simulated "hardware" vsync per emu vsync
Etc, etc.
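
The pattern behind the list above can be written as a tiny lookup, which also makes it easy to check any row. This is a sketch only: heartbeat and the mode strings are invented for the example, and the output rate (fixed Hz, or the software-defined fps on VRR) is assumed to be an exact multiple of the emulator Hz.

```python
def heartbeat(mode, emu_hz, output_hz):
    """Return (emu hsync speed, hardware vsyncs per emu vsync) for a mode.

    mode is one of "classical", "hardware_beamrace", "software_beamrace", "bfi".
    """
    if output_hz % emu_hz != 0:
        raise ValueError("output Hz must be an exact multiple of emulator Hz")
    ratio = output_hz // emu_hz
    if mode == "classical":
        return ("fastest", 1)
    if mode == "hardware_beamrace":
        # Emulator surge-executes at ratio x speed to chase the faster scanout.
        return (f"{ratio}x", 1)
    if mode == "software_beamrace":
        # Emulator runs at real-time speed; each hardware refresh shows more of the frame.
        return ("1x", ratio)
    if mode == "bfi":
        return ("fastest", ratio)
    raise ValueError(f"unknown mode: {mode}")
```

For example, heartbeat("software_beamrace", 60, 360) gives ("1x", 6) and heartbeat("bfi", 60, 180) gives ("fastest", 3).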

Future Proof = Two "Independent" Heartbeat Signals for Emu Sync & Hardware Sync

The emusync library then simply becomes a universal sync-signalling library that accommodates all theoretically possible display workflows (including unforeseen workflows). The renderer can thus decide what to do (e.g. surge execute a series of scanlines to render a frameslice as quickly as possible), but it additionally becomes possible to later program a renderer to surge-execute then pre-render a BFI sequence, which is then "played out" based on the hardware heartbeats. Likewise for future software based CRT beam emulators in rolling-BFI algorithms.

Also -- any emulator author can still decide to use a presenter thread (in addition to the emusync library), or just implement their algorithms directly in the rendering thread. Our ideas (emusync library thread, and presenter thread) actually end up not being mutually exclusive, and it is purely optional whether they are same-thread or separate-thread:
- Emusync thread only becomes a present-timing-signaller (to rendering thread doing the presenting too)
- Emusync thread + present thread
So the same unmodified emusync can still achieve universal goals with and without a present thread. A case of breaking down the task into manageable chunks (sync responsibilities, present responsibilities).

When successfully decoupled, autoconfigure becomes massively easier, since there are far fewer incompatible combinations (VRR vs non-VRR). It does remove a few responsibilities from my approach (e.g. BFI anti-flicker algorithms); that responsibility is simply punted elsewhere, such as back to the renderer thread. But it keeps the emusync library universally compatible with all approaches.

Although, in theory, emusync and hardwaresync polls/signalling could be separate libraries -- it would be inefficient. Emusync will sometimes need access to hardware VSYNC and HSYNC signals (D3DKMTGetScanLine() approach), while other times needing to generate simulated hardware hsync (time offset between vblanks for simulated raster register) and generate simulated hardware vsync kind of like an ultra-precise timer event (framepacing for VRR, frame pacing for triplebuffer, or framepacing for BFI!). So might as well be the same module/library to hit two birds with one stone...
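
The "simulated raster register" mentioned above (time offset between vblanks) could look roughly like this. It is a sketch under stated assumptions: simulated_scanline is a hypothetical name, scanout is assumed to advance linearly in time across total_lines (which should include the blanking lines), and the timestamp of the last vblank is assumed to be known from elsewhere.

```python
def simulated_scanline(now, last_vblank, refresh_hz, total_lines):
    """Estimate the current scanline from the time elapsed since the last vblank.

    Used where no real raster register (such as the D3DKMTGetScanLine()
    approach) is available or meaningful, e.g. for software-defined
    refresh rates on VRR.
    """
    period = 1.0 / refresh_hz
    phase = ((now - last_vblank) % period) / period  # 0.0 <= phase < 1.0
    return int(phase * total_lines)
```

Halfway through a 50Hz refresh with 625 total lines, this estimates scanline 312.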

In theory, a generic cross-platform future-proof emusync library for all emulators would be lovely!

That said, KISS. Theoretically, I'd begin with compatibility with your common use cases including classical operation (60fps 60Hz VSYNC) & simple frameslice beam raced operation (non-VRR), being careful to make it possible that emusync can later achieve decoupled emu/hardware sync heartbeats (for futureproofing's sake).

I welcome comments by other emulator developers / team members.

mamenewb100 · « **Reply #46 on:** March 08, 2021, 11:59:16 pm »

Nice to see Calamity is still here working hard for the emulation community, probably not getting enough kudos for what he does. I have an interest in this topic since I actually posted in here and used black frame insertion on GroovyMame on a CRT years ago, to scale a 480P 60Hz Arcade Monitor down to 240P 120Hz to get native resolutions, and it has worked phenomenally to this day. Not too long ago I upgraded to an IPS 1440P 165Hz display. Being the nerd I am, I had to see how well this LCD display would compare to my CRT. I'm actually pretty amazed that with HLSL it looks pretty darn good.

My display has slight ghosting running at 60Hz and a 5ms response time with no strobing options unfortunately. It does have an 'overdrive' mode that overclocks the pixels to get faster refresh rates but causes some artifacts to show up. Naturally I tried 120Hz Black Frame Insertion and it was working great with noticeably smoother scrolling. However I noticed the Image Retention issue that looks like Burn-In right away within minutes of playing a game. Unfortunately my monitor can't do 180Hz, so using 180Hz BFI is not really an option for most games. However it does just happen to run Midway games like Mortal Kombat and NBA Jam perfectly at 165Hz only because it uses that really low 54Hz refresh rate instead of a standard 60Hz. I can indeed confirm that there is ZERO Image Retention from Burn-In effects when using the 180Hz BFI option.

Just from a science perspective it has me curious if Mdrejhon's theory would be right about changing some frames around to prevent Image Retention from 120Hz BFI on LCD displays. Though I'm sure it's not as easy as just changing a few lines of code to have a modified 120Hz BFI option. And I realize there has to be enough interest to warrant the time investment to test such a theory. Just letting Calamity know that I'd happily be a beta tester to let you know if an anti-image retention 120Hz BFI worked on my display.

But anyway it's just a general curiosity for me and I know there are far more important things being worked on. Mad respect for keeping up with all this stuff for all these years still.


druroh · « **Reply #47 on:** March 09, 2021, 04:28:50 pm »

Quote from: mamenewb100 on March 08, 2021, 11:59:16 pm

Just from a science perspective it has me curious if Mdrejhon's theory would be right about changing some frames around to prevent Image Retention from 120Hz BFI on LCD displays. Though I'm sure it's not as easy as just changing a few lines of code to have a modified 120Hz BFI option. And I realize there has to be enough interest to warrant the time investment to test such a theory. Just letting Calamity know that I'd happily be a beta tester to let you know if an anti-image retention 120Hz BFI worked on my display. But anyway it's just a general curiosity for me and I know there are far more important things being worked on. Mad respect for keeping up with all this stuff for all these years still.

I've a 144Hz display and had the same problem.

I tried to compile GroovyMame 0.226 with some source changes.

I located the frame update routine and changed it.
Now, once every 6 seconds, it skips one swap between the black frame and the rendered frame.

It works for my display. There's no image retention!
Tested with groovymame.exe -norefresh -bfi 1 superman

It's only a quick test, but I hope it could be a good starting point.

changed files:
src/emu/video.h
src/emu/video.cpp
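
The periodic skipped swap described in this post can be sketched as follows. This is a hypothetical Python illustration of the idea only, not the actual change made to src/emu/video.cpp; the function name, its parameters, and the 120Hz default are assumptions.

```python
def bfi_schedule(num_refreshes, refresh_hz=120, flip_every_s=6.0):
    """Yield True (show rendered frame) or False (show black) per display refresh.

    Frames normally alternate rendered/black. Once per flip interval the swap is
    skipped, so the rendered frame moves to the other refresh parity.
    """
    flip_interval = int(refresh_hz * flip_every_s)  # refreshes between skipped swaps
    show = True
    for i in range(num_refreshes):
        yield show
        # Alternate on every refresh except the last one of each interval.
        if (i + 1) % flip_interval != 0:
            show = not show


# Tiny example with a 4-refresh interval so the skipped swaps are easy to see:
# [True, False, True, False, False, True, False, True, True]
pattern = list(bfi_schedule(9, refresh_hz=2, flip_every_s=2.0))
```

With an even flip interval, the skipped swap alternately repeats a black refresh and a rendered refresh, so a brief brightness blip every 6 seconds is the likely visible cost of this approach.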

mamenewb100 · « **Reply #48 on:** March 10, 2021, 07:38:28 am »

That sounds promising. I'll check it out.

