Don't hold your breath on this, however; we're just starting to think about how frame slicing should be merged into the new GM design, and distractions coming from the LCD realm can easily derail us.
Skip the Anti Burn In Logic For Now...
For now, we can just educate users to use an odd-multiple Hz on an even-multiple Hz monitor (180Hz on a 240Hz monitor, or 300Hz on a 360Hz monitor) to solve the BFI burn-in problem. We can just publish something in a wiki, create a new article, or have a Knowledge Base link to this post instead.
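A rough sketch of why odd multiples help. It assumes the panel flips its pixel-drive polarity on every real refresh (typical LCD voltage inversion, which is my assumption and not spelled out above) and that the burn-in risk comes from the visible frame always landing on the same polarity. The function name is made up for illustration.

```python
def visible_polarities(refreshes_per_frame, frames=6):
    """Polarity (+1 or -1) of the one visible refresh in each emulated frame.

    Assumptions: the panel alternates drive polarity every real refresh,
    and BFI shows the image on the first refresh of each emulated frame.
    """
    return [1 if (f * refreshes_per_frame) % 2 == 0 else -1
            for f in range(frames)]

print(visible_polarities(2))  # 120Hz BFI for 60fps -> [1, 1, 1, 1, 1, 1]
print(visible_polarities(3))  # 180Hz BFI for 60fps -> [1, -1, 1, -1, 1, -1]
```

With an even count the visible frame always sees the same polarity; with an odd count (3x, 5x) it alternates, which is the behaviour the odd-multiple advice relies on under this model.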
Agreed, that's why I just want to see simple BFI, to begin with. At least add it without anti-burnin support.
Ok, how about I propose:
(A) Just implement comma-separated BFI pattern idea.
(B) Don't implement phaseshift code.
Initially, you don't even have to worry about choosing VRR or non-VRR. I'd prefer non-VRR, just so I can combine 180Hz + PureXP, because PureXP doesn't work at the same time as VRR. However, I think a simple comma-separated BFI solution would work with both VRR and non-VRR. I'd rather you didn't deprecate BFI, though, but instead found a way to migrate it forward.
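To make proposal (A) concrete, here is a minimal sketch of what a comma-separated BFI pattern could look like in code. The function names and the exact accepted syntax (percent values, optional "%" sign) are my assumptions for illustration, not existing GroovyMAME options.

```python
def parse_bfi_pattern(text):
    """Parse a pattern such as "100,0,0" into brightness fractions 0.0-1.0."""
    levels = []
    for part in text.split(","):
        value = float(part.strip().rstrip("%"))
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"brightness out of range: {part!r}")
        levels.append(value / 100.0)
    return levels


def brightness_for_refresh(levels, refresh_index):
    """Brightness for a given real refresh; the pattern simply repeats."""
    return levels[refresh_index % len(levels)]


pattern = parse_bfi_pattern("100,0,0")          # 180Hz output, 60Hz source
print([brightness_for_refresh(pattern, n) for n in range(6)])
# -> [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
```

The pattern length is the number of real refreshes per emulated frame, so "100,0,0" means 180Hz and "100,0,0,0" means 240Hz for a 60Hz game. No phaseshift logic is involved.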
Theoretically, BFI could be decoupled as a completely separate engine, letting some other programmer worry about BFI. For example, a Windows virtual display driver that handles BFI. GroovyMAME wouldn't need to know. (Are there any virtual display driver people here?)
Blur Busters had actually worked on a virtual display driver project that confirmed BFI works at a Windows driver level (refresh-level driver, not frame-level driver, so SweetFX / RTSS / ReShade approaches won't work). There was a Windows virtual display driver that I financed the development of, but it is in legal limbo. However, it's long been public knowledge that BFI can be achieved in a Windows virtual display driver, and we know it works... If anyone developed such a BFI driver independent of GroovyMAME, all it would need to monitor is when the frame changes, and it can use flywheel-sync algorithms to sync to the framerate, and handle its own phaseshifting.
(It would mean a 1-refresh-cycle-granularity latency change every time a phaseshift happened, but at 240Hz, that's only a 4.2ms latency change that will rewind itself during the next phaseshift.) And people who want consistency can just use VRR-BFI or the 3x-trick or 5x-trick to avoid phaseshift algorithms for anti-burnin.
Also, another argument against deprecating classical BFI is that I read about some GroovyMAME users who already use BFI with CRT tubes to get a 60Hz single-strobe 15KHz look on 31.5KHz arcade tubes: some of them run 120Hz 240p in place of 60Hz 480p (same scanrate), then use GroovyMAME software BFI to make it look like 60Hz 240p, so a 31.5KHz arcade tube looks like a 15.3KHz tube.
I wonder if you'd be able to see horizontal banding due to this at 180 Hz. I mean, since this method enables a sort of rudimentary scanning.
Yes and no. It depends on the motionspeed. VSYNC OFF tearing is visible at 180fps at 180Hz, so sharp bands will create a tearing look during rolling-BFI. So you need to alphablend the overlapping slices. A bigger overlap would make the seams less visible. But higher Hz would allow higher motionspeeds without seam artifacts.
We calculated that, for most motionspeeds of original arcade games, 360Hz would be the threshold where the seams begin to disappear while producing very useful low persistence for rolling-scan BFIv3 (Temporal HLSL).
Let's consider a very fast-panning arcade game that pans one screenwidth per second. For a 1080p LCD, that is a motionspeed of 1920 pixels/sec, and 1920/180 means roughly 11-pixel offsets between rolling-scan frameslices at 180Hz rolling-scan. Even with alphablend, there will be noticeable disjoints.
For slower motionspeeds (e.g. walking Super Mario Brothers), it won't be visible. But a running Super Mario pan would show a slight amount of 180Hz rolling-scan seams, unless you used extremely large persistence (reducing motion blur only a little, rather than by 2/3rds).
That's why 240Hz and 360Hz will be the very beginnings of seamless-enough rolling-scan for fast motions. For now, global BFI will work much better for 180Hz because of this. The good news is you can thought-experiment merging BFIv2 and BFIv3 if you don't mind partially including BFIv3 concepts mentally.
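The seam arithmetic above is just motionspeed divided by refresh rate. A tiny sketch (function name is mine) to compare the three refresh rates mentioned:

```python
def seam_offset_px(motion_px_per_sec, real_hz):
    """Horizontal disjoint between adjacent rolling-scan slices, in pixels."""
    return motion_px_per_sec / real_hz


for hz in (180, 240, 360):
    print(hz, "Hz:", round(seam_offset_px(1920, hz), 1), "px")
# 180 Hz: 10.7 px
# 240 Hz: 8.0 px
# 360 Hz: 5.3 px
```

So a one-screenwidth-per-second pan on a 1920-wide display halves its seam offset going from 180Hz to 360Hz, which is why the alphablend has much less to hide at the higher rates.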
[Thought experiment, not necessarily final numbers]
However, I think BFIv3 can actually clone BFIv2 120Hz by configuring:
sliceduty=1 (pixel visibility time of one refresh cycle)
slicealphablend=0 (no blend percentage between slices)
slicegamma=2.2 (default, ignored if slicealphablend=0)
Or one could use:
sliceheight=100.00 (full screen height frameslice)
slicealphablend=0 (no blend percentage between slices)
slicegamma=2.2 (default, ignored if slicealphablend=0)
Probably bad parameters, but just an example of how this could be begun.
With such configuration parameters, the BFIv3 would autocompute a rolling-bar pattern that was a full-height "bar" for 1 refresh, and a full-height "black" for the next refresh. Allowing BFIv3 to clone BFIv2.
Personally I'd prefer the "sliceduty" approach over "sliceheight" because "sliceduty" is refresh-rate agnostic, meaning it would autocompute the BFI rolling-bar height depending on the source refresh rate (emulated) and destination refresh rate (actual hardware).
You could even do a 1/3.333th height bar for outputting 60Hz to a 200Hz monitor.
The BFIv3 would autocompute overlaps between refreshes.
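A minimal sketch of how the bar height could be autocomputed from "sliceduty". The formula is my assumption (bar height = sliceduty x emulated Hz / real Hz, capped at full screen), chosen because it reproduces the 1/3.333 height for 60Hz on 200Hz; it is not a final definition of the parameter.

```python
from fractions import Fraction


def bar_height_fraction(emulated_hz, real_hz, sliceduty=1):
    """Rolling-bar height as a fraction of screen height.

    Assumption: sliceduty is the pixel visibility time measured in real
    refresh cycles, and the bar advances emulated_hz/real_hz of the screen
    per real refresh.
    """
    height = Fraction(sliceduty) * Fraction(emulated_hz, real_hz)
    return min(height, Fraction(1))


h = bar_height_fraction(60, 200)
print(h, "=", float(h) * 100, "% of screen height")  # 3/10 = 30.0 % of screen height
```

Because only the two refresh rates and the duty go in, the same setting carries over unchanged when the destination monitor changes from 200Hz to 240Hz or 360Hz.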
For emulating an electron gun for a 60Hz CRT onto a 200Hz display using a Hz-non-divisible rolling-scan "Temporal HLSL" phosphor bar:
(rolling-bar black frame insertion)
Concept of Hz-Agnostic Rolling Scan BFI (CRT Scanning Emulation)
Situation Example of 60Hz CRT emulation onto a 200Hz LCD
Emulator Refresh Cycle #1:
....Real Refresh 1: full 60/200th height bar (30% screen height), at 0%-30% vertical position
....Real Refresh 2: full 60/200th height bar (30% screen height), at 30%-60% vertical position
....Real Refresh 3: full 60/200th height bar (30% screen height), at 60%-90% vertical position
....Real Refresh 4: 1/3 of 60/200th height bar (10% screen height), at 90%-100% vertical position
Emulator Refresh Cycle #2:
....Real Refresh 5: 2/3 of 60/200th height bar (20% screen height), at 0%-20% vertical position
....Real Refresh 6: full 60/200th height bar (30% screen height), at 20%-50% vertical position
....Real Refresh 7: full 60/200th height bar (30% screen height), at 50%-80% vertical position
....Real Refresh 8: 2/3 of 60/200th height bar (20% screen height), at 80%-100% vertical position
Emulator Refresh Cycle #3:
....Real Refresh 9: 1/3 of 60/200th height bar (10% screen height), at 0%-10% vertical position
....Real Refresh 10: full 60/200th height bar (30% screen height), at 10%-40% vertical position
....Real Refresh 11: full 60/200th height bar (30% screen height), at 40%-70% vertical position
....Real Refresh 12: full 60/200th height bar (30% screen height), at 70%-100% vertical position
So you get the concept of Hz-agnostic BFI. Yet the configuration parameters would be configurable to be Hz-divisor BFIv2 with no alphablend. Basically the venn diagram of BFIv3 configurability would fully overlap BFIv2!
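The bar positions can be autocomputed rather than hand-listed. Here is a minimal sketch using exact fractions, with no alphablend overlap and a sliceduty of one real refresh. The function name and the tuple layout are mine, purely for illustration.

```python
from fractions import Fraction


def rolling_bar_schedule(emulated_hz, real_hz, real_refreshes):
    """For each real refresh, list (emulator_cycle, top, bottom) segments.

    top/bottom are fractions of screen height. A bar that runs off the
    bottom wraps to the top as part of the next emulator cycle.
    """
    step = Fraction(emulated_hz, real_hz)   # screen heights per real refresh
    schedule = []
    for n in range(real_refreshes):
        pos, end = n * step, (n + 1) * step
        segments = []
        while pos < end:
            cycle = pos // 1                           # completed emulator cycles
            seg_end = min(end, Fraction(cycle + 1))    # clip at screen bottom
            segments.append((cycle + 1, pos - cycle, seg_end - cycle))
            pos = seg_end
        schedule.append(segments)
    return schedule


for n, segs in enumerate(rolling_bar_schedule(60, 200, 10), start=1):
    text = ", ".join(f"cycle #{c}: {float(t) * 100:.0f}%-{float(b) * 100:.0f}%"
                     for c, t, b in segs)
    print(f"Real Refresh {n}: {text}")
```

In this sketch the piece at the bottom of one emulator cycle and the piece at the top of the next (for example 90%-100% and 0%-20%) are drawn in the same real refresh, so three emulated refreshes span ten real refreshes at 200Hz. Every scanline still gets exactly one real refresh of light per emulator cycle.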
For now, use simple mathematics, and make sure performance stays high (framerate=Hz). Though, theoretically, one can dynamically expand/shrink the height of the bars slightly (below human-perceptible levels, like 1ms changes) to slew away a sync problem, e.g. audio sync, arcade monitor sync, or an emulator module running behind schedule. Algorithmically, it is important to keep photons-per-pixel-per-second constant, so you would want a photon accumulator array approach (an array the size of the screen resolution) if you want varying-framerate BFI for any reason. Now, that veers into overly complex-think (thinking too far ahead), so I'll back away from those thoughts for now...
Anyway, if one decides to proceed with rolling-bar BFI, be careful that overlapped alphablends are gamma-corrected, with an adjustable-height overlap and a configurable gamma-correction factor, so that photons-per-pixel-per-refresh is identical. And persistence should be adjustable. If adjusted to one 180Hz refresh cycle's worth of photons, then a pixel that was 100% bright one refresh should be 0% bright the next refresh. In the alphablend zone, a pixel that was 25% bright should be 75% bright the next refresh (and vice versa). And obviously, it should be gamma corrected, so that 50% bright is exactly half the nit brightness. (Non-gamma-corrected RGB(128,128,128) is not half the number of photons of RGB(255,255,255), so you must always gamma correct to avoid "brightness bars" artifacts. Otherwise, you get dim bars or bright bars in the alphablend zone, depending on how you calculated or how the overdrive setting is, so the gamma-correct overlap needs to be adjustable.)
This becomes much more seamless at higher Hz than at lower Hz. But needless to say, 180Hz and 240Hz permit early prototyping of BFIv3. It will look okay as blur reduction for slow-panning platformers (e.g. Donkey Kong Country would probably look just fine), but at 180Hz it will show blurry-tear artifacts during fast panning (e.g. a full-running-speed Super Mario). For 180Hz, global BFI will look vastly superior at fast speeds.
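The gamma-correction point can be illustrated with a small sketch. It assumes a plain power-law gamma (the slicegamma=2.2 idea from earlier), not the exact sRGB curve or any real panel's response, and the function names are mine.

```python
def linear_to_code(linear, gamma=2.2, max_code=255):
    """Convert a linear-light fraction (0..1) to an 8-bit code value."""
    return round(max_code * linear ** (1.0 / gamma))


def code_to_linear(code, gamma=2.2, max_code=255):
    """Convert an 8-bit code value back to a linear-light fraction."""
    return (code / max_code) ** gamma


def blend_pair(weight, gamma=2.2):
    """Code values for a white pixel in two consecutive refreshes of the
    alphablend zone, chosen so the linear light adds up to one refresh."""
    return linear_to_code(weight, gamma), linear_to_code(1.0 - weight, gamma)


print(round(code_to_linear(128), 2))   # 0.22 -> RGB 128 is ~22% of the photons
print(linear_to_code(0.5))             # 186  -> code value for half the photons
print(blend_pair(0.25))                # (136, 224) -> 25% then 75% of the light
```

Blending in code-value space instead (64 then 191 for the 25%/75% case) would emit far fewer photons than a full refresh, which is exactly the dim-bar artifact described above. Making gamma a parameter is what lets it be tuned per display.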
This would be a simplified "Temporal HLSL" that doesn't require much modification to existing HLSL (except beamraced optimizations that work with both hardware beamraced sync and BFIv3 rolling-scan beamraced sync).
If this overwhelms you, I understand, just don't workflow-architecture this into a corner...

Ideally, I would put a simplified rolling-BFIv3 as a separate layer after a mostly-unmodified Spatial HLSL layer, and abstract Present() away from the real Present(). You'd rerun HLSL filtering for a frameslice area (with enough HLSL filter overlap to accommodate disjoints between the HLSL grid and the actual pixel grid, plus the BFIv3 alphablend overlaps). Heck, even do a full-screen HLSL reprocess every frameslice if you want to keep it simple (and just use brute GPU overkill), to continue using unmodified non-beamraced-optimized HLSL for beamraced VSYNC for now.
Either way, by abstracting Present() away from the real Present(), a separate module would handle beamraced VSYNC that is compatible with destination-hardware rasters (actual monitor raster, like your existing patch) and a software raster module (beamracing with rolling BFI). The same code can be made compatible with beamraced VSYNC. Basically a virtualized beamraceable Present() layer that is future-proof. The software raster module would be the BFIv3 module that snips an alphablend frameslice out of the frame into a separate frame buffer, which is what actually gets Present()'d.
So workflow proposal is:
- Continue your frameslice beam racing approach as you were planning to do;
- But abstract Present() in a way that makes it compatible with both beamracing the real hardware raster (beamracing to an existing low-Hz display) and a future software rolling scan (beamracing to an ultrahigh-Hz display with a rolling phosphor bar in sync with the emulator raster).
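As a rough sketch of that workflow, written in Python only to show the shape of the abstraction: every class and method name below is hypothetical, the "framebuffer" is a plain list of rows, and nothing here talks to a real graphics API or to GroovyMAME's actual code.

```python
from abc import ABC, abstractmethod


class PresentBackend(ABC):
    """Hypothetical seam between the emulator and the real Present()."""

    @abstractmethod
    def present_slice(self, framebuffer, first_line, last_line):
        """Called when emulated scanlines first_line..last_line are ready."""


class HardwareRasterBackend(PresentBackend):
    """Stand-in for beamracing the real raster of a low-Hz display."""

    def __init__(self):
        self.log = []

    def present_slice(self, framebuffer, first_line, last_line):
        # A real version would wait for the hardware raster, then flip.
        self.log.append(("hw", first_line, last_line))


class SoftwareRollingScanBackend(PresentBackend):
    """Stand-in for rolling-bar BFI on a high-Hz display."""

    def __init__(self):
        self.log = []
        self.last_output = None

    def present_slice(self, framebuffer, first_line, last_line):
        # Snip the bar into an otherwise black frame, then present that.
        self.last_output = [
            list(row) if first_line <= y <= last_line else [0] * len(row)
            for y, row in enumerate(framebuffer)
        ]
        self.log.append(("sw", first_line, last_line))


def run_frame(backend, framebuffer, slices):
    """Emulator side: identical no matter which backend is plugged in."""
    height = len(framebuffer)
    for i in range(slices):
        backend.present_slice(framebuffer,
                              i * height // slices,
                              (i + 1) * height // slices - 1)
```

The point is only that run_frame() never knows which raster it is racing, so the frameslice beam racing work and a later rolling-scan module can share one call site.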
Oh, I just thought of this now: One could even do a sample-and-hold non-strobed BFIv3, simply using sheer refresh rate to simplify beamraced VSYNC (but without impulse flicker). Basically, simply presenting the full partially-rasterplotted emulator framebuffer in its current mid-scanout state every time the real hardware Hz needed a new refresh cycle. Theoretically, one could set the BFI frameslice height to full height too. So basically it'd look like non-strobed VSYNC ON, except with sub-refresh latency, without needing RunAhead, for people who hate flickering 60Hz CRTs.
....So theoretically, the configuration parameters should be flexible enough for BFIv3 to do a non-BFI subrefresh VSYNC ON without needing hardware beamracing; just using sheer Hz as the beamracing method instead.
....Metaphorically, you're simply Present()ing the whole partially rasterplotted framebuffer every actual refresh cycle, to achieve original-latencies (sub-emulator-refresh latency) for ordinary VSYNC ON, simply via using the sheer brute Hz instead of VSYNC OFF frameslice beamracing. Nearly the same lagless original-machine latency, within the granularity of the destination Hz, at least. So BFIv3 would be configurable to do this in theory.
For now, understandably, this is a thought experiment; but it's exciting "year 2021-think" or "year 2025-think". We have time before inexpensive arcade CRTs go extinct, but we should begin thought-experimenting this now thanks to emerging display tech...
Still, I hope you can at least quickly implement a simple rudimentary "100%,0%,0%" BFI for 180Hz to help incubate demand for improved BFI (and attract additional developers for future advanced BFIv3 in a year or two). I don't know which MAME fork deserves this (I'd prefer to see this in mainline), but right now, GroovyMAME is more daring about these kinds of endeavours...
EDIT: Mark, 1000 Hz HLSL scanning would require monitors to be at least 10x brighter than what they are now, and you know it.
There's good news for that: HDR
That's what HDR headroom is for.
The world's first 10,000nit display (Sony prototype I saw at CES) is an LCD.
With that, you still can have 1,000 nits after 90% persistence reductions. That's still brighter than an arcade CRT tube. So you can keep reducing persistence all the way to ~1/16th (1ms out of 16.7ms)....still more than 500 nits.
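The brightness headroom arithmetic, as a small sketch. It assumes average perceived brightness scales linearly with the fraction of time a pixel is lit, which is a simplification, and the function name is mine.

```python
def average_nits(peak_nits, visible_ms, frame_ms=1000 / 60):
    """Time-averaged brightness of a pixel lit for visible_ms per frame."""
    return peak_nits * visible_ms / frame_ms


frame_ms = 1000 / 60
print(round(average_nits(10_000, 0.1 * frame_ms)))  # 1000 (90% persistence reduction)
print(round(average_nits(10_000, 1.0)))             # 600  (1ms out of ~16.7ms)
```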
There are upcoming LG 240Hz 1440p IPS panels (ETA: early 2021) that are VESA HDR600 rated, and I would bet HDR1000 isn't far behind. Even at 100 nits after a 90% persistence reduction, the screen is still brighter than many 5-year-old arcade CRT tubes.
HDR exists because it makes those beautiful highlights (e.g. sun glints off a 1957 Chevy that are brighter-than-white, or ultra-bright neon signs at night.... 10,000 nits looks absolutely GORGEOUS in these situations). However, the delicious nit headroom helps CRT electron gun emulation in the future.
FALD (Full Array Local Dimming) displays are typically 1000 nits. They are still somewhat expensive, but inexpensive MicroLED backlight sheets for FALD LCDs are coming later, in the 2021-2025 timeframe. This will finally produce sub-$1000 FALD desktop monitors in the 24"-32" range, and about time (because these screens are close to the appropriate sizes for embedding into a MAME cabinet). That makes HDR look better. This will be much cheaper than OLED panels and MicroLED panels initially, as a stopgap. MicroLED FALDs are super-bright (absolute minimum 1000 nits for some of them). Since these MicroLED FALDs have thousands of lights, this keeps haloing down a lot; some of them halo less than a phosphor tube front surface (the halo glow around an electron gun dot), so that's the venn diagram of overlap of FALD-haloing-acceptability. They may still not be found in cheap LCD displays, but the fact that these would be simultaneously FALD + ultra Hz makes them excellent early candidates for Temporal HLSL emulation of a CRT electron gun in the coming years.
The problem is that the bottom-barrel 60Hz LCD is terribly bad, and lowers many people's expectations of what LCDs can do. But anybody who's actually seen BFI on the new 240Hz 1ms IPS panels is shocked at how much better it is nowadays compared to the crappy LightBoost days. And it's going to keep getting better thanks to a convergence of factors (HDR! Sheer Hz! Etc.)
So HDR + retina refresh rates = helps solve the subrefresh CRT electron gun emulation problem.
But yes, Keep it Simple
I understand GroovyMAME's focus is on CRT tubes.
However, GroovyMAME already has BFI, and I would like to humbly request keeping that existing feature and adding support for 180Hz BFI (100,0,0) and 240Hz BFI (100,0,0,0), to help incubate demand.