In CRT displays, moving the picture down or to the right means adding a little delay to the signal. You can't add much delay, and you can't go faster than the signal either (that would mean negative delay), so when you can't move your picture any higher on the screen, it means you have reached the zero-delay point. You need to work on the signal itself, i.e. have a longer back porch, because the back porch comes first at the top of the screen, before the active display (the picture ends with the front porch, at the bottom).
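To make that ordering concrete, here is a minimal Python sketch of one field's vertical layout, counted in lines from the start of vertical sync: sync, then back porch, then the active display, then the front porch. The helper name `field_layout` and the example numbers (3 sync, 14 back porch, 224 active, 21 front porch, for a 262-line frame) are hypothetical, chosen only for illustration.

```python
def field_layout(sync: int, back_porch: int, active: int, front_porch: int):
    """Return (segment, first_line, last_line) tuples for one field,
    counted from the start of vertical sync (hypothetical helper)."""
    segments = [("sync", sync), ("back porch", back_porch),
                ("active", active), ("front porch", front_porch)]
    layout = []
    line = 0
    for name, length in segments:
        if length < 0:
            # A negative length would be "negative delay", which is impossible.
            raise ValueError(f"{name} cannot be negative")
        layout.append((name, line, line + length - 1))
        line += length
    return layout

for name, first, last in field_layout(3, 14, 224, 21):
    print(f"{name:12} lines {first:3}-{last:3}")
```

The top edge of the picture is where the active segment starts, so from the signal side the only way to push it lower is to lengthen the back porch.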
If a duration of 0.896 ms is not enough, the parameters with 0.064 ms won't do any better. You need more than that.
But anyway, these parameters are defined for a line duration of 64 µs, and while some arcade boards do output lines of that duration, they don't necessarily output 262 lines per frame. For example, the Neo Geo signal is 15625 Hz, hence a 64 µs line duration, but it has 264 lines instead of the 262 lines expected for 240p content. So even if your parameters specify 14 lines of back porch (0.896 ms), the picture will be slightly off-center.
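The link between durations and line counts is just the line period, 1 / horizontal frequency. A minimal Python sketch (the function names are hypothetical) converts the millisecond values used above into lines at the Neo Geo's 15625 Hz:

```python
def line_period_us(h_freq_hz: float) -> float:
    """Duration of one scanline in microseconds."""
    return 1e6 / h_freq_hz

def ms_to_lines(duration_ms: float, h_freq_hz: float) -> float:
    """Number of scanlines covered by a duration given in milliseconds."""
    return duration_ms * 1000.0 / line_period_us(h_freq_hz)

H_FREQ = 15625.0  # Neo Geo horizontal frequency, from the text

print(line_period_us(H_FREQ))  # 64.0 µs per line
for ms in (0.896, 0.192, 0.064):
    print(ms, "ms ->", round(ms_to_lines(ms, H_FREQ), 3), "lines")
```

At 64 µs per line, 0.896 ms is 14 lines, 0.192 ms is 3 lines and 0.064 ms is a single line.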
The Neo Geo signal has 8 lines of vsync (0.512 ms, not 0.192 ms) and 16 lines of back porch (not 14). So for centering purposes, the real board has 24 lines (8 + 16) before the active display, versus 17 lines (3 + 14) for the emulated one.
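The arithmetic behind that comparison, as a small Python sketch (the variable names are illustrative; the line counts are the ones quoted above, with 0.192 ms and 0.896 ms converted at 64 µs per line):

```python
LINE_US = 64.0  # line period at 15625 Hz

def lines_before_active(sync_lines: int, back_porch_lines: int) -> int:
    """Lines between the start of vsync and the first active line."""
    return sync_lines + back_porch_lines

real = lines_before_active(8, 16)      # real Neo Geo board
emulated = lines_before_active(3, 14)  # 0.192 ms + 0.896 ms presets
diff = real - emulated

print(real, emulated, diff, diff * LINE_US / 1000, "ms")
```

With these numbers the emulated picture starts 7 lines (0.448 ms) earlier than on the real board, so it sits higher on the screen.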
There are other cases with 224 active lines, like the Neo Geo and many other systems, but only 256 total lines (the PGM) or 259 (the Capcom boards, such as the three CPS generations and others). So in the end, specifying durations for all of that doesn't guarantee a perfect fit in every case (in fact, it fits very few).
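One way to see why fixed presets can't fit everything: keep the same sync and back porch line counts and see what is left for the front porch at each total quoted above. This Python sketch simplifies by assuming a 64 µs line for every board, which is not necessarily true for the PGM or the CPS boards; the 262-line, 224-active entry is a hypothetical case.

```python
ACTIVE = 224
SYNC, BACK_PORCH = 3, 14  # 0.192 ms and 0.896 ms presets at 64 µs per line

# Total lines per frame quoted in the text (262 is the hypothetical 240p-style case)
TOTALS = {"PGM": 256, "Capcom (CPS)": 259, "262-line case": 262, "Neo Geo": 264}

def front_porch(total: int, active: int = ACTIVE,
                sync: int = SYNC, back_porch: int = BACK_PORCH) -> int:
    """Lines left for the front porch once the other segments are placed."""
    fp = total - active - sync - back_porch
    if fp < 0:
        raise ValueError("timings do not fit in the frame")
    return fp

for name, total in TOTALS.items():
    print(f"{name:14} total {total}: front porch {front_porch(total)} lines")
```

The same presets leave anywhere from 15 to 23 lines of front porch, while the real Neo Geo uses 16 (264 - 224 - 8 - 16): the total is respected, but the vertical position is not.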
What I suggest is to focus only on centering the picture, i.e. making the middle of the active display match the middle of the active display of the NTSC standard, because that standard is followed by TV screens as well as arcade monitors. Overscan varies somewhat, but the rest is pretty close.
So, first you adjust the geometry of your screen (arcade or TV) according to the NTSC standard, and then you adjust the modelines (or the modeline generation) to follow that standard (instead of the values given by manufacturers, which can be wrong). In the end, you only have to slightly adjust the picture size (Hsize and Vsize, which set the amplification driving the yoke), and you no longer have to worry about centering the picture.
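A rough sketch of that centering step in Python, under stated assumptions: the idea is to move lines between front porch and back porch (keeping the total, hence the frame duration) so that the middle of the active area comes the same time after vsync as in a reference timing. The `VTiming` class, `recenter` function and especially the `REFERENCE` numbers are hypothetical placeholders, not values taken from the NTSC specification; substitute your own reference.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class VTiming:
    line_us: float    # duration of one scanline in µs
    active: int       # visible lines
    front_porch: int
    sync: int
    back_porch: int

    @property
    def total(self) -> int:
        return self.active + self.front_porch + self.sync + self.back_porch

    def active_middle_us(self) -> float:
        # Time from the start of vsync to the middle of the active area.
        return (self.sync + self.back_porch + self.active / 2) * self.line_us

    def modeline_v(self):
        # Vertical fields of an X11-style modeline: vdisplay, vsync_start, vsync_end, vtotal.
        start = self.active + self.front_porch
        return (self.active, start, start + self.sync, self.total)

def recenter(target: VTiming, reference: VTiming) -> VTiming:
    """Shift lines from front porch to back porch (or back) so the target's
    active middle matches the reference's; the total line count is unchanged."""
    wanted_bp = reference.active_middle_us() / target.line_us - target.sync - target.active / 2
    new_bp = round(wanted_bp)  # whole lines only: up to half a line of error
    new_fp = target.front_porch - (new_bp - target.back_porch)
    if new_bp < 0 or new_fp < 0:
        raise ValueError("cannot center without changing the total line count")
    return replace(target, front_porch=new_fp, back_porch=new_bp)

# Hypothetical reference timing: placeholder numbers only.
REFERENCE = VTiming(line_us=63.556, active=240, front_porch=3, sync=3, back_porch=16)
# Neo Geo figures from the text: 224 active, 8 sync, 16 back porch, 264 total.
NEOGEO = VTiming(line_us=64.0, active=224, front_porch=16, sync=8, back_porch=16)

centered = recenter(NEOGEO, REFERENCE)
print(centered.modeline_v())
```

With these placeholder numbers the back porch grows from 16 to 18 lines and the front porch shrinks from 16 to 14, so the frame stays at 264 lines.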
If we specify lower frequencies for monitor presets, there's a better chance of matching what comes from the real hardware and from MAME timings (when they are correctly written...), and if we all follow the same geometry setting for our screens (once and for all), then everything would be identical for everyone. The limit is the size of the different pictures: it's impossible to have exactly the same size for everything, since arcade boards have different pixel clocks and different total frame sizes (which we must follow to keep the same frame duration). But everything is supposed to fall within the active display of the NTSC standard (or very close to it), keeping 5 to 10% overscan in mind.
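Why the total frame size has to follow the board: for a progressive signal, the refresh rate is the horizontal frequency divided by the total number of lines. A minimal Python sketch (the function name is hypothetical) with the Neo Geo's 15625 Hz:

```python
def refresh_hz(h_freq_hz: float, total_lines: int) -> float:
    """Vertical refresh of a progressive signal: lines per second / lines per frame."""
    return h_freq_hz / total_lines

NEOGEO_H = 15625.0

print(refresh_hz(NEOGEO_H, 264))  # the board's own 264 lines
print(refresh_hz(NEOGEO_H, 262))  # the same line rate forced into 262 lines
```

This gives about 59.1856 Hz with 264 lines versus about 59.6374 Hz with 262 lines. Moving lines between front porch and back porch leaves the total, and therefore the frame duration, untouched, which is why centering can be done that way without changing the game's speed.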