Author Topic: GM ASIO ALPHA 0.171 (Read 95152 times)

intealls · « **Reply #200 on:** July 06, 2015, 03:33:32 pm »

Quote from: Calamity on July 06, 2015, 04:25:39 am

Quote from: intealls on July 06, 2015, 01:32:36 am
Thank you for the detailed explanation! I tried the patch, but didn't do extensive testing. I was about to, but it seemed that many ASIO issues could be righted with the existing frame_delay code by setting m_speed with a slight holdoff (consistent speed for 1 second). With that said, with the above rationale and the results Dr.Venom has posted, I can't wait to try it out again (currently no access to a proper GM rig). Also, when I tried the patch, I noticed static tearing, perhaps due to CRT Emudriver using a different line count scheme than the driver used for development?

Oh, ok that totally changes things. I was assuming all recent tests had been done with the new frame delay code. Tearing was expected by the way. I was surprised no one reported it, now I see why. That's what will hopefully be fixed with the proper implementation (modeline dependent). Now Dr.Venom's results with d3d make more sense to me. You can't expect speed stability comparable to ddraw with the existing fd patch and d3d. To summarize, the ultimate goal of the changes I've been suggesting is exactly that: to get ddraw-accurate speed through d3d with minimum latency, so we can eventually get rid of ddraw. First attempt, with d3d9ex, so the manual v-sync hack was unnecessary (fail). Second attempt, accept the fact that the manual v-sync hack is going to be required but do an extra synchronization before returning to improve speed accuracy (approximate ddraw-accurate speed by breaking parallelization).

I put in the new code in a test build if anyone wants to try it out! Although, keep in mind that static tearing will be present.

~~https://mega.nz/#!isRVgYzS!i4aaintTvS_NH0FhmqjE0AQxqt2J0gAYtTCZvd7QPhg~~ Edit: get link from a couple of posts below

Am I right in thinking that the tearing could be avoided by adding the vertical front porch of the current mode to the right hand side of the following comparison?

Code: [Select]

if (raster_status.ScanLine >= m_height)

Calamity · « **Reply #201 on:** July 06, 2015, 04:46:00 pm »

Quote from: intealls on July 06, 2015, 03:33:32 pm

Am I right in thinking that the tearing could be avoided by adding the vertical front porch of the current mode to the right hand side of the following comparison?

Actually, you need to check the sync_pulse + back_porch + height.

But you need to deal with interlaced modes where the porch lines must be divided by 2.

Also, forcing the synchronization after the present call to be done in-vblank is inaccurate and may cause the retrace to be missed. It's better to check for the first absolute scanline after retrace or higher. While in-vblank, getrasterstatus returns zero for whatever scanline number. When vblank ends and the new frame starts, the first scanline number is not 1 but sync_pulse + back_porch number of lines. Then the last line in the frame is sync_pulse + back_porch + height. Notice this is only valid for ATI. Intel cards label the first line as 1 and the last line as height.

(I need to check my notes to make sure, don't have them here right now)
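
The numbering described above can be written down as a small helper. This is only a sketch: `VerticalTiming`, `ScanlineWindow`, `ati_window` and `intel_window` are hypothetical names made up for illustration, and the ATI/Intel behaviour is taken from the description in this post, not verified against the drivers.

```cpp
#include <cassert>

// Hypothetical, minimal description of a mode's vertical timing (in lines).
struct VerticalTiming {
    int active;       // visible lines (height)
    int sync_pulse;   // vertical sync pulse length
    int back_porch;   // lines between the sync pulse and the first visible line
    bool interlaced;  // porch lines are halved for interlaced modes
};

// First and last scanline numbers reported for the visible area.
struct ScanlineWindow {
    int first;
    int last;
};

// ATI-style numbering as described in the post: the first visible line is
// numbered sync_pulse + back_porch, the last one adds the height on top.
ScanlineWindow ati_window(const VerticalTiming& t) {
    assert(t.active > 0 && t.sync_pulse >= 0 && t.back_porch >= 0);
    int blank = t.sync_pulse + t.back_porch;
    if (t.interlaced)
        blank /= 2;  // per the post, porch lines are divided by 2
    return { blank, blank + t.active };
}

// Intel-style numbering as described in the post: 1 .. height.
ScanlineWindow intel_window(const VerticalTiming& t) {
    assert(t.active > 0);
    return { 1, t.active };
}
```

For a made-up 240-line mode with a 3-line sync pulse and a 16-line back porch, `ati_window` gives 19 as the first line and 259 as the last, while `intel_window` gives 1 and 240.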

intealls · « **Reply #202 on:** July 06, 2015, 05:00:55 pm »

Quote from: Calamity on July 06, 2015, 04:46:00 pm

Quote from: intealls on July 06, 2015, 03:33:32 pm
Am I right in thinking that the tearing could be avoided by adding the vertical front porch of the current mode to the right hand side of the following comparison?

Sure. But you need to deal with interlaced modes where the porch lines must be divided by 2. Also, forcing the synchronization after the present call to be done in-vblank is inaccurate and may cause the retrace to be missed. It's better to check for the first absolute scanline after retrace or higher. While in-vblank, getrasterstatus returns zero for whatever scanline number. When vblank ends and the new frame starts, the first scanline number is not 1 but sync_pulse + back_porch number of lines. Then the last line in the frame is height + front_porch. Notice this is only valid for ATI. Intel cards label the first line as 1 and the last line as height.

Thanks

This stuff is way beyond me, I'm happy to experiment and test but I don't think I'll be able to do a proper implementation.

Calamity · « **Reply #203 on:** July 06, 2015, 05:32:45 pm »

I haven't tested this, but it could be more or less like this:

Code: [Select]

void renderer::end_frame()
{
	window().m_primlist->release_lock();

	// flush any pending polygons
	primitive_flush_pending();

	m_shaders->end_frame();

	// finish the scene
	HRESULT result = (*d3dintf->device.end_scene)(m_device);
	if (result != D3D_OK) osd_printf_verbose("Direct3D: Error %08X during device end_scene call\n", (int)result);

	// sync to VBLANK
	if (window().machine().options().frame_delay() != 0 && ((video_config.triplebuf && window().fullscreen()) || video_config.waitvsync || video_config.syncrefresh))
	{
		D3DRASTER_STATUS raster_status;
		memset (&raster_status, 0, sizeof(D3DRASTER_STATUS));

		osd_printf_verbose("\nwait vblank start\n");
		int last_scanline = m_switchres_mode && m_switchres_mode->vtotal?
							m_switchres_mode->vactive + (m_switchres_mode->vtotal - m_switchres_mode->vbegin) / (m_switchres_mode->interlace?2:1) :
							m_height;

		while (!raster_status.InVBlank)
		{
			if ((*d3dintf->device.get_raster_status)(m_device, &raster_status) != D3D_OK)
				break;

			osd_printf_verbose("current_line: %d\n", raster_status.ScanLine);
			if (raster_status.ScanLine >= last_scanline)
				break;
		}
	}

	// present the current buffers
	result = (*d3dintf->device.present)(m_device, NULL, NULL, NULL, NULL, 0);
	if (result != D3D_OK) osd_printf_verbose("Direct3D: Error %08X during device present call\n", (int)result);

	// sync to VBLANK
	if (window().machine().options().frame_delay() != 0 && ((video_config.triplebuf && window().fullscreen()) || video_config.waitvsync || video_config.syncrefresh))
	{
		D3DRASTER_STATUS raster_status;
		memset (&raster_status, 0, sizeof(D3DRASTER_STATUS));

		osd_printf_verbose("wait vblank end\n");
		int first_scanline = m_switchres_mode && m_switchres_mode->vtotal?
							(m_switchres_mode->vtotal - m_switchres_mode->vbegin) / (m_switchres_mode->interlace?2:1) :
							1;

		while (raster_status.ScanLine <= first_scanline)
		{
			if ((*d3dintf->device.get_raster_status)(m_device, &raster_status) != D3D_OK)
				break;

			osd_printf_verbose("current_line: %d\n", raster_status.ScanLine);
		}
	}
}

intealls · « **Reply #204 on:** July 06, 2015, 06:07:34 pm »

Quote from: Calamity on July 06, 2015, 05:32:45 pm

I haven't tested this, but it could be more or less like this:

Code: [Select]
[code snipped, identical to the listing in Reply #203]

Thanks a lot!

I currently don't have access to a testing rig (currently using a laptop with Intel integrated graphics), but I've added the code to the build posted below, will test tomorrow evening.

https://mega.nz/#!TlIEnCRA!W_6nh2B-UKVy6MNPZNm36aa1ZvMKpMDvHfif2rD9-WU

Calamity · « **Reply #205 on:** July 06, 2015, 06:18:29 pm »

That's great! Thanks!

Dr.Venom · « **Reply #206 on:** July 07, 2015, 07:10:52 am »

Hi guys,

I tested the latest build intealls posted, but unfortunately it doesn't seem to improve performance (materially). It still keeps producing a lot of audio underruns, and/or crashes the emulator at the latency where ddraw runs quite smoothly.

Quote from: Calamity on July 04, 2015, 10:57:38 am

Regarding the lower stability observed with -mt enabled, I think there's an explanation. In order to calculate current emulation speed, MAME measures the time elapsed once per frame. The exact point where this measurement is done is critical for the accuracy of the result. Usually, doing it right after we return from drawing is the best because the v-sync involved is more accurate than anything else. However, when running multithreaded, the draw operation is done in a separate blitting thread, while the main thread, where the speed measurement is to be performed, waits until it's signaled as ready by the blitting thread. This synchronization is done by means of an event object. The issue is, this mechanism requires the Windows scheduler to actually wake the main thread. This introduces an uncertainty in the time between the moment we signal the event and the instant the thread is actually awakened. This uncertainty doesn't happen when running single-threaded, as it will always take the same amount of cpu cycles to return from the drawing functions back to the main loop.

Thanks for the explanation.
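
The wake-up uncertainty described in the quote can be observed with a small experiment. The sketch below is not MAME code: it uses `std::condition_variable` as a portable stand-in for the Windows event object, and `measure_wakeup_delay_us` is a made-up name. It measures how long it takes from the moment one thread signals until the waiting thread actually resumes.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Returns the delay, in microseconds, between one thread signalling and
// the waiting thread actually running again. The value varies from call
// to call because it depends on the OS scheduler.
long long measure_wakeup_delay_us() {
    using clock = std::chrono::steady_clock;
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;
    clock::time_point signalled;
    clock::time_point woke;

    // Plays the role of the main thread waiting for the blitting thread.
    std::thread waiter([&] {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return ready; });
        woke = clock::now();
    });

    // Give the waiter time to block before signalling.
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    {
        std::lock_guard<std::mutex> lock(m);
        signalled = clock::now();
        ready = true;
    }
    cv.notify_one();
    waiter.join();

    long long us = std::chrono::duration_cast<std::chrono::microseconds>(
                       woke - signalled).count();
    assert(us >= 0);  // steady_clock never goes backwards
    return us;
}
```

Calling it a few hundred times and looking at the spread of the results gives a rough idea of the jitter that a single-threaded return path would not have.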

Quote from: Calamity on July 04, 2015, 10:57:38 am

That said, the ultimate reason to use multithreading is to improve input response. So it should be determined if nowadays hardware still benefits from multithreading on this regard, as older hardware used to do. If this is still true, it might be possible to keep the multithreading just for input and force drawing and emulation in the same thread.

In my personal experience with the older gm's, there's a benefit from the input multithreading even on high-end hardware. If running the video in a single thread would end up being better for stability, then to me it definitely sounds like a good idea to try and keep the multithreading just for input. If only to do extensive testing to see whether it really has a benefit or not.

Btw, this all assumes there's no relation between the video being multithreaded and input latency being affected in some way?

Quote from: intealls on July 06, 2015, 01:32:36 am

Again, this is astonishing. What sound card/driver are you using? I don't think the cards I use with kX allow me to go to 48 without issues.

It's the ESI Juli@ XTe. I've tested quite a few cards in the past, but this one indeed has the tightest ASIO driver I've encountered. The native driver also outperforms Asio4All in the speed/stability department. The great thing is the driver package is only 1.12MB and simply does what it must do. Apparently that's still possible on Windows.

Quote from: intealls on July 06, 2015, 01:32:36 am

-asio_playback_rate can be used to force a resampling rate, using it will ignore the game speed so it's mostly useful for debugging.

Out of interest, what does the asio_callback_frequency actually measure? Do you know by any chance how it relates to the results of the Freqtest utility?

Quote from: intealls on June 29, 2015, 05:26:40 pm

Edit: r1.5 contains a new batool version, with a drastically improved frequency calibration routine

Wow, this one does indeed perform a hell of a lot faster! With the old method I had to wait a while to average out (average of the different passes) on 48001.31. And this one converged quite fast to 48001.28

Calamity · « **Reply #207 on:** July 07, 2015, 08:42:11 am »

Quote from: Dr.Venom on July 07, 2015, 07:10:52 am

I tested the latest build intealls posted, but unfortunately it doesn't seem to improve performance (materially). It still keeps producing a lot of audio underruns, and/or crashes the emulator at the latency where ddraw runs quite smoothly.

Indeed, sorry for your time guys. There was an error in that implementation that made the second sync (the one that matters for stability) useless. This one should do better. If it doesn't, then I'm definitely wrong.

Code: [Select]

void renderer::end_frame()
{
	window().m_primlist->release_lock();

	// flush any pending polygons
	primitive_flush_pending();

	m_shaders->end_frame();

	// finish the scene
	HRESULT result = (*d3dintf->device.end_scene)(m_device);
	if (result != D3D_OK) osd_printf_verbose("Direct3D: Error %08X during device end_scene call\n", (int)result);

	int last_scanline = m_switchres_mode && m_switchres_mode->vtotal?
						m_switchres_mode->vactive + (m_switchres_mode->vtotal - m_switchres_mode->vbegin) / (m_switchres_mode->interlace?2:1) :
						m_height;

	int first_scanline = m_switchres_mode && m_switchres_mode->vtotal?
						(m_switchres_mode->vtotal - m_switchres_mode->vbegin) / (m_switchres_mode->interlace?2:1) :
						1;

	// sync to VBLANK
	osd_printf_verbose("\nwait vblank start\n");
	if (window().machine().options().frame_delay() != 0 && ((video_config.triplebuf && window().fullscreen()) || video_config.waitvsync || video_config.syncrefresh))
	{
		D3DRASTER_STATUS raster_status;
		memset (&raster_status, 0, sizeof(D3DRASTER_STATUS));

		while (!raster_status.InVBlank)
		{
			if ((*d3dintf->device.get_raster_status)(m_device, &raster_status) != D3D_OK)
				break;

			osd_printf_verbose("current_line: %d\n", raster_status.ScanLine);
			if (raster_status.ScanLine >= last_scanline)
				break;
		}
	}

	// present the current buffers
	result = (*d3dintf->device.present)(m_device, NULL, NULL, NULL, NULL, 0);
	if (result != D3D_OK) osd_printf_verbose("Direct3D: Error %08X during device present call\n", (int)result);

	// sync to VBLANK
	osd_printf_verbose("wait vblank end\n");
	if (window().machine().options().frame_delay() != 0 && ((video_config.triplebuf && window().fullscreen()) || video_config.waitvsync || video_config.syncrefresh))
	{
		D3DRASTER_STATUS raster_status;
		memset (&raster_status, 0, sizeof(D3DRASTER_STATUS));

		while (raster_status.ScanLine >= last_scanline || raster_status.ScanLine <= first_scanline)
		{
			if ((*d3dintf->device.get_raster_status)(m_device, &raster_status) != D3D_OK)
				break;

			osd_printf_verbose("current_line: %d\n", raster_status.ScanLine);
		}
	}
}
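
Comparing this listing with the one in Reply #203, the visible change in the second wait is its loop condition. The sketch below isolates the two conditions as plain functions (`keep_waiting_old` and `keep_waiting_new` are illustrative names, not part of the patch). Reading the missing `>= last_scanline` term as the error mentioned above is an interpretation, not something stated in the thread.

```cpp
#include <cassert>

// Loop condition of the second wait in the listing from Reply #203:
// keep polling only while the reported line is at or below the first
// visible line.
bool keep_waiting_old(int scanline, int first_scanline) {
    return scanline <= first_scanline;
}

// Loop condition of the second wait in the listing above: also keep
// polling while the raster is still at or past the end of the previous
// frame, so the wait does not end before the new frame has started.
bool keep_waiting_new(int scanline, int first_scanline, int last_scanline) {
    assert(first_scanline <= last_scanline);
    return scanline >= last_scanline || scanline <= first_scanline;
}
```

With first_scanline = 19 and last_scanline = 259, a reading of 260 right after present ends the old loop immediately, while the new loop keeps polling until the raster has wrapped around and passed line 19.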

Quote

In my personal experience with the older gm's, there's a benefit from the input multithreading even on high-end hardware. If running the video in a single thread would end up being better for stability, then to me it definitely sounds like a good idea to try and keep the multithreading just for input. If only to do extensive testing to see whether it really has a benefit or not.

Btw, this all assumes there's no relation between the video being multithreaded and input latency being affected in some way?

Yes, I'll probably implement this. Having video and emulation in two threads is not required when running with syncrefresh. It is when using triplebuffer, because D3D9 does not support dropping frames and we have to simulate it. What we do when running with syncrefresh is to force both threads to be synchronized by means of an event object. I suspected it might have a performance penalty (in terms of speed stability), and your tests confirmed it. However, that penalty hadn't been a problem until the more demanding ASIO audio made it visible.

The video being multithreaded doesn't affect input latency. What affects latency is using the same thread to present v-synced video and process input.

filimpan · « **Reply #208 on:** July 07, 2015, 09:26:20 am »

Sorry to go off-topic. I'm using GM ASIO 0.163 r1.5 (with d3d9ex) and my sound card is a Creative Sound Blaster ZxR. I'm using the stock Creative driver because it's better than Asio4ALL for me.

I used batool64.exe --cp to set 6ms, which gives me the following in the log:

ASIO: Driver Creative SBZ Series ASIO initialized with latency 288,
sample rate 48000.00, compensated sample rate 48000.000000,
buffer size of 32768 samples
holdoff is 864 samples

This still produces a few overruns and underruns, but it's acceptable. I'm using -nosleep and priority 1 (not sure if it made a difference, but it can't be bad, right?), and HPET is disabled because it made no difference.

The last thing to do is to set a compensated sample rate. The issue I'm having is that batool64 0 runs for a couple of minutes and then closes abruptly. It starts out at about 477xx Hz and climbs almost immediately and logarithmically to about ~47990. The last number I see is ~47998.15 before command prompt just suddenly closes. Is 47998.15 then my compensated sample rate, or is something wrong? I tried the build posted here and get the same issue.

Also, about the "buffer size of 32768 samples", is that not implying another ~683ms of latency on top of the 6ms I set? "buffer size" sounds latency related. I read that devices have an inherent buffer size that is additional to the buffer size you set, so it would be 32768+288 samples to get the full latency value. It doesn't feel like I'm getting ~690ms of audio latency, but I'd still like to know what "buffer size" means in this context.

Finally, and perhaps "buffer size" is relevant to this next question as well, can someone please tell me at what latency value it would no longer be worth using ASIO? It was mentioned in this thread that anything under 256 was good, and I'm wondering if my value of 288 is therefore not good, and I should go back to dsound with latency 1.0.

Dr.Venom · « **Reply #209 on:** July 07, 2015, 12:01:19 pm »

Quote from: Calamity on July 07, 2015, 08:42:11 am

Quote from: Dr.Venom on July 07, 2015, 07:10:52 am
I tested the latest build intealls posted, but unfortunately it doesn't seem to improve performance (materially). It still keeps producing a lot of audio underruns, and/or crashes the emulator at the latency where ddraw runs quite smoothly.

Indeed, sorry for your time guys. There was an error in that implementation that made the second sync (the one that matters for stability) useless. This one should do better. If it doesn't, then I'm definitely wrong.

No problem, hopefully your new adjustments will bring it on par.

@ intealls, would it be possible to disable the continuous scanline number logging with the next release? Even though negligible, I guess it may have some negative impact on the test results.

Quote from: Calamity on July 07, 2015, 08:42:11 am

Quote from: Dr.Venom on July 07, 2015, 07:10:52 am
In my personal experience with the older gm's, there's a benefit from the input multithreading even on high-end hardware. If running the video in a single thread would end up being better for stability, then to me it definitely sounds like a good idea to try and keep the multithreading just for input. If only to do extensive testing to see whether it really has a benefit or not.

Yes, I'll probably implement this. Having video and emulation in two threads is not required when running with syncrefresh. It is when using triplebuffer, because D3D9 does not support dropping frames and we have to simulate it. What we do when running with syncrefresh is to force both threads to be synchronized by means of an event object. I suspected it might have a performance penalty (in terms of speed stability), and your tests confirmed it. However, that penalty hadn't been a problem until the more demanding ASIO audio made it visible.

Great that the tests were of help in confirming your suspicion. I can imagine that the penalty wasn't really an issue with directsound.

It will be interesting to see how only input on a separate thread will perform. Slightly off-topic, but if I ever wanted to buy a high-speed camera for doing the latency test with a led wired to the joystick, is there one which you would recommend? Or maybe there are things to keep in mind when looking for such a camera?

vicosku · « **Reply #210 on:** July 07, 2015, 12:26:22 pm »

Quote from: Dr.Venom on July 07, 2015, 12:01:19 pm

Slightly off-topic, but if I ever wanted to buy a high-speed camera for doing the latency test with a led wired to the joystick, is there one which you would recommend? Or maybe there are things to keep in mind when looking for such a camera?

Personally, I just use my work phone's slow motion mode. The iPhone 6 records at 240FPS, and I was using a 5S at 120FPS last year. I got a cheap mini-tripod as well to make recording easier. I'd suspect this is a fairly common feature on phones and cameras these days, but I haven't looked into it since I already had a solution in-hand.

intealls · « **Reply #211 on:** July 07, 2015, 01:31:16 pm »

Quote from: Calamity on July 07, 2015, 08:42:11 am

Code: [Select]
[code snipped, identical to the listing in Reply #207]

Here's a build with the latest changes!

https://mega.nz/#!DtQUkDwL!kZXlokfdyvTZUvFtOD4uNQZMGWfJFph-i4Ee_DvQ3xs

Quote from: vicosku on July 07, 2015, 12:26:22 pm

Quote from: Dr.Venom on July 07, 2015, 12:01:19 pm
Slightly off-topic, but if I ever wanted to buy a high-speed camera for doing the latency test with a led wired to the joystick, is there one which you would recommend? Or maybe there are things to keep in mind when looking for such a camera?

Personally, I just use my work phone's slow motion mode. The iPhone 6 records at 240FPS, and I was using a 5S at 120FPS last year. I got a cheap mini-tripod as well to make recording easier. I'd suspect this is a fairly common feature on phones and cameras these days, but I haven't looked into it since I already had a solution in-hand.

I use a Playstation Eye camera, which can be had for cheap. It records 320x240@125 fps, so the picture is pretty crappy, but the videos can be recorded and viewed in VirtualDub so despite lacking image quality it's quite good to work with.
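
When comparing cameras for this kind of test, the frame rate sets the time resolution of the measurement. A minimal sketch of the arithmetic, using the frame rates mentioned in this thread (`frame_ms` and `frames_to_ms` are illustrative names only):

```cpp
#include <cassert>

// Duration of one camera frame in milliseconds. This is the smallest
// time step a recording at the given frame rate can resolve.
double frame_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Converts a count of camera frames (for example, frames between the LED
// lighting up and the on-screen reaction) into milliseconds.
double frames_to_ms(int frames, double fps) {
    assert(frames >= 0);
    return frames * frame_ms(fps);
}
```

At 125 fps one camera frame is 8 ms, at 120 fps about 8.3 ms, and at 240 fps about 4.2 ms, so a count taken at 240 fps pins the latency down roughly twice as tightly.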

Quote from: Dr.Venom on July 07, 2015, 12:01:19 pm

@ intealls, would it be possible to disable the continuous scanline number logging with the next release? Even though negligible, I guess it may have some negative impact on the test results.

Removed in the build posted!

Quote from: filimpan on July 07, 2015, 09:26:20 am

Sorry to go off-topic. I'm using GM ASIO 0.163 r1.5 (with d3d9ex) and my sound card is a Creative Sound Blaster ZxR. I'm using the stock Creative driver because it's better than Asio4ALL for me.

I used batool64.exe --cp to set 6ms, which gives me the following in the log:

ASIO: Driver Creative SBZ Series ASIO initialized with latency 288,
sample rate 48000.00, compensated sample rate 48000.000000,
buffer size of 32768 samples
holdoff is 864 samples

This still produces a few overruns and underruns, but it's acceptable. I'm using -nosleep and priority 1 (not sure if it made a difference, but it can't be bad, right?), and HPET is disabled because it made no difference.

I think you should be able to go lower than 6 ms with that card, try setting a buffer size of 128 to 192, currently that might actually improve the over-/underrun statistics. Also, don't put too much effort currently into trying to optimize, I'll most likely bring about changes come next release that will improve this statistic.

Quote from: filimpan on July 07, 2015, 09:26:20 am

The last thing to do is to set a compensated sample rate. The issue I'm having is that batool64 0 runs for a couple of minutes and then closes abruptly. It starts out at about 477xx Hz and climbs almost immediately and logarithmically to about ~47990. The last number I see is ~47998.15 before command prompt just suddenly closes. Is 47998.15 then my compensated sample rate, or is something wrong? I tried the build posted here and get the same issue.

If it says 'rate has converged after x seconds', then that's the intended behavior, and your rate should be the one batool finds (in your case 47998.15).
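
To get a feel for how much a rate like 47998.15 differs from the nominal 48000, the difference can be expressed as drift. This is plain arithmetic on the numbers quoted above, not a description of what batool or GM ASIO do internally; the function names are made up.

```cpp
#include <cassert>

// Relative difference between the measured and the nominal sample rate,
// in parts per million. Negative means the device runs slow.
double drift_ppm(double nominal_hz, double measured_hz) {
    assert(nominal_hz > 0.0);
    return (measured_hz - nominal_hz) / nominal_hz * 1e6;
}

// Samples produced at the nominal rate that the device does not consume
// over the given time span.
double surplus_samples(double nominal_hz, double measured_hz, double seconds) {
    return (nominal_hz - measured_hz) * seconds;
}
```

For 47998.15 Hz against 48000 Hz this works out to about -38.5 ppm, or roughly 111 samples per minute that would pile up if the output were generated at exactly 48000 Hz.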

Quote from: filimpan on July 07, 2015, 09:26:20 am

Also, about the "buffer size of 32768 samples", is that not implying another ~683ms of latency on top of the 6ms I set? "buffer size" sounds latency related. I read that devices have an inherent buffer size that is additional to the buffer size you set, so it would be 32768+288 samples to get the full latency value. It doesn't feel like I'm getting ~690ms of audio latency, but I'd still like to know what "buffer size" means in this context.

No, this is an irrelevant bit of information that is best removed. The holdoff parameter attempts to control the actual latency.
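
For reference, the sample counts in the quoted log convert to time as follows. A minimal sketch, with `samples_to_ms` being an illustrative helper name:

```cpp
#include <cassert>

// Converts a number of audio samples to milliseconds at a given rate.
double samples_to_ms(int samples, double sample_rate_hz) {
    assert(samples >= 0 && sample_rate_hz > 0.0);
    return samples * 1000.0 / sample_rate_hz;
}
```

At 48000 Hz, the latency of 288 samples is 6 ms, the holdoff of 864 samples is 18 ms, and 32768 samples would be about 683 ms, which matches the figure in the question.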

Quote from: filimpan on July 07, 2015, 09:26:20 am

Finally, and perhaps "buffer size" is relevant to this next question as well, can someone please tell me at what latency value it would no longer be worth using ASIO? It was mentioned in this thread that anything under 256 was good, and I'm wondering if my value of 288 is therefore not good, and I should go back to dsound with latency 1.0.

Like I said, I think you should be able to go lower. If not, you could try using the integrated card with ASIO4ALL, since it will most likely perform better. And also, with a setting of 288, ASIO still lops off over 60 ms of latency (if things haven't changed since version 0.154 which this metric is based on).

Quote from: Dr.Venom on July 07, 2015, 07:10:52 am

Quote from: intealls on July 06, 2015, 01:32:36 am
Again, this is astonishing. What sound card/driver are you using? I don't think the cards I use with kX allow me to go to 48 without issues.

It's the ESI Juli@ XTe. I've tested quite a few cards in the past, but this one indeed has the tightest ASIO driver I've encountered. The native driver also outperforms Asio4All in the speed/stability department. The great thing is the driver package is only 1.12MB and simply does what it must do. Apparently that's still possible on Windows.

Thanks, might have to pick up one of those too.

Quote from: Dr.Venom on July 07, 2015, 07:10:52 am

Quote from: intealls on July 06, 2015, 01:32:36 am
-asio_playback_rate can be used to force a resampling rate, using it will ignore the game speed so it's mostly useful for debugging.

Out of interest, what does the asio_callback_frequency actually measure? Do you know by any chance how it relates to the results of the Freqtest utility?

You're right in that it's not really the best definition. It would be better to call it BASSASIO sample request frequency or similar, since the metric is how many samples BASSASIO requests each second. There might be a correlation between this and freqtest, I've done some preliminary tests but haven't seen anything decisive.
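
As an illustration of what a "samples requested per second" metric means, here is a small sketch that accumulates request sizes against timestamps. It is not the GM ASIO implementation and does not use the BASSASIO API; `RequestRateMeter` and its methods are hypothetical names, and the timestamps are passed in by the caller.

```cpp
#include <cassert>
#include <cstdint>

// Estimates how many samples per second an audio callback asks for.
class RequestRateMeter {
public:
    // Call once per callback with the number of samples requested and the
    // current time in seconds. The first call only sets the time origin;
    // its samples are not counted, so N later requests are matched with
    // the N intervals between calls.
    void on_request(std::uint64_t samples, double now_seconds) {
        if (!started_) {
            started_ = true;
            start_ = now_seconds;
            last_ = now_seconds;
            return;
        }
        assert(now_seconds >= last_);
        total_ += samples;
        last_ = now_seconds;
    }

    // Average request rate in Hz since the first call, or 0 if no time
    // has elapsed yet.
    double rate_hz() const {
        double elapsed = last_ - start_;
        return elapsed > 0.0 ? static_cast<double>(total_) / elapsed : 0.0;
    }

private:
    bool started_ = false;
    double start_ = 0.0;
    double last_ = 0.0;
    std::uint64_t total_ = 0;
};
```

Feeding it requests of 288 samples every 6 ms gives 48000 Hz. A real device clock would make the result settle slightly above or below the nominal rate.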

Quote from: Dr.Venom on July 07, 2015, 07:10:52 am

Quote from: intealls on June 29, 2015, 05:26:40 pm
Edit: r1.5 contains a new batool version, with a drastically improved frequency calibration routine

Wow, this one does indeed perform a hell of a lot faster! With the old method I had to wait a while to average out (average of the different passes) on 48001.31. And this one converged quite fast to 48001.28

Yeah.

It's a lot better. Also, it allows one to see what external factors might affect the request frequency in a reasonable way (HPET/QPC etc). For instance kX allows different sync methods for the ASIO driver, some of these seem to make the rate converge in a few seconds, but eats a lot more CPU than the standard one.

filimpan · « **Reply #212 on:** July 07, 2015, 06:07:07 pm »

Super thanks for the assistance.

The ZxR has very weak performance in certain areas to be honest. My PC is a 4.2Ghz quad-core Ivy Bridge CPU with 16GB of pretty quick RAM, and I've disabled all power saving features and CPU sleep features as well as unnecessary processes. When I run audio sensitive applications and programs, I also disable the network adapter to reduce DPC latency even more. Despite all this, and despite letting the sound card run in exclusive mode with all sound processing off, I usually need >15ms latency for WASAPI exclusive and more than that for ASIO if I want to completely eliminate clicks and pops. It's a pretty common complaint with these cards. I would try on-board like you suggested, but the sound card does sound audibly better, so I can live with the latency.

I don't know if 'rate has converged after x seconds' does pop up. A line pops up, but it's there for about 1 frame before cmd closes. I assume that's what it says. So I put the 47998.15 into mame.ini and got slightly better results. I used to get about 2 overruns and 12 underruns in 180 seconds with 288 samples. This time I got 0/10. I also tried 192 and 96 samples, and got 1/10 in 190 seconds, and 2/13 in 300 seconds respectively. Overall very similar results, but definitely better than before. I look forward to the next version of GM ASIO.

intealls · « **Reply #213 on:** July 07, 2015, 11:20:51 pm »

Quote from: filimpan on July 07, 2015, 06:07:07 pm

I don't know if 'rate has converged after x seconds' does pop up. A line pops up, but it's there for about 1 frame before cmd closes. I assume that's what it says. So I put the 47998.15 into mame.ini and got slightly better results. I used to get about 2 overruns and 12 underruns in 180 seconds with 288 samples. This time I got 0/10. I also tried 192 and 96 samples, and got 1/10 in 190 seconds, and 2/13 in 300 seconds respectively. Overall very similar results, but definitely better than before. I look forward to the next version of GM ASIO.

Ah, I understand the problem. I should add a 'press enter to exit' to the tool. Thanks for reporting this.

Also, I haven't really tested the ASIO stuff with the setup you're using (120 Hz etc), so please report issues if you encounter them.

intealls · « **Reply #214 on:** July 08, 2015, 01:09:37 am »

Quote from: intealls on July 06, 2015, 06:07:34 pm

Quote from: Calamity on July 06, 2015, 05:32:45 pm
I haven't tested this, but it could be more or less like this:

Code: [Select]
void renderer::end_frame()
{
	window().m_primlist->release_lock();

	// flush any pending polygons
	primitive_flush_pending();

	m_shaders->end_frame();

	// finish the scene
	HRESULT result = (*d3dintf->device.end_scene)(m_device);
	if (result != D3D_OK) osd_printf_verbose("Direct3D: Error %08X during device end_scene call\n", (int)result);

	// sync to VBLANK
	if (window().machine().options().frame_delay() != 0 && ((video_config.triplebuf && window().fullscreen()) || video_config.waitvsync || video_config.syncrefresh))
	{
		D3DRASTER_STATUS raster_status;
		memset (&raster_status, 0, sizeof(D3DRASTER_STATUS));

		osd_printf_verbose("\nwait vblank start\n");
		int last_scanline = m_switchres_mode && m_switchres_mode->vtotal?
							m_switchres_mode->vactive + (m_switchres_mode->vtotal - m_switchres_mode->vbegin) / (m_switchres_mode->interlace?2:1) :
							m_height;

		while (!raster_status.InVBlank)
		{
			if ((*d3dintf->device.get_raster_status)(m_device, &raster_status) != D3D_OK)
				break;

			osd_printf_verbose("current_line: %d\n", raster_status.ScanLine);

			if (raster_status.ScanLine >= last_scanline)
				break;
		}
	}

	// present the current buffers
	result = (*d3dintf->device.present)(m_device, NULL, NULL, NULL, NULL, 0);
	if (result != D3D_OK) osd_printf_verbose("Direct3D: Error %08X during device present call\n", (int)result);

	// sync to VBLANK
	if (window().machine().options().frame_delay() != 0 && ((video_config.triplebuf && window().fullscreen()) || video_config.waitvsync || video_config.syncrefresh))
	{
		D3DRASTER_STATUS raster_status;
		memset (&raster_status, 0, sizeof(D3DRASTER_STATUS));

		osd_printf_verbose("wait vblank end\n");
		int first_scanline = m_switchres_mode && m_switchres_mode->vtotal?
							(m_switchres_mode->vtotal - m_switchres_mode->vbegin) / (m_switchres_mode->interlace?2:1) :
							1;

		while (raster_status.ScanLine <= first_scanline)
		{
			if ((*d3dintf->device.get_raster_status)(m_device, &raster_status) != D3D_OK)
				break;

			osd_printf_verbose("current_line: %d\n", raster_status.ScanLine);
		}
	}
}

Thanks a lot!

I currently don't have access to a testing rig (currently using a laptop with Intel integrated graphics), but I've added the code to the build posted below, will test tomorrow evening.

https://mega.nz/#!TlIEnCRA!W_6nh2B-UKVy6MNPZNm36aa1ZvMKpMDvHfif2rD9-WU

And here are my results (all without multithreading)! Be sure to look at the scale of the speed percentage, Octave wasn't being consistent with this.

From what I can see, there's a striking similarity between the new d3d code and ddraw, at least for the games I've tested. ddraw seems to produce _slightly_ more consistent speeds (however, the differences are minuscule, look at the scale!), and I don't think this will make any difference at all with regard to audio performance. Other factors might come into play, of course, but I don't think the speed variation will make any difference.

This has prompted me to do a new test - add the d3d9ex code (I think I'll add this permanently to the ASIO patch to make it W7 only, ASIO seems to be pretty crappy in XP anyway), remove the m_speed hack currently in the 0.163 build, and do a test with the new frame_delay code (with/without -mt).

Paradroid · « **Reply #215 on:** July 08, 2015, 01:49:22 am »

Quote from: intealls on July 08, 2015, 01:09:37 am

ASIO seems to be pretty crappy in XP anyway

Haha! Tell that to the thousands of musicians who were loath to give up their nicely tuned XP for Win 7 when it first came out.

For what it's worth, I can still get lower latency on my Lenovo W500 laptop and RME Fireface UC interface under XP than 7 when processing live guitar signals. The DPC latency when booting to XP is still lower than W7 with my rig.

Anyway, I digress...

Keep up the good work! I have been following this really closely. I have a bunch of quality ASIO devices (Edirol UA-25, M Audio Audiophile 2496, RME Fireface UC and RME Fireface 400). Obviously, the quality of some of these devices (esp. RME) would be overkill for MAME but I have 3 NOS Audiophile cards that I plan on using in my cabs once you've got this nutted.

I suppose I should get off my bottom and help with the testing since I have all that hardware sitting here...

intealls · « **Reply #216 on:** July 08, 2015, 02:22:53 am »

Quote from: Paradroid on July 08, 2015, 01:49:22 am

Quote from: intealls on July 08, 2015, 01:09:37 am
ASIO seems to be pretty crappy in XP anyway

Haha! Tell that to the thousands of musicians who were loath to give up their nicely tuned XP for Win 7 when it first came out.

For what it's worth, I can still get lower latency on my Lenovo W500 laptop and RME Fireface UC interface under XP than 7 when processing live guitar signals. The DPC latency when booting to XP is still lower than W7 with my rig.

Anyway, I digress...

Keep up the good work! I have been following this really closely. I have a bunch of quality ASIO devices (Edirol UA-25, M Audio Audiophile 2496, RME Fireface UC and RME Fireface 400). Obviously, the quality of some of these devices (esp. RME) would be overkill for MAME but I have 3 NOS Audiophile cards that I plan on using in my cabs once you've got this nutted.

I suppose I should get off my bottom and help with the testing since I have all that hardware sitting here...

I should probably clarify that to 'From my (fairly limited) testing, ASIO seems to be pretty crappy with ASIO4ALL/kX in XP anyway'

I seem to remember getting better results on XP with an Audigy 2/Creative driver. But this was a couple of years ago.

All testing is greatly appreciated! The more drivers/cards that can be tested the better. However, it seems that some of the cards you have will cost more than a fast, brand new GM system, so I don't know how representative those particular tests will be for the majority of users.

Edit: However, it might be extremely interesting to know what the calibration frequency of those devices turns out to be.

Paradroid · « **Reply #217 on:** July 08, 2015, 02:40:28 am »

Quote from: intealls on July 08, 2015, 02:22:53 am

However, it seems that some of the cards you have will cost more than a fast, brand new GM system, so I don't know how representative those particular tests will be for the majority of users.

That's true. I guess the advantage for testing is that you know that RME aren't going to give you a half-baked driver so you can eliminate the driver from the equation when testing.

This ASIO thing is definitely worth pursuing though... when it's done right, the low latency afforded by a decent ASIO driver and compatible application is a pretty magic thing. For musicians, it changed everything. With a good setup, we're talking less than 10 ms for the round trip (input, processing, output). Musicians are pretty intimate with their instruments so they can "feel" the latency much more readily than your average gamer (not that any of us are really "average gamers"... pursuing obsolete technology (CRTs) and games (arcade and retro console)).

I think your ASIO initiative is definitely a very good one. It's the audio equivalent of what is being chased with frame delay and low latency input. The advantage with the ASIO adventure is that it's a proven technology and there are so many examples of great implementations (audio software, not games). The point is that it can definitely be done. The MAME case seems to have its challenges though...

ASIO4ALL is a really neat tool. Same with the kX stuff. However, I think the thing to look for would be an outmoded pro audio card that is now reasonably cheap (such as the M-Audio card I mentioned or the ESI card Dr.Venom mentioned). These kinds of cards are no longer "state of the art" but are still miles ahead of average consumer cards in terms of quality. The real benefit though would be the stable drivers that would have come about after much testing from highly demanding users (high latency, glitches, drop-outs and flaky drivers aren't tolerated by producers/musicians).

Keep up the good work! I'll try to get around to some testing soon.

intealls · « **Reply #218 on:** July 08, 2015, 03:04:53 am »

Quote from: Paradroid on July 08, 2015, 02:40:28 am

Quote from: intealls on July 08, 2015, 02:22:53 am
However, it seems that some of the cards you have will cost more than a fast, brand new GM system, so I don't know how representative those particular tests will be for the majority of users.
That's true. I guess the advantage for testing is that you know that RME aren't going to give you a half-baked driver so you can eliminate the driver from the equation when testing.

Absolutely, I edited my above post to add that it might be very interesting to see what the calibration frequency of these cards turn out to be.

Quote from: Paradroid on July 08, 2015, 02:40:28 am

I think your ASIO initiative is definitely a very good one. It's the audio equivalent of what is being chased with frame delay and low latency input. The advantage with the ASIO adventure is that it's a proven technology and there are so many examples of great implementations (audio software, not games). The point is that it can definitely be done. The MAME case seems to have its challenges though...

ASIO4ALL is a really neat tool. Same with the kX stuff. However, I think the thing to look for would be an outmoded pro audio card that is now reasonably cheap (such as the M-Audio card I mentioned or the ESI card Dr.Venom mentioned). These kinds of cards are no longer "state of the art" but are still miles ahead of average consumer cards in terms of quality. The real benefit though would be the stable drivers that would have come about after much testing from highly demanding users (high latency, glitches, drop-outs and flaky drivers aren't tolerated by producers/musicians).

Keep up the good work! I'll try to get around to some testing soon.

Thanks! The implementation currently uses BASSASIO, which is actually very straightforward to use. I currently cannot find an incentive to use anything else. It's possible to use an external resampling lib (or write a resampler) and use that with WASAPI or the Steinberg ASIO SDK, but I currently don't see the point; there are still issues that need to be worked out before doing anything else. WASAPI might be nice since it's an official Windows thing, but most of the issues that are relevant for ASIO will probably apply to WASAPI as well. Also, ASIO4ALL seems to cover general sound card support pretty well.

Calamity · « **Reply #219 on:** July 08, 2015, 04:08:43 am »

Quote from: intealls on July 08, 2015, 01:09:37 am

From what I can see, there's a striking similarity between the new d3d code and ddraw, at least for the games I've tested. ddraw seems to produce _slightly_ more consistent speeds (however, the differences are minuscule, look at the scale!), and I don't think this will make any difference at all with regard to audio performance. Other factors might come into play, of course, but I don't think the speed variation will make any difference.

Well, this is great! I understand that you didn't get any tearing either. It looks like ddraw might still be a bit more consistent, probably because it does the whole thing in kernel mode before returning, while the multiple calls we make in d3d involve switching between kernel and user mode, which has an overhead, although this should become negligible as hardware evolves. It's certainly an advantage to have control over the current scanline; I think we'll manage to use that information for something interesting.

Quote

This has prompted me to do a new test - add the d3d9ex code (I think I'll add this permanently to the ASIO patch to make it W7 only, ASIO seems to be pretty crappy in XP anyway), remove the m_speed hack currently in the 0.163 build, and do a test with the new frame_delay code (with/without -mt).

With regard to LCD monitors, it is nice to have the d3d9ex mode available, since fd causes static tearing with those. It's actually not a matter of the monitor being LCD; it's the scaling involved and the time it takes (if it's longer than the retrace) that causes the tearing. So this also happens on CRT monitors at "high" resolutions when using a slow card. If the card is fast enough you don't get tearing even at high resolutions, for instance I don't get any tearing at 2560x1600 with an R9 270 and fd. But as soon as I use hlsl, I get lots of tearing, since the time it takes to render a frame increases until it doesn't "fit" in the retrace anymore.

The longer it takes to scale (the slower your card is), the lower on the screen the tearing will appear. If the tearing is always near the top of the screen, maybe there's a chance to "hide" it effectively by exiting the first synchronization a bit earlier, let's say 32 lines before v-blank:

Code: [Select]

int last_scanline = m_switchres_mode && m_switchres_mode->vtotal?
							m_switchres_mode->vactive - 32 + (m_switchres_mode->vtotal - m_switchres_mode->vbegin) / (m_switchres_mode->interlace?2:1) :
							m_height - 32;

Obviously the value is system dependent, and may not be consistent. But just for the heck of it. Besides, because we're making the frame time shorter, we also may need to lower the value of fd when using this trick, otherwise we may arrive too late. So this reduces the effectiveness of fd a bit.
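To make the scanline arithmetic easier to play with outside of MAME, the two targets can be pulled out into small standalone functions. This is only a sketch: `modeline_sketch` is a hypothetical stand-in that copies just the field names used in the snippets above, the meaning given for `vbegin` is my assumption, and the fallbacks used when no modeline is available (`m_height` and `1`) are left out.

```cpp
#include <cassert>

// Hypothetical stand-in for the switchres modeline; only the fields used in
// the posted snippets are included (names follow m_switchres_mode).
struct modeline_sketch
{
	int vactive;    // visible lines
	int vbegin;     // assumed: line where vertical sync begins
	int vtotal;     // total lines per frame
	bool interlace; // true for interlaced modes
};

// Lines between vbegin and the end of the frame, halved for interlaced
// modes, exactly as in the posted first_scanline expression.
int first_scanline(const modeline_sketch &m)
{
	assert(m.vtotal > m.vbegin);
	return (m.vtotal - m.vbegin) / (m.interlace ? 2 : 1);
}

// Scanline at which the first wait loop gives up. early_exit is the
// "32 lines before v-blank" margin; 0 reproduces the original expression.
int last_scanline(const modeline_sketch &m, int early_exit)
{
	return m.vactive - early_exit + first_scanline(m);
}
```

For a made-up 240-line mode with `vbegin = 244` and `vtotal = 262`, this gives a first scanline of 18 and a last scanline of 258, or 226 with the 32-line margin. The numbers are only there to show how the margin shifts the exit point, they are not taken from any real modeline.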

I'm not sure if it's good to remove the m_speed thing however. For the case of typical LCD monitors (again), when you're forced to run everything at 60 Hz, this will cause sound problems I believe.

big10p · « **Reply #220 on:** July 08, 2015, 07:51:24 am »

Thanks for this version of GM, intealls! I've only just gotten round to trying it and am very impressed. The sound lag in certain games - like Pac-Man - was bugging me but ASIO GM seems to have pretty much nailed the problem. Thanks again!

filimpan · « **Reply #221 on:** July 08, 2015, 11:45:18 am »

Quote from: intealls on July 07, 2015, 11:20:51 pm

Quote from: filimpan on July 07, 2015, 06:07:07 pm
I don't know if 'rate has converged after x seconds' does pop up. A line pops up, but it's there for about 1 frame before cmd closes. I assume that's what it says. So I put the 47998.15 into mame.ini and got slightly better results. I used to get about 2 overruns and 12 underruns in 180 seconds with 288 samples. This time I got 0/10. I also tried 192 and 96 samples, and got 1/10 in 190 seconds, and 2/13 in 300 seconds respectively. Overall very similar results, but definitely better than before. I look forward to the next version of GM ASIO.

Ah, I understand the problem. I should add a 'press enter to exit' to the tool. Thanks for reporting this.

Also, I haven't really tested the ASIO stuff with the setup you're using (120 Hz etc), so please report issues if you encounter them.

I'll be sure to let you know if I find anything janky.

I also want to add my thanks for your ASIO implementation. Low latency and highly configurable audio is honestly one of the most important features of emulation for me, and it's a feature of all my favourite emulators. It was one aspect of MAME which I always felt was sorely lacking. I have a game which exists on PS2 and CPS3, and while the PS2 version is technically inferior and has more input lag, my PS2 emulator at least has WASAPI support and therefore had noticeably less audio lag on this game until GM ASIO. As much as I like low input lag, I think it means little without low audio lag.

Dr.Venom · « **Reply #222 on:** July 09, 2015, 07:47:04 am »

Quote from: vicosku on July 07, 2015, 12:26:22 pm

Quote from: Dr.Venom on July 07, 2015, 12:01:19 pm
Slightly off-topic, but if I ever wanted to buy a high-speed camera for doing the latency test with a led wired to the joystick, is there one which you would recommend? Or maybe there are things to keep in mind when looking for such a camera?

Personally, I just use my work phone's slow motion mode. The iPhone 6 records at 240FPS, and I was using a 5S at 120FPS last year. I got a cheap mini-tripod as well to make recording easier. I'd suspect this is a fairly common feature on phones and cameras these days, but I haven't looked into it since I already had a solution in-hand.

Thanks, that sounds like a great option. I've been considering upgrading my iPhone for some time now, and this slow motion mode may be the final push. The mini-tripod looks like a great addition as well!

Quote from: intealls on July 07, 2015, 01:31:16 pm

Here's a build with the latest changes!

https://mega.nz/#!DtQUkDwL!kZXlokfdyvTZUvFtOD4uNQZMGWfJFph-i4Ee_DvQ3xs

I've done the tests with the genesis driver. The positive thing is that the results for d3d are marginally better with this latest release, but unfortunately it still doesn't come close to ddraw. By and large the results I posted for the previous build still hold (see here).

To sort of summarize, I've attached three pictures. In the first two the marginal improvement for d3d versus previous release d3d can be seen. The last picture shows the more stable results for ddraw.

Quote from: intealls on July 08, 2015, 01:09:37 am

And here are my results (all without multithreading)! Be sure to look at the scale of the speed percentage, Octave wasn't being consistent with this.

Intealls, could you possibly also test the genesis driver d3d versus ddraw on your side (using -resolution 1280x0)? I'm curious whether you find the same differences. Given your other results (ddraw only marginally better than d3d) I'm wondering whether the genesis driver is somehow the odd one out?

With regards to the results you posted for gunforc2, pbobble and the others. Is it a coincidence that they are all running at 99.99%? I'm a bit surprised by this, given that they are running different modelines. Because of this, I would expect a bit more random pattern of values above and below 100% that the emulation is being forced to. It could still be coincidence of course.

I also found another testcase with outrun. With this game it keeps increasing the samples in the buffer at a steady pace before falling back (same for d3d and ddraw). See the attached picture!

vicosku · « **Reply #223 on:** July 09, 2015, 10:19:12 am »

Quote from: Dr.Venom on July 09, 2015, 07:47:04 am

Thanks, that sounds like a great option. I've been considering upgrading my iPhone for some time now, and this slow motion mode may be the final push.

Depending on which model you have, you might be able to get at least 60FPS through a 3rd-party app. I did this with the iPhone 5.

Calamity · « **Reply #224 on:** July 09, 2015, 11:09:04 am »

Quote from: Dr.Venom on July 09, 2015, 07:47:04 am

I've done the tests with the genesis driver. The positive thing is that the results for d3d are marginally better with this latest release, but unfortunately it still doesn't come close to ddraw.

That's far from being a positive thing

I'm totally missing something here. If you didn't have the ddraw results, I'd be looking for something between the video and audio updates that took a slightly different amount of time for each frame as the possible cause. But seeing that ddraw keeps everything so stable, as opposed to d3d, definitely points to the video api, and that's something I really don't understand, since the getrasterstatus call should force the exact same amount of time per frame, just like ddraw. So the only thing I can think of right now is a combination of the gpu and video drivers. The fact is that my testing has been done with two cards, HD 6450 and R9 270, and for both of them I got the scanlines to be reported consistently per frame. I didn't test any HD 4xxx + CRT Emudriver (my testing rig uses newer hardware now as I'm doing other "experiments"). It'd be good to know what card intealls is testing with. It's important to enable the scanline log per frame at least temporarily (send it to a file), and check whether "vblank_end" exits consistently at the same scanline number (it'll take a few frames to stabilize). If it does (+-1 line is ok) then I definitely don't understand what's going on.

intealls · « **Reply #225 on:** July 09, 2015, 11:41:10 am »

Quote from: Calamity on July 09, 2015, 11:09:04 am

Quote from: Dr.Venom on July 09, 2015, 07:47:04 am
I've done the tests with the genesis driver. The positive thing is that the results for d3d are marginally better with this latest release, but unfortunately it still doesn't come close to ddraw.

That's far from being a positive thing

I'm totally missing something here. If you didn't have the ddraw results, I'd be looking for something between the video and audio updates that took a slightly different amount of time for each frame as the possible cause. But seeing that ddraw keeps everything so stable, as opposed to d3d, definitely points to the video api, and that's something I really don't understand, since the getrasterstatus call should force the exact same amount of time per frame, just like ddraw. So the only thing I can think of right now is a combination of the gpu and video drivers. The fact is that my testing has been done with two cards, HD 6450 and R9 270, and for both of them I got the scanlines to be reported consistently per frame. I didn't test any HD 4xxx + CRT Emudriver (my testing rig uses newer hardware now as I'm doing other "experiments"). It'd be good to know what card intealls is testing with. It's important to enable the scanline log per frame at least temporarily (send it to a file), and check whether "vblank_end" exits consistently at the same scanline number (it'll take a few frames to stabilize). If it does (+-1 line is ok) then I definitely don't understand what's going on.

I'll test/add this, but on the whole I don't think the speed_percent stability is going to be a problem. Even if it's not as stable as ddraw, I think it's definitely good enough. Also, I'm currently testing doing audio updates from video_manager::frame_update (instead of the default periodic timer updates), which might even make the speed_percent metric redundant, since I could probably deduce the dot-clock granulated frame rate from this instead (perhaps not as quickly though). Though it would be nice to have access to the metric if it's already computed somewhere. Also, the sound output modules are not given access to speed_percent by default anymore since somewhere around 0.159/0.160.

The card used is a RV730 (4650/4670?).

Quote from: Dr.Venom on July 09, 2015, 07:47:04 am

Quote from: intealls on July 08, 2015, 01:09:37 am
And here are my results (all without multithreading)! Be sure to look at the scale of the speed percentage, Octave wasn't being consistent with this.

Intealls, could you possibly also test the genesis driver d3d versus ddraw on your side (using -resolution 1280x0)? I'm curious whether you find the same differences. Given your other results (ddraw only marginally better than d3d) I'm wondering whether the genesis driver is somehow the odd one out?

With regards to the results you posted for gunforc2, pbobble and the others. Is it a coincidence that they are all running at 99.99%? I'm a bit surprised by this, given that they are running different modelines. Because of this, I would expect a bit more random pattern of values above and below 100% that the emulation is being forced to. It could still be coincidence of course.

Thanks again for your tests! I think it's accurate, but I'll measure the vsync lead with the games tested to make sure. I'll also test the genesis driver with 1280x0.

Quote from: Dr.Venom on July 09, 2015, 07:47:04 am

I also found another testcase with outrun. With this game it keeps increasing the samples in the buffer at a steady pace before falling back (same for d3d and ddraw). See the attached picture!

I'm testing a fix for this now. Normally a timer is set for audio updates (50 times/second; the ASIO build changed this first to 125, then to 60). I've disabled this and do the audio update after the end-of-frame update. This makes buffer handling a whole lot easier; however, I do need to make sure that this has no negative side effects. Also, I think now is the time to get a better understanding of how the sound system works, since I think there's intermediary buffering going on, which may or may not be driver-dependent. I find it difficult to believe that the actual pbobble hardware would buffer as many samples as shown in the topmost plot in this post http://forum.arcadecontrols.com/index.php/topic,141869.msg1469093.html#msg1469093; however, actual hardware tests would need to be conducted. I've got a NeoGeo sitting here that I haven't hooked up yet, to use for comparison (though not with pbobble). Anyway, I redid the tests yesterday, and frame_delay does chop off a bit, allowing the latency to go down to about 60 ms (leaving the actual latency produced at something in the vicinity of 55 ms, since a couple of milliseconds can be attributed to USB input lag; plot two posts down).

Also, I managed to get kX to accept a buffer size of 48 samples without dropouts by changing the sync method to Kernel/SMP. Below is an Outrun (hardihar) run with fd6 (new fd code) and -mt with the 48 buffer size setting. What can be seen is that the buffer runs stably at about 120 to 160 samples. At 48 samples per millisecond (48 kHz), this means a worst-case latency of (160 + 48)/48 = ~4.3 ms, if no intermediary buffering is going on (which I think there is). So the ASIO functionality itself is not producing any noteworthy latency additions in this example. Also, speed_percent appears to be swinging a bit (again, mind the scale), but as can be seen this doesn't affect the audio performance at all. There's currently a backoff in place which only accepts a new speed_percent value if it's being fairly consistently reported for about half a second.
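For anyone wondering what such a backoff could look like, here is a minimal sketch. It is not the code in the ASIO patch: the class name `speed_holdoff`, the tolerance parameter and the way time is passed in are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical hold-off filter: a candidate speed is adopted only after it
// has been reported, within `tolerance`, for `hold_seconds` in a row.
class speed_holdoff
{
public:
	speed_holdoff(double initial, double tolerance, double hold_seconds)
		: m_accepted(initial), m_candidate(initial),
		  m_tolerance(tolerance), m_hold(hold_seconds), m_stable(0.0)
	{
		assert(tolerance >= 0.0 && hold_seconds >= 0.0);
	}

	// Feed one measurement taken `dt` seconds after the previous one and
	// return the speed currently in use.
	double update(double measured, double dt)
	{
		if (std::fabs(measured - m_candidate) > m_tolerance)
		{
			// The value moved: restart the stability timer on the new candidate.
			m_candidate = measured;
			m_stable = 0.0;
		}
		else
		{
			// Still the same candidate: accept it once it has held long enough.
			m_stable += dt;
			if (m_stable >= m_hold)
				m_accepted = m_candidate;
		}
		return m_accepted;
	}

private:
	double m_accepted, m_candidate, m_tolerance, m_hold, m_stable;
};
```

Constructed as `speed_holdoff(1.0, 0.0001, 0.5)` and fed once per frame, a single stray reading only resets the timer, while a reading that persists for half a second replaces the accepted value. The tolerance of 0.0001 is a guess and would have to be tuned against real speed_percent logs.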

Calamity · « **Reply #226 on:** July 09, 2015, 01:16:25 pm »

Quote from: intealls on July 09, 2015, 11:41:10 am

Also, I'm currently testing doing audio updates from video_manager::frame_update (instead of the default periodic timer updates)

Normally a timer is set for audio updates (50 times/second, ASIO changed this to first 125 then 60), I've disabled this and do the audio update after the end-of-frame update.

Brilliant. I was struggling to understand at what point in the baseline code the audio update is performed; I wasn't aware you'd modified it in your build to put it exactly at the right point (IMHO).

Quote

This makes buffer handling a whole lot easier, however I do need to make sure that this has no negative side effects. Also, I think now is the time to get a better understanding of how the sound system works

Definitely, that's true for me, I should dedicate some time to understand the audio part as I'm totally clueless about it and how it relates to the video.

One wants to believe that MAMEdevs had a good reason to make the audio update asynchronous with regard to video; maybe some hardware requires it to be like that (we focus almost exclusively on 80s-90s raster video games, but what about vectors, etc.).

But let's not forget there's always a remote possibility that it wasn't such a good idea and it only made sense from a software engineering point of view.

Quote

Also speed_percent appears to be swinging a bit (again, mind the scale), but as can be seen this doesn't affect the audio performance at all. There's currently a backoff in place which only accepts a new speed_percent value if it's being fairly consistently reported for about half a second.

Maybe the m_speed patch could help with the swinging? (anyway the precision on m_speed would be only 0.999 so that variation wouldn't be registered)

Another approach is what they do in RetroArch. Instead of looking at the speed, they check the amount of remaining samples in the buffer, and modify the sample frequency based on that.
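As a rough illustration of that idea (not RetroArch's actual code, and not anything in GM), the correction can be expressed as a small adjustment of the resampling ratio driven by how full the buffer is. The function name, the parameters and the linear mapping below are all assumptions.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical buffer-driven rate control. Returns the number of output
// samples to generate per input sample: fewer when the buffer holds more
// than `target` samples, more when it holds less. `max_delta` bounds the
// relative change (for example 0.005 for +-0.5 %).
double resample_ratio(double nominal_ratio, int buffered, int target, double max_delta)
{
	assert(target > 0);

	// Signed fill error, clamped to [-1, 1]; positive means "too full".
	double error = static_cast<double>(buffered - target) / target;
	error = std::max(-1.0, std::min(1.0, error));

	// Too full: write slightly fewer samples so the device can catch up.
	// Too empty: write slightly more so the buffer refills.
	return nominal_ratio * (1.0 - max_delta * error);
}
```

With a target of 200 samples and `max_delta = 0.005`, a buffer sitting exactly at the target leaves the ratio untouched, a buffer at 400 samples lowers it by 0.5 %, and an empty buffer raises it by 0.5 %. The appeal of this approach is that it needs no speed measurement at all, only the buffer fill level.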

intealls · « **Reply #227 on:** July 09, 2015, 01:52:15 pm »

Quote from: Calamity on July 09, 2015, 11:09:04 am

Quote from: Dr.Venom on July 09, 2015, 07:47:04 am
I've done the tests with the genesis driver. The positive thing is that the results for d3d are marginally better with this latest release, but unfortunately it still doesn't come close to ddraw.

That's far from being a positive thing

I'm totally missing something here. If you didn't have the ddraw results, I'd be looking for something between the video and audio updates that took a slightly different amount of time for each frame as the possible cause. But seeing that ddraw keeps everything so stable, as opposed to d3d, definitely points to the video api, and that's something I really don't understand, since the getrasterstatus call should force the exact same amount of time per frame, just like ddraw. So the only thing I can think of right now is a combination of the gpu and video drivers. The fact is that my testing has been done with two cards, HD 6450 and R9 270, and for both of them I got the scanlines to be reported consistently per frame. I didn't test any HD 4xxx + CRT Emudriver (my testing rig uses newer hardware now as I'm doing other "experiments"). It'd be good to know what card intealls is testing with. It's important to enable the scanline log per frame at least temporarily (send it to a file), and check whether "vblank_end" exits consistently at the same scanline number (it'll take a few frames to stabilize). If it does (+-1 line is ok) then I definitely don't understand what's going on.

Do you mean vblank_end() in screen, or vblank end in the new fd code? vblank end in the new fd code seems to exit very consistently. I moved the printouts out of the loops as per the code posted below, and got the result in the attached log. frame_delay is set to 7 (pbobble).

I also did yet another comparison (though not as lengthy), attached.

Code: [Select]

	// sync to VBLANK
	osd_printf_verbose("\nwait vblank start\n");
	if (window().machine().options().frame_delay() != 0 && ((video_config.triplebuf && window().fullscreen()) || video_config.waitvsync || video_config.syncrefresh))
	{
		D3DRASTER_STATUS raster_status;
		memset (&raster_status, 0, sizeof(D3DRASTER_STATUS));

		while (!raster_status.InVBlank)
		{
			if ((*d3dintf->device.get_raster_status)(m_device, &raster_status) != D3D_OK)
				break;

			if (raster_status.ScanLine >= last_scanline)
				break;
		}
		osd_printf_verbose("current_line: %d\n", raster_status.ScanLine);
	}

	// present the current buffers
	result = (*d3dintf->device.present)(m_device, NULL, NULL, NULL, NULL, 0);
	if (result != D3D_OK) osd_printf_verbose("Direct3D: Error %08X during device present call\n", (int)result);

	// sync to VBLANK
	osd_printf_verbose("wait vblank end\n");
	if (window().machine().options().frame_delay() != 0 && ((video_config.triplebuf && window().fullscreen()) || video_config.waitvsync || video_config.syncrefresh))
	{
		D3DRASTER_STATUS raster_status;
		memset (&raster_status, 0, sizeof(D3DRASTER_STATUS));

		while (raster_status.ScanLine >= last_scanline || raster_status.ScanLine <= first_scanline)
		{
			if ((*d3dintf->device.get_raster_status)(m_device, &raster_status) != D3D_OK)
				break;
		}
		osd_printf_verbose("current_line: %d\n", raster_status.ScanLine);
	}
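Checking such a log by eye gets tedious, so here is a small sketch of how the consistency check could be automated. It assumes the verbose output was redirected to a text file and contains exactly the "wait vblank end" and "current_line: N" lines printed by the code above; the names `scanline_range` and `vblank_end_range` are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <limits>
#include <sstream>
#include <string>

// Smallest and largest scanline reported after "wait vblank end",
// plus the number of frames that were counted.
struct scanline_range { int min_line; int max_line; int frames; };

// Scans a verbose log and collects the exit scanline of the second wait
// loop, ignoring the first `skip_frames` frames while things stabilize.
scanline_range vblank_end_range(std::istream &log, int skip_frames)
{
	assert(skip_frames >= 0);
	scanline_range r{ std::numeric_limits<int>::max(), std::numeric_limits<int>::min(), 0 };
	const std::string tag = "current_line: ";
	bool after_end = false;
	int seen = 0;
	std::string line;

	while (std::getline(log, line))
	{
		if (!line.empty() && line.back() == '\r') line.pop_back(); // Windows line endings
		if (line == "wait vblank end") { after_end = true; continue; }
		if (line == "wait vblank start") { after_end = false; continue; }

		if (after_end && line.compare(0, tag.size(), tag) == 0)
		{
			after_end = false;
			int value = 0;
			std::istringstream number(line.substr(tag.size()));
			if (!(number >> value) || seen++ < skip_frames)
				continue;
			r.min_line = std::min(r.min_line, value);
			r.max_line = std::max(r.max_line, value);
			++r.frames;
		}
	}
	return r;
}
```

Opening the log with `std::ifstream` and passing it in along with, say, `skip_frames = 60` gives the spread directly; if `max_line - min_line` stays within a line or two, the exit point is as stable as the +-1 line criterion asks for.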

Calamity · « **Reply #228 on:** July 09, 2015, 02:10:18 pm »

Quote from: intealls on July 09, 2015, 01:52:15 pm

Do you mean vblank_end() in screen or vblank end in the fd new code? vblank end in the new fd code seems to exit very consistently.

Yeah I meant that vblank end. It shows line 28 all the time, which is right. I still don't get why ddraw would do any better here, unless sub-scanline precision matters.

If I get this right, 48000/15750 ≈ 3.05 samples per scanline (is this right?)

So let's say ddraw is able to exit vblank end more accurately than d3d, even so the maximum mismatch would be 3 samples per frame (probably less). Would that mismatch, accumulated after some frames, explain the variability you get?
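The arithmetic behind that estimate, written out as a sketch. The 48000 Hz sample rate and 15750 Hz line rate are the figures used above; the function names are mine.

```cpp
// Audio samples that elapse during one scanline.
constexpr double samples_per_scanline(double sample_rate_hz, double line_rate_hz)
{
	return sample_rate_hz / line_rate_hz;
}

// Audio samples per video frame for a mode with `lines_per_frame` total lines.
constexpr double samples_per_frame(double sample_rate_hz, double line_rate_hz, int lines_per_frame)
{
	return samples_per_scanline(sample_rate_hz, line_rate_hz) * lines_per_frame;
}
```

`samples_per_scanline(48000.0, 15750.0)` is about 3.05. For a hypothetical 262-line mode, `samples_per_frame` comes to roughly 798, so a timing error of one full scanline would amount to about 3 samples out of roughly 800 per frame.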

intealls · « **Reply #229 on:** July 09, 2015, 03:35:25 pm »

Quote from: Dr.Venom on July 09, 2015, 07:47:04 am

To sort of summarize, I've attached three pictures. In the first two the marginal improvement for d3d versus previous release d3d can be seen. The last picture shows the more stable results for ddraw.

This isn't completely fair. The scales on the genesis new-fd/genesis ddraw are different, with the d3d one going from 0.99995 to 1.00015 and ddraw going from ~0.89 to 1.02. Although when comparing genesis new-fd with outrun, ddraw does seem a bit more stable.

Quote from: Calamity on July 09, 2015, 01:16:25 pm

Quote from: intealls on July 09, 2015, 11:41:10 am
Also, I'm currently testing doing audio updates from video_manager::frame_update (instead of the default periodic timer updates)

Normally a timer is set for audio updates (50 times/second, ASIO changed this to first 125 then 60), I've disabled this and do the audio update after the end-of-frame update.

Brilliant. I was struggling to understand at what point in the baseline code the audio update is performed; I wasn't aware you'd modified it in your build to put it exactly at the right point (IMHO).

Thanks

After the previous discussion a while back about emulation audio latency it certainly seemed to make sense putting it there.

Quote from: Calamity on July 09, 2015, 01:16:25 pm

Quote
This makes buffer handling a whole lot easier, however I do need to make sure that this has no negative side effects. Also, I think now is the time to get a better understanding of how the sound system works

Definitely, that's true for me, I should dedicate some time to understand the audio part as I'm totally clueless about it and how it relates to the video.

One wants to believe that MAMEdevs had a good reason to make the audio update asynchronous with regard to video; maybe some hardware requires it to be like that (we focus almost exclusively on 80s-90s raster video games, but what about vectors, etc.).

But let's not forget there's always a remote possibility that it wasn't such a good idea and it only made sense from a software engineering point of view.

Absolutely, hopefully it won't affect anything negatively. But there are a lot of drivers just like you say, and we can't test them all.

Quote from: Calamity on July 09, 2015, 01:16:25 pm

Quote
Also speed_percent appears to be swinging a bit (again, mind the scale), but as can be seen this doesn't affect the audio performance at all. There's currently a backoff in place which only accepts a new speed_percent value if it's being fairly consistently reported for about half a second.

Maybe the m_speed patch could help with the swinging? (anyway the precision on m_speed would be only 0.999 so that variation wouldn't be registered)

Another approach is what they do in RetroArch. Instead of looking at the speed, they check the amount of remaining samples in the buffer, and modify the sample frequency based on that.
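To make the buffer-fill idea concrete, here is a minimal sketch of that kind of rate control. It is not RetroArch's actual code: the function name, the parameters and the linear correction are all assumptions for illustration. The "rate" here is the rate at which the buffer is drained, so a buffer that is too full gets drained slightly faster.

```cpp
#include <algorithm>

// Hypothetical helper (not a RetroArch or MAME function): nudge the drain rate
// according to how far the buffer fill is from its target.
// target_samples is assumed to be > 0.
double adjusted_sample_rate(double nominal_rate_hz, int samples_in_buffer,
                            int target_samples, double max_deviation)
{
    // Normalised fill error, clamped to [-1, 1]; positive when the buffer is too full.
    double error = static_cast<double>(samples_in_buffer - target_samples) / target_samples;
    error = std::max(-1.0, std::min(1.0, error));

    // Too full: drain a little faster. Too empty: drain a little slower.
    return nominal_rate_hz * (1.0 + max_deviation * error);
}
```

With a target of 128 samples and a max_deviation of 0.005, a buffer holding 256 samples would be drained at 48240 Hz instead of 48000 Hz, and an empty buffer at 47760 Hz.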

I've been testing linear fitting of (incoming minus outgoing) samples (which is probably overkill, but simple enough to do), which would roll the speed_percent and the calibration frequency into one metric. I need to check out how they do it in RetroArch, but I suspect low latency audio will put tighter demands on the adjustment routine (as it does with everything else).

Quote from: Calamity on July 09, 2015, 02:10:18 pm

Quote from: intealls on July 09, 2015, 01:52:15 pm
Do you mean vblank_end() in screen or vblank end in the fd new code? vblank end in the new fd code seems to exit very consistently.

Yeah I meant that vblank end. It shows line 28 all the time, which is right. I still don't get why ddraw would do any better here, unless sub-scanline precision matters.

If I get this right, 48000/15750 ≈ 3.05 samples per scanline (is this right?)

So let's say ddraw is able to exit vblank end more accurately than d3d, even so the maximum mismatch would be 3 samples per frame (probably less). Would that mismatch, accumulated after some frames, explain the variability you get?

I think 48000 * (dot-clock granulated refresh / game refresh) / 15750 is the amount of samples. So (48000 * speed_percent) / 15750. So something very close to 3.

If you mean the variability in the above plots, I think that can be explained. Right now, I believe that the (very small) variations in buffer size in the above plots show a phase discrepancy between the BASSASIO callback and the audio update routine. I've attached a zoomed in plot. One can see that the maximum variation appears to be somewhere around 48 (note the first time it goes to 160, then down to ~115).

The main thing causing problems is the intermittent sample delay generation, but I haven't seen this in a while, and not with the m_speed hack + new fd code. I've also noticed that changing m_speed too often will cause some problems but I really need to investigate this further, especially with the new additions (new fd code + audio update at end of frame). I seem to be able to reproduce this with the old fd code, but I haven't seen it with the new. It's the problem outlined in this http://forum.arcadecontrols.com/index.php/topic,142143.msg1518722.html#msg1518722 post.

Edit: I threw some statistics onto my measurements (gunforc2, pbobble etc) and came up with the result below

Code:

file: u:\vicosku_new\new_fd_code\2\gunforc2_fd6_d3d_7.log
std dev: 0.0000061
mean: 0.9989841
max: 0.9990220
min: 0.9989440
max - min: 0.0000780
file: u:\vicosku_new\new_fd_code\2\gunforc2_fd6_ddraw_7.log
std dev: 0.0000046
mean: 0.9989841
max: 0.9990140
min: 0.9989570
max - min: 0.0000570
file: u:\vicosku_new\new_fd_code\2\neodrift_fd6_d3d_7.log
std dev: 0.0000061
mean: 0.9996173
max: 0.9996590
min: 0.9995700
max - min: 0.0000890
file: u:\vicosku_new\new_fd_code\2\neodrift_fd6_ddraw_7.log
std dev: 0.0000046
mean: 0.9996173
max: 0.9996370
min: 0.9995980
max - min: 0.0000390
file: u:\vicosku_new\new_fd_code\2\pbobble_fd7_d3d_7.log
std dev: 0.0000066
mean: 0.9989841
max: 0.9990190
min: 0.9989510
max - min: 0.0000680
file: u:\vicosku_new\new_fd_code\2\pbobble_fd7_ddraw_7.log
std dev: 0.0000050
mean: 0.9989841
max: 0.9990160
min: 0.9989540
max - min: 0.0000620
file: u:\vicosku_new\new_fd_code\2\sf2hf_fd7_d3d_7.log
std dev: 0.0000059
mean: 0.9997785
max: 0.9998060
min: 0.9997480
max - min: 0.0000580
file: u:\vicosku_new\new_fd_code\2\sf2hf_fd7_ddraw_7.log
std dev: 0.0000045
mean: 0.9997786
max: 0.9998040
min: 0.9997570
max - min: 0.0000470

As one can see, the differences are not very big.
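For anyone who wants to reproduce these figures without Octave, the same summary can be computed with something like the sketch below. This is not the attached script; the struct and function names are invented here, and the sample standard deviation (dividing by n - 1) is an assumption about how the numbers above were computed.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Summary of a series of speed_percent readings.
struct SpeedStats
{
    double mean;
    double std_dev;
    double min;
    double max;
    double range; // max - min
};

SpeedStats summarize(const std::vector<double>& v)
{
    SpeedStats s{0.0, 0.0, 0.0, 0.0, 0.0};
    if (v.empty())
        return s;

    double sum = 0.0;
    for (double x : v)
        sum += x;
    s.mean = sum / static_cast<double>(v.size());

    double sq = 0.0;
    for (double x : v)
        sq += (x - s.mean) * (x - s.mean);
    // Sample standard deviation (n - 1 in the denominator).
    s.std_dev = v.size() > 1 ? std::sqrt(sq / static_cast<double>(v.size() - 1)) : 0.0;

    const auto mm = std::minmax_element(v.begin(), v.end());
    s.min = *mm.first;
    s.max = *mm.second;
    s.range = s.max - s.min;
    return s;
}
```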

Edit again:

I just did a genesis run (Mega Turrican, super resolution 1280x0 with two "resolution switches" (Data East logo -> title screen -> gameplay)), with the following results:

Code:

file: u:\vicosku_new\new_fd_code\genesis\genesis_d3d.log
std dev: 0.0000160
mean: 1.0004094
max: 1.0004600
min: 1.0003600
max - min: 0.0001000
file: u:\vicosku_new\new_fd_code\genesis\genesis_ddraw.log
std dev: 0.0000160
mean: 1.0004096
max: 1.0004600
min: 1.0003700
max - min: 0.0000900

They are strikingly similar. Attached two plots as well, sorry for the crappy handwriting but snipping tool doesn't offer proper text. Also they were done with frame_delay 5.

Edit again again:

Did a 10 minute run of Mega Turrican to find out if the resolution switches would cause any issues: and no. No underruns noted.

Code:

file: u:\vicosku_new\asio.log (Mega Turrican 10 min run)
std dev: 0.0000161
mean: 1.0001045
max: 1.0001900
min: 1.0000200
max - min: 0.0001700

Edit just one more time:

I've uploaded the build used for these tests to here https://mega.nz/#!jsREGTKT!v0lQ2TUziRnFrOpzEn5VuYo3i5G8ShFKnXNjlTG7i1w , but it's really, really broken and should only be used to check my statements above. Also, with this particular build, asio_holdoff works the other way around, and should be set to 672 with a latency setting of 48. Also I've attached the Octave script used for calculating mean/max etc.

Calamity · « **Reply #230 on:** July 09, 2015, 06:12:07 pm »

Quote from: intealls on July 09, 2015, 03:35:25 pm

Right now, I believe that the (very small) variations in buffer size in the above plots show a phase discrepancy between the BASSASIO callback and the audio update routine.

Alright I see what you mean, makes total sense.

I'm eager to see if Dr.Venom can replicate your results with the genesis driver.

Dr.Venom · « **Reply #231 on:** July 09, 2015, 06:20:37 pm »

Quote from: intealls on July 09, 2015, 03:35:25 pm

This isn't completely fair. The scales on the genesis new-fd/genesis ddraw are different, with the d3d one going from 0.99995 to 1.00015 and ddraw going from ~0.89 to 1.02. Although when comparing genesis new-fd with outrun, ddraw does seem a bit more stable.

You're absolutely right. I only realized just now how deceptive the scaling on the chart actually is. What I did was re-run both tests, and then cut off the first 32 readings from the log (which cause the spike from 0.89 to 1.02). I've attached both charts, which show that they are much more similar now. Looking at the sample buffer stability, possibly even in favour of d3d!

Edit: I've just noticed your edited post with additional results for the genesis driver. I'm glad we got this part sorted!

That only leaves us with the "lowest possible latency test" comparison (as I reported earlier).

When using ddraw with the genesis driver I can get as low as 48 with a holdoff of 528, with zero over-/underruns. When running this setting with d3d, the emulator either produces a continuous stream of underruns (small sample quoted below) and/or crashes. This remains until raising the holdoff to 816. In other words a 6ms "lowest possible audio latency" difference between ddraw and d3d.

I'm not sure how this test will hold up for other drivers though. It's just that if ddraw allows me to go to a holdoff of 528 for the genesis driver, I want to be able to do that with d3d too!

Intealls, could you possibly also run this "lowest possible latency test" to compare d3d and ddraw performance with the genesis driver for your system? I'm curious whether you'll get comparable results. It may provide a last missing link?

Code:

ASIO: Resync underrun at about update 224, speed: 1.000063
ASIO: Reverting head -64 counts, just got 800 samples
ASIO: Resync underrun at about update 225, speed: 1.000063
ASIO: Reverting head -64 counts, just got 800 samples
ASIO: Resync underrun at about update 226, speed: 1.000063
ASIO: Reverting head -64 counts, just got 800 samples
ASIO: Resync underrun at about update 227, speed: 1.000063
ASIO: Reverting head -64 counts, just got 800 samples
ASIO: Resync underrun at about update 228, speed: 1.000063
ASIO: Reverting head -64 counts, just got 800 samples
ASIO: Resync underrun at about update 229, speed: 1.000063
ASIO: Reverting head -64 counts, just got 800 samples
ASIO: Resync underrun at about update 230, speed: 1.000063

Quote from: intealls on July 09, 2015, 03:35:25 pm

Edit just one more time:

I've uploaded the build used for these tests to here https://mega.nz/#!jsREGTKT!v0lQ2TUziRnFrOpzEn5VuYo3i5G8ShFKnXNjlTG7i1w , but it's really, really broken and should only be used to check my statements above. Also, with this particular build, asio_holdoff works the other way around, and should be set to 672 with a latency setting of 48. Also I've attached the Octave script used for calculating mean/max etc.

EDIT: What do you exactly mean with "the other way around". If I have the driver set to a latency of 48 samples, can I still set the asio_holdoff to 528 or other multiple, or is that wrong for this build?

Calamity · « **Reply #232 on:** July 09, 2015, 06:52:09 pm »

Dr.Venom, what fd value are you using? I believe that with the new fd implementation it'll be beneficial to increase fd as much as possible. The wait for vblank is a tight loop that'll eat all cpu cycles, so the shorter the time we are in it, the more idle cpu cycles there'll be for other stuff. Just in case.

intealls · « **Reply #233 on:** July 09, 2015, 07:10:01 pm »

Quote from: Dr.Venom on July 09, 2015, 06:20:37 pm

EDIT: What do you exactly mean with "the other way around". If I have the driver set to a latency of 48 samples, can I still set the asio_holdoff to 528 or other multiple, or is that wrong for this build?

Sorry about this. It'll vary between drivers a bit but the general rule is basically 800 - asio_holdoff = the amount of samples you want consistently in the buffer. So a setting of 672 means that we try to keep 128 samples in the buffer.

Edit: So a proper setting for a latency of 48 should (with this wonky build) be 800 - 48 * 4 = 608. Which means that we can allow for four missed BASSASIO fetches basically.
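The rule of thumb above, written out as a sketch. The function names are made up, and 800 is the samples-per-update figure for 48 kHz at 60 Hz; this only applies to the inverted asio_holdoff of this particular test build.

```cpp
// Samples we try to keep in the buffer for a given (inverted) asio_holdoff.
int target_buffer_samples(int samples_per_update, int asio_holdoff)
{
    return samples_per_update - asio_holdoff;
}

// asio_holdoff that leaves room for a number of missed callback fetches.
int holdoff_for_latency(int samples_per_update, int latency, int missed_fetches)
{
    return samples_per_update - latency * missed_fetches;
}
```

So target_buffer_samples(800, 672) is 128, and holdoff_for_latency(800, 48, 4) is 608.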

intealls · « **Reply #234 on:** July 09, 2015, 07:21:12 pm »

Quote from: Dr.Venom on July 09, 2015, 06:20:37 pm

That only leaves us with the "lowest possible latency test" comparison (as I reported earlier).

When using ddraw with the genesis driver I can get as low as 48 with a holdoff of 528, with zero over-/underruns. When running this setting with d3d, the emulator either produces a continuous stream of underruns (small sample quoted below) and/or crashes. This remains until raising the holdoff to 816. In other words a 6ms "lowest possible audio latency" difference between ddraw and d3d.

I'm not sure how this test will hold up for other drivers though. It's just that if ddraw allows me to go to a holdoff of 528 for the genesis driver, I want to be able to do that with d3d too!

Intealls, could you possibly also run this "lowest possible latency test" to compare d3d and ddraw performance with the genesis driver for your system? I'm curious whether you'll get comparable results. It may provide a last missing link?

Absolutely, and thank you for doing the tests and reporting them, like I said previously I would probably never have attempted it otherwise.

And regarding the lowest possible latency test, I already did it.

The above genesis runs were with the new fd code + d3d/ddraw, and no over/underruns in any of them. The plot in the 10 minute run shows that the buffer consistently contains about 90 to 140 samples, meaning virtually no latency added from ASIO's side of things.

vicosku · « **Reply #235 on:** July 10, 2015, 12:46:46 pm »

Quote from: intealls on July 09, 2015, 11:41:10 am

Also, I managed to get kX to accept a 48 buffer size without dropouts by changing the sync method to Kernel/SMP.

Thanks for this tip! I can also achieve even lower latencies with this setting. Unfortunately, it does seem pretty CPU intensive, like you mentioned. CVS1K games like futari15 drop to 90% speed or lower on my 4.6 GHz G3258.

Also, are any of you using a lightweight front-end that works well with the ASIO patch? Now that the timings are getting so tight, GameEx is causing some instability that doesn't exhibit itself when Mame is used alone. Using higher latencies helps.

big10p · « **Reply #236 on:** July 10, 2015, 01:05:59 pm »

I use MaLa, which seems pretty light on resources. It doesn't have all the bells & whistles of FEs like GameEx and HyperSpin, though.

intealls · « **Reply #237 on:** July 10, 2015, 07:18:13 pm »

Quote from: Calamity on July 09, 2015, 06:52:09 pm

Dr.Venom, what fd value are you using? I believe that with the new fd implementation it'll be beneficial to increase fd as much as possible. The wait for vblank is a tight loop that'll eat all cpu cycles, so the shorter the time we are in it, the more idle cpu cycles there'll be for other stuff. Just in case.

After reading this, I decided to try properly with -mt. Currently, the ASIO patch contains d3d9ex (not sure this affects audio in any way though), the m_speed hack (consistent speed for 1 second before setting m_speed), the new fd code, and the audio generation at the end of the frame update.
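For clarity, the idea behind the m_speed hack is roughly the debounce sketched below. This is not the code from the patch: the class, its members and the tolerance parameter are hypothetical. At one reading per frame, required_updates = 60 would correspond to about one second at 60 Hz.

```cpp
#include <cmath>

// Hypothetical debounce: only accept a new speed once it has been reported
// consistently (within a tolerance) for a number of consecutive updates.
class SpeedHoldoff
{
public:
    SpeedHoldoff(double tolerance, int required_updates, double initial_speed)
        : m_tolerance(tolerance), m_required(required_updates),
          m_candidate(initial_speed), m_accepted(initial_speed), m_count(0) {}

    // Feed one speed reading; returns the speed currently in effect.
    double update(double reported)
    {
        if (std::fabs(reported - m_candidate) <= m_tolerance)
        {
            ++m_count; // same value as before, keep counting
        }
        else
        {
            m_candidate = reported; // new value, restart the count
            m_count = 1;
        }
        if (m_count >= m_required)
            m_accepted = m_candidate;
        return m_accepted;
    }

private:
    double m_tolerance;
    int m_required;
    double m_candidate;
    double m_accepted;
    int m_count;
};
```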

I cannot see any adverse effects with -mt and the current combination of patches, at least not for ASIO. I double checked the input response (yes we get next frame response) and sound output delay (doesn't seem to be affected in any way). The speed percentage is a tiny, tiny bit more unstable when using -mt, but my current tests show it has no actual effect whatsoever.

Currently I try to keep about 128 samples in the buffer. An underrun occurs when the buffer is empty. I did 20 minute runs of the following games: fd6: gunforc2, mk2, fd7: neodrift, pbobble, sf2hf, sfa3. The worst case scenario was 1 underrun. I did a couple of 20 min runs with fd8, which all seemed to favor the new d3d code over ddraw as per the below info. This is truly awesome.

Code:

game     d3d (o/u) ddraw (o/u)
------------------------------
pbobble  0/5       0/20
sf2hf    0/5       1/27
sfa3     3/29      1/61
neodrift 1/46      1/117

Quote from: Dr.Venom on July 09, 2015, 06:20:37 pm

Looking at the sample buffer stability, possibly even in favour of d3d!

I agree!

Dr.Venom · « **Reply #238 on:** July 12, 2015, 10:14:30 am »

Quote from: Calamity on July 09, 2015, 06:52:09 pm

Dr.Venom, what fd value are you using? I believe that with the new fd implementation it'll be beneficial to increase fd as much as possible. The wait for vblank is a tight loop that'll eat all cpu cycles, so the shorter the time we are in it, the more idle cpu cycles there'll be for other stuff. Just in case.

I was using fd2, purely for testing purposes, as I was still under the impression that fd negatively affected the asio latency (as it did with the earlier builds). With the latest release I see this is no longer the case, on the contrary even.

Quote from: intealls on July 09, 2015, 03:35:25 pm

Edit just one more time:

I've uploaded the build used for these tests to here https://mega.nz/#!jsREGTKT!v0lQ2TUziRnFrOpzEn5VuYo3i5G8ShFKnXNjlTG7i1w , but it's really, really broken and should only be used to check my statements above. Also, with this particular build, asio_holdoff works the other way around, and should be set to 672 with a latency setting of 48. Also I've attached the Octave script used for calculating mean/max etc.

Thanks for posting all of your test results. After testing this release, I can say one thing: truly awesome performance indeed! The Genesis driver now runs with fd8 (!) and a ~96 samples buffer (!!) consistently without any issues in D3D! I've attached a picture with the octave profile*, see: genesis-d3d_fd8-2x48-mt.png

* note that I'm deleting the very first unstable values from the log, to get the scale as small as possible for these comparison purposes.

Even more interesting is that at these settings, when it comes to the "screenswitching" done by Mega Turrican (using super resolution 1280x0), d3d is clearly outperforming ddraw. Apparently ddraw is not as quick doing these x4 and x5 width switches as d3d, resulting in a far less stable soundbuffer at these switching moments. You can see this by comparing the d3d picture with the large spikes at the switching moments in the audio buffer for ddraw, as shown in genesis-ddraw_fd8-2x48-mt.png

Note that I use a trick here to get the game to switch screenmode more often. If you'd like to replicate that:

when the intro runs press fire -> title screen (switch)-> wait for about 10 seconds -> level 1 starts to demo (switch)
-> press fire -> title screen (switch)-> wait for about 10 seconds -> level 2 starts to demo (switch)
-> rinse and repeat, and you'll get level 3 and level 4 demo'd also (enjoy the great Chris Huelsbeck level 3 tune!).

So within a test of 2 minutes you can get it to generate a number of screenswitches. These are exactly the big spikes you're seeing in the ddraw graph genesis-ddraw_fd8-2x48-mt.png. To make this even more convincing just compare it to the graph genesis-ddraw_fd8-2x48-mt_v2.png, where I let the whole intro run (so you only have one screenswitch in the beginning). This also shows in the graph, as there's only one spike in the audio buffer in the beginning, but for the rest it's smooth.

Verdict: d3d clearly outperforms ddraw in this test. Add these to your extensive set of d3d/ddraw comparisons, and (at least to me) it seems clear that D3D is the winner with this latest build. I think Calamity will be happy now.

I've also attached a run of 1944 using framedelay 8 and the same insanely small soundbuffer. It runs completely without issues, see 1944-d3d_fd8-2x48-mt.png

With regards to using multithreading or not, I can confirm your test results, that there doesn't seem to be any adverse effect from using it. At least when it comes to asio stability. My preference now is to actually keep it on by default. Calamity I guess you may not need to change the multithreading implementation given these test results?

Quote from: intealls on July 09, 2015, 11:41:10 am

Also, I think now is the time to get a better understanding of how the sound system works, since I think there's intermediary buffering going on, which may or may not be driver-dependent. I find it difficult to believe that the actual pbobble hardware would buffer as many samples as shown in the topmost plot in this post http://forum.arcadecontrols.com/index.php/topic,141869.msg1469093.html#msg1469093, however actual hardware tests would need to be conducted. I've got a NeoGeo sitting here that I haven't hooked up yet, to use for comparison (though not with pbobble).

I would love to see the real hardware compared to the current implementation and see where we really are in the total latency chain now. Hopefully you'll get around to doing these tests.

Lastly, could you possibly explain how the current ASIO implementation keeps the audio buffer stable? Does this affect sound quality in any way?

intealls · « **Reply #239 on:** July 12, 2015, 03:40:50 pm »

Quote from: Dr.Venom on July 12, 2015, 10:14:30 am

Thanks for posting all of your test results. After testing this release, I can say one thing: truly awesome performance indeed! The Genesis driver now runs with fd8 (!) and a ~96 samples buffer (!!) consistently without any issues in D3D! I've attached a picture with the octave profile*, see: genesis-d3d_fd8-2x48-mt.png

* note that I'm deleting the very first unstable values from the log, to get the scale as small as possible for these comparison purposes.

Thanks again for posting! Nice! What CPU are you using? My testing rig is using an i3 4130, and I can only get genesis down reliably to fd6.

Quote from: Dr.Venom on July 12, 2015, 10:14:30 am

Even more interesting is that at these settings, when it comes to the "screenswitching" done by Mega Turrican (using super resolution 1280x0), d3d is clearly outperforming ddraw. Apparently ddraw is not as quick doing these x4 and x5 width switches as d3d, resulting in a far less stable soundbuffer at these switching moments. You can see this by comparing the d3d picture with the large spikes at the switching moments in the audio buffer for ddraw, as shown in genesis-ddraw_fd8-2x48-mt.png

Note that I use a trick here to get the game to switch screenmode more often. If you'd like to replicate that:

when the intro runs press fire -> title screen (switch)-> wait for about 10 seconds -> level 1 starts to demo (switch)
-> press fire -> title screen (switch)-> wait for about 10 seconds -> level 2 starts to demo (switch)
-> rinse and repeat, and you'll get level 3 and level 4 demo'd also (enjoy the great Chris Huelsbeck level 3 tune!).

Nice trick.

This is a fabulous game btw. Used to play 1 and 2 on the Amiga.

Quote from: Dr.Venom on July 12, 2015, 10:14:30 am

With regards to using multithreading or not, I can confirm your test results, that there doesn't seem to be any adverse effect from using it. At least when it comes to asio stability. My preference now is to actually keep it on by default. Calamity I guess you may not need to change the multithreading implementation given these test results?

Mine too! The only thing I've noticed is that if using -window, the emulator seems to lock up when moving it around.

Quote from: Dr.Venom on July 12, 2015, 10:14:30 am

Lastly, could you possibly explain how the current ASIO implementation keeps the audio buffer stable? Does this affect sound quality in any way?

Currently, I try to do as little as possible. As you can see in your tests, if the game speed is consistent (small deviations are ok), it just works. For the next release, I've rearranged the holdoff stuff to provide an audio_latency setting. It's pretty much based on the discussion we've had here - it seemed sensible to base the audio latency upon how many missed callback fetches to allow basically. When set to 1, it attempts to do pretty much what you do in your tests - namely to try and keep the buffer count impossibly low. What it does is that the audio is not being started until the second audio update has been received, and skips almost all samples from the first one. Let's say we get 800 samples per update (60 Hz/48 kHz). When two updates have been received, there are 1600 samples in the buffer. Assuming a latency of 64, we discard 800 - 64 * 2 = 672 samples, being left with 928 in the buffer. We start playback. The next time an audio update arrives, we _should_ have about 128 samples left in the buffer as backup to allow for some leniency (it will vary a bit probably due to update/callback phase shift, might try to get to the bottom of this, not a priority now). The gist is, when we get a refill of samples, we should have eaten away almost all of the ones we have, and as you can see from your tests, that's usually what happens!

Also, when we have more than asio_latency * 3 samples in the buffer, it will count as an overrun, and samples will be discarded to get the count down.

Also, -audio_latency 1 will _not_ try to avoid underruns. If an underrun occurs, everything will have to stabilize itself, and this usually happens pretty quickly. I just played Ristar for an hour and a half with -audio_latency 1 and an apparent 52 underruns, and didn't notice any one of them.

The underrun behavior has also been changed a bit, when one occurs, instead of giving BASSASIO silence, the last bunch of samples are given again. I think this sounds better, even though when many, many underruns occur, the sound will be sort of metallic, and clearly audible. But still better than silence IMO. Might add an option to toggle this.
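A rough sketch of that underrun behavior, to show what "the last bunch of samples are given again" means. It does not use the real BASSASIO API; the function, the plain vector queue and the 16-bit sample type are all assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical callback helper: hand out one block of samples from the queue.
// On an underrun the previously delivered block is repeated instead of silence.
// Returns true if an underrun occurred.
bool fetch_block(std::vector<short>& queue, std::vector<short>& last_block,
                 std::vector<short>& out, std::size_t block_size)
{
    if (queue.size() >= block_size)
    {
        const auto end = queue.begin() + static_cast<std::ptrdiff_t>(block_size);
        out.assign(queue.begin(), end);
        queue.erase(queue.begin(), end);
        last_block = out; // remember it in case the next fetch underruns
        return false;
    }

    // Underrun: replay the last block (padded with silence if none exists yet).
    out = last_block;
    out.resize(block_size, 0);
    return true;
}
```

Repeating the same block many times in a row is what would produce the metallic sound described above.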

When a higher audio latency setting is used, more samples are kept from the get-go, say we discard 800 - 64 * 3 for audio_latency 2. The buffer will also rewind a bit more when getting underruns, to try and prevent them from happening again.
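The start-up arithmetic for both settings can be written down as a small sketch. The function names are invented, and the general rule "discard 800 - latency * (audio_latency + 1)" is only inferred from the two examples given here (latency * 2 for setting 1, latency * 3 for setting 2), so treat it as an assumption.

```cpp
// Samples discarded from the first update before playback starts.
// The (audio_latency + 1) factor is inferred from the two examples in the post.
int startup_discard(int samples_per_update, int latency, int audio_latency)
{
    return samples_per_update - latency * (audio_latency + 1);
}

// Samples left in the buffer when the next update arrives, assuming playback
// started with two updates queued and exactly one update's worth was consumed.
int backup_at_next_update(int samples_per_update, int latency, int audio_latency)
{
    const int buffered = 2 * samples_per_update
                       - startup_discard(samples_per_update, latency, audio_latency);
    return buffered - samples_per_update;
}
```

For 800 samples per update and a latency of 64, audio_latency 1 discards 672 samples, starts with 928 and leaves about 128 as backup; audio_latency 2 discards 608 and leaves about 192.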

Edit: some clarifications

