I'm not saying your card isn't capable; in fact, this isn't necessarily a GPU problem. But to me it's crystal clear that in 1 out of every 3 frames, this is what happens:
CPU time + GPU time > 16.67 ms
Of course, if you use HLSL, GPU time goes up, so you're more likely to see the problem.
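To make the arithmetic concrete, here is a rough sketch in Python. It assumes a 60 Hz refresh (which is where the 16.67 ms comes from) and that CPU and GPU work for a frame add up one after the other. The function name and the per-frame timings are made up for illustration; they are not measurements from MAME.

```python
# Frame budget at an assumed 60 Hz refresh rate.
REFRESH_HZ = 60.0
FRAME_BUDGET_MS = 1000.0 / REFRESH_HZ  # about 16.67 ms


def misses_vsync(cpu_ms, gpu_ms, budget_ms=FRAME_BUDGET_MS):
    """Return True when CPU time + GPU time exceeds the frame budget."""
    return cpu_ms + gpu_ms > budget_ms


# Hypothetical (cpu_ms, gpu_ms) timings for three consecutive frames.
frames = [(9.0, 5.0), (9.5, 5.5), (11.0, 6.5)]

missed = [misses_vsync(cpu, gpu) for cpu, gpu in frames]

for (cpu, gpu), late in zip(frames, missed):
    status = "MISSED" if late else "ok"
    print(f"cpu={cpu:.1f} ms  gpu={gpu:.1f} ms  total={cpu + gpu:.1f} ms  {status}")
```

With these invented numbers, only the third frame goes over budget, which is the 1-out-of-3 pattern. Anything that pushes either term up a bit (a heavier driver, HLSL) can tip a frame over the limit.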
Trust me on this, there's no black magic here.
So why would you see this problem with 0.152 and not before? They change stuff in MAME all the time: maybe that driver now eats more CPU, maybe HLSL now eats more GPU... But I know the stuff I touch for the GM patch, and I'm 99.9% sure the problem is not GM related.
PS: BTW, check whether enabling multithreading makes any difference.