Both of these settings basically use TONS of CPU time to draw the video screen before sending it to the video card.
You're easily pushing 2x or 3x stretched data through the AGP bus to the video card, and because stretching scales both the width and the height, that's really 4x (2 × 2) or 9x (3 × 3) the pixels.
So it's not the video card's dedicated processor doing the stretching; the CPU is doing all of it. That either takes power away from things like sound or overloads the bus, and the net result is the same: other things get delayed, because video has priority.
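A quick sanity check of that arithmetic, as a minimal Python sketch (`pixel_multiplier` is just a made-up name for illustration): stretching multiplies both dimensions, so the pixel count grows by the product of the two scale factors, not their sum.

```python
def pixel_multiplier(scale_x: int, scale_y: int) -> int:
    # Stretching scales width AND height, so the factors multiply.
    return scale_x * scale_y

for scale in (2, 3):
    print(f"{scale}x stretch = {pixel_multiplier(scale, scale)}x the pixels")
# 2x stretch = 4x the pixels
# 3x stretch = 9x the pixels
```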
Let's say you're playing a 320x200 game, like Defender. Normally MAME sends that 320x200 frame to the video card. That's 64,000 pixels every 1/60th of a second.
The video card takes this and scales it to fit the best resolution, usually 640x480 (roughly doubling the width and height of each pixel, so about 4x the pixels). Some cards go to 1024x768 and roughly triple the width and height (9x), but "soften" the edges to give a smoother look rather than big square pixel blocks.
However, video cards still can't add "scanlines" on their own without being fed the data directly; that is, most can't draw a mask over the picture with a single command, so they have to be fed every pixel.
So when you turn off hardware stretching and enable scanlines, the CPU has to work out a "half brightness" version of every pixel in each row and draw it in the row underneath. Assuming it at least doubles the screen to 640x400 (though most go to 1024x768), you're now sending 256,000 pixels every 1/60th of a second. That's FOUR TIMES as much data going over the bus, plus the CPU time to calculate the half-brightness pixels for every scanline.
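Here's the same arithmetic per second, as a rough Python sketch. The 60 frames per second comes from the numbers above; the 2 bytes per pixel (16-bit color) is my assumption, so change `BYTES_PER_PIXEL` to match your color depth. `frame_traffic` is a hypothetical helper name.

```python
FRAMES_PER_SECOND = 60  # from the "every 1/60th of a second" figure
BYTES_PER_PIXEL = 2     # assumption: 16-bit color; change to match your setup

def frame_traffic(width: int, height: int) -> tuple[int, int]:
    """Return (pixels per frame, bytes per second) for a given resolution."""
    pixels = width * height
    return pixels, pixels * BYTES_PER_PIXEL * FRAMES_PER_SECOND

pixels, bytes_per_second = frame_traffic(320, 200)
print(pixels)            # 64000
print(bytes_per_second)  # 7680000
```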
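To show what the "soften" part means, here's a minimal Python sketch comparing plain pixel-doubling (the square blocks) with bilinear filtering. Bilinear is my assumption for the kind of smoothing most cards apply; the card does this in hardware and the exact filter varies. Images are just lists of grayscale values here, and the function names are made up for illustration.

```python
Image = list[list[int]]  # rows of grayscale values, 0-255

def scale_nearest(img: Image, k: int) -> Image:
    # Each source pixel becomes a k-by-k block: the "large square pixel blocks" look.
    return [[row[x // k] for x in range(len(row) * k)] for row in img for _ in range(k)]

def scale_bilinear(img: Image, k: int) -> Image:
    # Assumption: bilinear filtering stands in for the card's hardware smoothing.
    h, w = len(img), len(img[0])
    out = []
    for oy in range(h * k):
        # Map the output pixel's center back into source coordinates, clamped to the edges.
        sy = min(max((oy + 0.5) / k - 0.5, 0), h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for ox in range(w * k):
            sx = min(max((ox + 0.5) / k - 0.5, 0), w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            # Blend the four nearest source pixels, weighted by distance.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(round(top * (1 - fy) + bottom * fy))
        out.append(row)
    return out

edge = [[0, 200]]  # a dark pixel next to a bright one
print(scale_nearest(edge, 2)[0])   # [0, 0, 200, 200]
print(scale_bilinear(edge, 2)[0])  # [0, 50, 150, 200]
```

The hard edge between 0 and 200 stays a sharp step with pixel-doubling, while the filtered version ramps through in-between values, which is the slightly blurry, "soft" look.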
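Roughly what the CPU is doing every frame, as a simplified Python sketch. It assumes a 2x software stretch where each row is drawn, then a half-brightness copy goes underneath; MAME's real scanline code may differ (some setups draw black lines or blend rows instead), and `scanline_2x` is a hypothetical name.

```python
Pixel = tuple[int, int, int]  # (r, g, b), each 0-255

def half_brightness(p: Pixel) -> Pixel:
    # Halve each channel; a right shift by 1 is a cheap divide-by-two.
    r, g, b = p
    return (r >> 1, g >> 1, b >> 1)

def scanline_2x(frame: list[list[Pixel]]) -> list[list[Pixel]]:
    """Hypothetical helper: 2x software stretch with a dimmed line under each row."""
    out = []
    for row in frame:
        wide = [p for p in row for _ in range(2)]       # double each pixel horizontally
        out.append(wide)                                # the full-brightness line
        out.append([half_brightness(p) for p in wide])  # the half-brightness line beneath
    return out

frame = [[(200, 100, 50)] * 320 for _ in range(200)]  # a dummy 320x200 frame
big = scanline_2x(frame)
print(len(big[0]), len(big))   # 640 400
print(len(big) * len(big[0]))  # 256000
```

That's the 256,000 pixels from above, built one pixel at a time by the CPU on every frame before any of it reaches the card.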
That's why it's slow.
The only ways to improve it:
1) Get more CPU horsepower
2) Get a video card that runs at 4x AGP, if yours doesn't already.
3) Make sure your motherboard supports 4x AGP.
4) Live with hardware stretching and no scanlines. On most newer cards (e.g. GeForce2 or better) it gives a soft look, almost like an arcade monitor that has drifted slightly out of focus with age.