5V is standard "TTL" level, which makes it convenient to drive straight from the outputs of the (then common, though not so much anymore) 5V logic that generates all the timing for a video controller. These timing signals really are just digital signals, and many monitors immediately feed them straight into other digital logic, especially on modern "digital" monitors. Keeping them at 5V high levels removes the need for a bunch of unnecessary (read: $$$) level shifting.
The choice of 0.7Vpp for the video signals is an interesting one. There are several legacy reasons as well as technical ones, and the legacy concerns probably stem from the technical ones. The biggest reason I can give is that if you AC couple 0.7Vpp analog video (i.e. run it in series through a capacitor to block any DC path), which is a commonly required operation, the circuit to restore the DC level is very simple if you use 0.7V as the peak level, since 0.7V is what a silicon diode tends to develop when it is "good and on". Conventionally, signals with embedded sync (e.g. sync on green or composite NTSC) use -0.3V as the sync level, 0V as blanking, ~0-0.1V (depends on the standard) as black, and 0.7V as full white.
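To make those levels concrete, here's a minimal sketch of the conventional level map plus a linear code-to-voltage conversion; the 8-bit pixel depth and the zero "setup" default are my assumptions for illustration, not part of any particular standard:

```python
# Conventional analog video levels in volts (embedded-sync formats).
SYNC_V  = -0.3   # sync tip
BLANK_V =  0.0   # blanking level
WHITE_V =  0.7   # full white
# Black sits at blanking or ~0.05-0.1 V above it ("setup"), depending on the standard.

def code_to_volts(code: int, setup: float = 0.0) -> float:
    """Map an 8-bit pixel value (0-255) linearly onto black..white."""
    return setup + (WHITE_V - setup) * (code / 255)

print(code_to_volts(0))     # 0.0   -> black (no setup)
print(code_to_volts(255))   # 0.7   -> full white
print(code_to_volts(128))   # ~0.35 -> mid gray
```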
Also, that 0.7V is measured after the 2:1 divider formed by the 75-ohm source and 75-ohm termination impedances, so the actual DAC output swing has to be a little higher than 1.4V. If you wanted 5V signals after the divider, you'd need a 10-12V rail. While computers have these, circuit designers like to avoid using them outside of the power areas for various reasons that aren't really worth delving into here.
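A quick sanity check on that divider arithmetic (a sketch, nothing standard-specific about it):

```python
# Voltage divider formed by the 75-ohm back-termination at the source
# and the 75-ohm termination inside the monitor.
R_SOURCE = 75.0
R_LOAD   = 75.0

def at_monitor(v_source: float) -> float:
    """Voltage across the monitor's termination for a given unloaded source swing."""
    return v_source * R_LOAD / (R_SOURCE + R_LOAD)

print(at_monitor(1.4))   # 0.7 -> the standard full-white level
print(at_monitor(10.0))  # 5.0 -> what a "5V video" swing would require
```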
Keeping the swing small also helps with high bandwidth (high resolution) signals due to the limited slew rate (wikipedia it) of the analog output buffers. 0.7-1V is a good compromise between practical slew rate limitations ("back in the day") and noise immunity on shielded coaxial cable.
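To put a rough number on the slew-rate argument: here's a back-of-the-envelope estimate, assuming a ~65MHz pixel clock (1024x768 @ 60Hz) and, purely as my assumption for illustration, that a full black-to-white edge is allowed about 30% of a pixel period:

```python
# Rough slew rate needed to complete a full-swing transition within part of a pixel.
PIXEL_CLOCK_HZ = 65e6     # ~1024x768 @ 60 Hz
RISE_FRACTION  = 0.3      # assumed: the edge gets ~30% of a pixel period

rise_time = RISE_FRACTION / PIXEL_CLOCK_HZ

for swing in (0.7, 5.0):
    slew_v_per_us = swing / rise_time / 1e6
    print(f"{swing} Vpp -> ~{slew_v_per_us:.0f} V/us")
# Roughly 150 V/us for 0.7 Vpp vs. roughly 1080 V/us for 5 Vpp: the bigger swing
# needs a much faster (hotter, pricier) output buffer at the same resolution.
```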
Basically, there are several reasons to keep the levels different and essentially no reason to make them the same, so the choice was made many years ago by VESA to make it the way it is today.
Of note, the old EGA and CGA standards put out by IBM used TTL-level video because the computers didn't generate the video using digital-to-analog conversion. Instead, each R, G, and B line was hooked up to the output of a TTL driver and "3-bit" color was used (CGA reduced this to 2bpp and used a palette to generate the RGB signals), allowing each of Red, Green, and Blue to simply be either on or off. Some other implementations allowed e.g. two different levels per channel through the use of two output drivers and a resistive divider, again driving the divider directly off TTL outputs. This kept the PC cheap at the slight cost of sometimes making the monitor more expensive.
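For a concrete feel for the "two drivers plus a resistive divider" trick, here's a toy calculation; the resistor values, the 75-ohm load, and the ideal 0/5V driver swing are all hypothetical (real TTL highs sit well below 5V under load), just to show how four levels fall out of two digital outputs:

```python
# Toy "2-bit DAC from two TTL outputs" for one color channel.
R_MSB  = 680.0    # ohms, from the heavier-weighted bit driver (hypothetical value)
R_LSB  = 1300.0   # ohms, from the lighter-weighted bit driver (hypothetical value)
R_LOAD = 75.0     # ohms, assumed load/termination at the far end

def channel_volts(msb: int, lsb: int) -> float:
    """Node voltage where the two resistors and the load meet (Millman's theorem)."""
    g_msb, g_lsb, g_load = 1 / R_MSB, 1 / R_LSB, 1 / R_LOAD
    return (5.0 * msb * g_msb + 5.0 * lsb * g_lsb) / (g_msb + g_lsb + g_load)

for bits in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(bits, f"{channel_volts(*bits):.2f} V")   # ~0.00, 0.25, 0.47, 0.72 V
```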
Modern standards at higher resolutions and color depths require IC DACs and are subject to all the aforementioned stuff.
Sorry about the length, but you asked a technical question and got a technical answer.
