The red DI boxes take regular "unbalanced" audio from a device like a TV or radio, convert it into a "balanced" signal that is highly resistant to interference, and send it down the ethernet cable. At the other end, in the IO board, a circuit takes the balanced audio, flips one leg and sums the two signals (canceling the noise picked up along the way), turning it back into unbalanced audio.
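To make the noise canceling part concrete, here is a rough Python sketch of the idea. It is not how the TouchTunes hardware is actually built, just an idealized model: the function names, the 440 Hz test tone, and the 60 Hz hum are all made up for illustration, and it assumes the noise lands equally on both legs of the cable. Flipping one leg and summing is the same thing as subtracting the two legs, which is what the receiver does here.

```python
import math

def balanced_send(sample):
    """Split one unbalanced sample into a hot (+) leg and a cold (-) leg."""
    return sample, -sample

def balanced_receive(hot, cold):
    """Subtract the legs (flip one and sum), then halve to restore the level."""
    return (hot - cold) / 2.0

def simulate(n_samples=200, noise_amp=0.5):
    """Send a 1 V tone down a 'cable' that adds the same hum to both legs."""
    original, recovered = [], []
    for i in range(n_samples):
        t = i / 8000.0                                     # assumed 8 kHz sample rate
        audio = math.sin(2 * math.pi * 440 * t)            # hypothetical 1 V test tone
        hum = noise_amp * math.sin(2 * math.pi * 60 * t)   # hypothetical mains hum
        hot, cold = balanced_send(audio)
        hot += hum                                         # the same noise hits both legs
        cold += hum
        original.append(audio)
        recovered.append(balanced_receive(hot, cold))
    return original, recovered

original, recovered = simulate()
worst_error = max(abs(a - b) for a, b in zip(original, recovered))
print(f"worst-case error after the receiver: {worst_error:.2e} V")
```

In this ideal model the hum drops out completely. Real cables and receivers never match perfectly, so in practice a little noise gets through.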
There is also a "blue" DI box that does the opposite (same job as the IO board): it converts the balanced signal back into an unbalanced signal that you can feed into a house system or receiver.
It's not terribly complicated. I use a similar DI box setup at home. My computer has a "red" DI box on it, patched into a pair of "blue" boxes (daisy chained together) whose outputs feed 2 amps: one for my speakers and the other for a subwoofer. The red DI box is powered through an adapter from my computer's power supply, so when the computer turns on, the boxes turn on.
Since the Virtuo amp natively takes a balanced signal, you can just feed it the signal straight from a red DI box (which outputs a balanced signal anyway).
Please note that this setup is not the same as the cheapo "RCA balun" ethernet cable adapters you see all over Amazon. Those devices have their uses... but you will likely pick up noise and signal degradation if the cable is not carefully run, even though they are described as being immune to noise and such. They are not. They simply run the audio signal, as it comes out of your device, through a small center tapped transformer (producing a + and - signal) and send that down the cable. Problem is, line level audio is often only about 1 volt... so it's not hard to pick up half a volt of noise by simply running near a power wire in your ceiling or wall, and now the noise is basically half the size of your signal.
The TouchTunes DI boxes convert the signal into a +11v and -11v signal (hence the need for a 12v power source to run them), so even if that same 0.5v of AC ripple noise got in there, the sum circuit takes care of it... and even if some did squeak through, it would be ~4% of the overall signal, or about 0.04v relative to your 1 volt signal... basically inaudible. It's the best way to transport audio without having noise introduced.
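If you want to check the arithmetic yourself, here is a quick Python sketch. The numbers are the ones from this post (0.5v of noise, a 1v line level signal, an 11v DI signal), the helper name is made up, and it deliberately ignores any canceling done by the receiver, so it is a worst case.

```python
def noise_fraction(noise_volts, signal_volts):
    """Return noise as a fraction of the signal level."""
    return noise_volts / signal_volts

NOISE = 0.5        # volts of pickup, figure taken from the post
LINE_LEVEL = 1.0   # volts, typical line level per the post
DI_LEVEL = 11.0    # volts, DI box signal level per the post

line_pct = noise_fraction(NOISE, LINE_LEVEL) * 100
di_pct = noise_fraction(NOISE, DI_LEVEL) * 100

# The same noise, scaled back down to the original 1 V signal level.
equivalent_volts = NOISE * LINE_LEVEL / DI_LEVEL

print(f"noise on a plain line level run: {line_pct:.1f}% of the signal")
print(f"noise on the 11 V DI run:        {di_pct:.1f}% of the signal")
print(f"equivalent noise at 1 V:         {equivalent_volts:.3f} V")
```

The DI figure comes out a bit over 4%, which lines up with the rough number above, compared to 50% for the plain line level run.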