Every problem they have except one can be overcome with computing power. Of course, that's not accounting for the cost of that computing power, which would affect the price of the service.
However, the latency will kill this. No matter how well everything else works, if it takes more than 150ms (and that's generous) between input and visual response, it won't work.
I could see this working perfectly in the lab over a LAN, but with 10-15 hops of internet between service and user, no way. Maybe if ISPs offered it. Cable companies, for example, have something like 3 Gbit of bandwidth between their service and the customer that's already used for all sorts of digital video.
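To put rough numbers on that, here's a back-of-the-envelope latency budget. Only the 150ms ceiling and the 15-hop count come from the post above; every per-stage figure (per-hop delay, encode, decode, render, display) is a made-up assumption for illustration, not a measurement, and the function names are hypothetical.

```python
# Back-of-the-envelope input-to-display latency budget.
# All per-stage figures are illustrative assumptions, not measurements.

BUDGET_MS = 150  # the "generous" ceiling from the post


def total_latency_ms(hops, per_hop_ms, encode_ms, decode_ms, render_ms, display_ms):
    """Input goes up to the server and video comes back, so hops count twice."""
    network_ms = 2 * hops * per_hop_ms
    return network_ms + encode_ms + decode_ms + render_ms + display_ms


def fits_budget(total_ms, budget_ms=BUDGET_MS):
    """True if the whole round trip stays within the budget."""
    return total_ms <= budget_ms


# Assumed fixed costs: 10ms encode, 10ms decode, ~17ms render, ~17ms display.
fixed = dict(encode_ms=10, decode_ms=10, render_ms=17, display_ms=17)

# Lab over LAN: one hop, assumed 0.5ms per hop.
lan_ms = total_latency_ms(hops=1, per_hop_ms=0.5, **fixed)

# Public internet: 15 hops, assumed 4ms per hop.
wan_ms = total_latency_ms(hops=15, per_hop_ms=4, **fixed)

print(f"LAN: {lan_ms}ms, fits: {fits_budget(lan_ms)}")
print(f"WAN: {wan_ms}ms, fits: {fits_budget(wan_ms)}")
```

With these guesses the LAN case comes in well under budget and the 15-hop case goes over it. Change the per-hop figure and the answer moves, which is the point: the network term is the one that computing power can't buy down.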
But, yeah. This is a lovely idea but it'll lag, simple as that.