OTOH, it's possible to get the latency lower on USB than PS/2, amusingly. The reason is that PS/2 has to serially clock bits into the PC at a rather slow rate, and only then do you get a "hard" interrupt. USB interrupt endpoints are polled (by hardware) at some fixed rate, but the bit rate used for data transfer is much higher. It's so much higher that, if you crank the interrupt endpoint polling rate all the way up (1kHz for FS devices), it can actually take longer to clock the bits across the PS/2 port for some events than it takes for a worst-case poll delay on USB (which then transfers the entire state of the device over). This gets even worse if you have multiple "simultaneous" events: PS/2 has to send them all in sequence while USB can report them all at once. HS (480Mbps) USB devices can actually do even better: their poll rate can practically be several kHz, and there's plenty of bandwidth to back it up. This tends to be useless for HID devices, though.
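To put rough numbers on that comparison, here is a back-of-the-envelope sketch. The figures in it are my assumptions, not measurements: a PS/2 clock somewhere in the usual 10 to 16.7kHz range, 11 bits on the wire per PS/2 byte (start, 8 data, parity, stop), a 3-byte mouse packet, and an 8-byte USB report on a full-speed (12Mbps) link polled at 1kHz. USB protocol overhead and host-side interrupt handling are ignored, and the function names are made up for illustration.

```python
# Rough latency arithmetic for PS/2 vs. full-speed USB.
# All constants are illustrative assumptions, not measured values.

PS2_BITS_PER_BYTE = 11  # start bit + 8 data bits + parity + stop bit


def ps2_transfer_ms(num_bytes, clock_hz=10_000):
    """Time to clock num_bytes across the PS/2 port, in milliseconds."""
    return num_bytes * PS2_BITS_PER_BYTE / clock_hz * 1000.0


def usb_worst_case_ms(report_bytes, poll_hz=1000, bit_rate=12_000_000):
    """Worst-case wait for the next poll plus raw payload time, in ms.

    Ignores packet overhead (sync, PID, CRC, handshake), which is small
    next to the poll interval at full speed.
    """
    poll_wait_ms = 1000.0 / poll_hz
    wire_ms = report_bytes * 8 / bit_rate * 1000.0
    return poll_wait_ms + wire_ms


if __name__ == "__main__":
    print("PS/2, 3-byte packet @ 10 kHz:   %.2f ms" % ps2_transfer_ms(3))
    print("PS/2, 3-byte packet @ 16.7 kHz: %.2f ms" % ps2_transfer_ms(3, 16_700))
    print("USB FS, 8-byte report @ 1 kHz:  %.2f ms" % usb_worst_case_ms(8))
```

Under those assumptions the multi-byte PS/2 packet loses even at the fast end of the clock range, while a single byte on a fast PS/2 clock can still beat the worst-case USB poll, which is why this only holds for some events.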
Of course, as Randy says, if you absolutely need true FIFO style strict ordering, USB HID cannot guarantee it, though there's no reason you can't use a vendor- or application-specific USB protocol to get it.
A real parallel port in general provides the lowest possible latency. The IRQ on that thing really is wired straight to the old ISA interrupt controller. Depending on the system, getting a real CPU interrupt out of this may take some time, but it's usually well under 1ms. You can then read the data lines on the parallel port with very low latency (just enough time to cycle the ISA/LPC bus), though note that reading the status of a PlayStation controller takes several cycles of the parallel port and may in fact be slower than having a dedicated microcontroller on e.g. USB handle it - it'll certainly have more CPU overhead.
Now, at this point, we're down to where we need to mention one other thing: the scheduler tick of most modern OSes is, at fastest, 1ms. I think Windows still used 10ms as of XP but may be faster on Vista/7/8. HZ on Linux is configurable at kernel compile time and was historically 100 (10ms) but is now usually 1000 (1ms). In general, this means that, in a contended multitasking environment, you cannot expect to get less than 1 or 10ms of latency to a user-space application. Playing some tricks with priority, writing the application properly, and having a very lightly loaded system can sometimes get you into something looking more like async behavior, but it's tough to make that guarantee on any general-purpose (non-real-time) OS like Windows, Linux, BSD, OS X, etc.
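If you want to see what your own system does, something like the following sketch gives a rough feel for it. It converts a tick rate into a tick period and then measures how far a short user-space sleep overshoots the requested time. The function names are made up for illustration, and the measurement is only a proxy: it reflects the timer and scheduler behavior of whatever OS and load you run it on, not input-device latency as such.

```python
import time


def tick_period_ms(hz):
    """Length of one scheduler tick in milliseconds for a given tick rate."""
    return 1000.0 / hz


def measure_sleep_overshoot_ms(requested_ms=1.0, samples=50):
    """Sleep repeatedly and return how far each wake-up overshot, in ms."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_ms / 1000.0)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        overshoots.append(elapsed_ms - requested_ms)
    return overshoots


if __name__ == "__main__":
    print("HZ=100  -> %.1f ms per tick" % tick_period_ms(100))
    print("HZ=1000 -> %.1f ms per tick" % tick_period_ms(1000))
    results = measure_sleep_overshoot_ms(1.0, 50)
    print("1 ms sleep overshoot: avg %.3f ms, worst %.3f ms"
          % (sum(results) / len(results), max(results)))
```

On a lightly loaded machine the average is often small, but the worst case is the number that matters here, and it tends to grow quickly once other processes are competing for the CPU.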