Wow, 16 months late to respond, that's what work does to you.
Anyway, no, there is no easy-to-follow guide, since it's system dependent. PC-based audio systems (and probably many other audio systems) seem to work in a double-buffered manner. That is, you have two buffers: one that is being played (samples are read from it and output), and one that the application feeds samples into. When the play buffer is exhausted, the system starts reading from the other one, which has hopefully been filled in time. The size of these buffers dictates the minimum achievable latency (for example 3.33 ms). The -pa_latency parameter tries to set up the buffer according to the value provided.
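To put a number on the buffer size versus latency relation, here is a minimal sketch of the arithmetic. The 48 kHz sample rate and the function names are my own assumptions for illustration; they are not taken from the emulator or from any audio API.

```python
import math


def buffer_latency_ms(frames: int, sample_rate_hz: int) -> float:
    """Time needed to play out one buffer of `frames` sample frames."""
    return 1000.0 * frames / sample_rate_hz


def frames_for_latency(latency_ms: float, sample_rate_hz: int) -> int:
    """Smallest whole number of frames that covers the requested latency."""
    return math.ceil(latency_ms * sample_rate_hz / 1000.0)


# Assumed sample rate of 48 kHz (illustrative only).
frames = frames_for_latency(3.33, 48000)
print(frames, "frames,", round(buffer_latency_ms(frames, 48000), 2), "ms per buffer")
```

Under that assumption, a 3.33 ms request corresponds to a buffer of 160 frames. A different sample rate gives a different frame count for the same latency.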
If you want to delve more into the details, study the nice document cools provided.
When you consider the setup described above, you also realize that there is a driver/chip dependency. For ASIO the variation was extreme, but with WASAPI/WDM-KS there seems to be decent support for 3.33 ms, which is why this figure was recommended: it seems to work with most setups. There might be some Microsoft requirement (or recommendation) somewhere that enables this nice behavior.
Some driver and chip combinations play better with WASAPI, some with WDM-KS; some drivers support very low latencies, some don't. That's why it will always be a bit of trial and error, and making a fool-proof guide to follow is not really possible. And this is only considering Windows, not Linux...
If you get higher latencies than what you specify in the configuration, it is for the reasons above.
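One way to picture why the actual latency can come out higher than the configured one is the toy model below. It assumes the driver only accepts buffer sizes from some minimum upward, in fixed steps. That model, the function name and the numbers are hypothetical; real drivers and backends may negotiate buffer sizes quite differently.

```python
def effective_buffer_frames(requested: int, driver_min: int, driver_step: int) -> int:
    """Hypothetical model: clamp to the driver minimum, then round up to a valid step."""
    frames = max(requested, driver_min)
    remainder = (frames - driver_min) % driver_step
    if remainder:
        frames += driver_step - remainder
    return frames


# Made-up driver limits, purely for illustration.
print(effective_buffer_frames(160, 480, 48))  # request below the driver minimum
print(effective_buffer_frames(500, 480, 48))  # request between two valid sizes
```

In this model a request smaller than what the driver supports is silently raised, so the configured value acts as a lower bound on what you asked for, not a guarantee of what you get.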
Also, regarding the lack of proper documentation, if anyone wants to do it, please do. It's an open source, community driven project, after all.