There are 2 types of light gun - CRT based and sensor based.
CRT based guns use a lens that focuses a small, greatly magnified patch of the screen onto the sensor (the sensor is a kind of opposite LED, essentially a photodiode: it puts out electricity when it receives light).
The sensor responds very quickly. The x,y position comes from timing: the gun notes the moment its sensor sees the brightest input, and that moment is compared against the screen's refresh cycle. Since the CRT draws the picture line by line at a known rate, the "time" in the refresh cycle at which the sensor sees the brightest spot tells you where on the screen you're pointing.
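To make the timing idea concrete, here is a rough Python sketch of the calculation, not how any particular gun or console actually does it. The function name and all the numbers (a 64 microsecond line, 52 microseconds of visible picture per line, 20 blanked lines at the top, a 320x240 picture) are round values I picked for illustration; real hardware has its own timings.

```python
def beam_position(t_since_vsync_us,
                  line_period_us=64.0,    # assumed time to draw one scanline
                  active_start_us=12.0,   # assumed blanking before visible pixels
                  active_width_us=52.0,   # assumed visible part of each line
                  first_visible_line=20,  # assumed blanked lines at top of frame
                  visible_lines=240,
                  width_px=320):
    """Convert 'time since the frame started' into an (x, y) screen position.

    Returns None if the sensor fired while the beam was off-screen
    (horizontal or vertical blanking).
    """
    # Which scanline was being drawn when the sensor saw the bright spot?
    line = int(t_since_vsync_us // line_period_us)
    # How far into that scanline were we?
    t_in_line = t_since_vsync_us - line * line_period_us

    y = line - first_visible_line
    frac = (t_in_line - active_start_us) / active_width_us

    if not (0 <= y < visible_lines) or not (0.0 <= frac < 1.0):
        return None
    return int(frac * width_px), y


# Sensor fired 7718.1 microseconds after the frame started
print(beam_position(7718.1))
```

One scanline lasts only tens of microseconds, so an error of a microsecond or two already shifts x by several pixels. That is why the timing has to be so precise.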
Things that will make it go wonky are:
bad electronics - everything is based on very precise timing (it's funny to say this, but "it might be the crystals")
glare on the screen that is brighter than the screen itself
dirty screen
dirty lens/no lens in the gun to focus the sensor
essentially anything that would keep the sensor from being able to "see" the screen clearly - keep in mind ultraviolet and infrared light are at play here as well; just because you can't see it doesn't mean the sensor can't
Sensor based games (typically HDTV or SEGA lightgun games) work like WiiMotes -
the gun is a digital camera and there are sensors around the screen (on the Wii these are really infrared LEDs in the sensor bar, which the camera looks for). The gun returns an x,y position based on where it's pointing in relation to the screen - for an okay example of this in action watch :
Pay attention to the way the camera in the WiiMote "tracks" the sensor bar on his TV screen.
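For the camera style, a very simplified Python sketch of the idea looks like this. It assumes the gun's camera reports the pixel positions of two reference points near the screen, and it ignores tilt/roll and distance entirely. The function name, the 1024x768 camera size, and the 1920x1080 screen size are my own assumptions for illustration, not any real gun's API.

```python
def pointer_from_dots(dot_a, dot_b,
                      cam_size=(1024, 768),      # assumed camera resolution
                      screen_size=(1920, 1080)):  # assumed screen resolution
    """Estimate where the gun points from two reference dots seen by its camera.

    dot_a, dot_b: (x, y) pixel positions of the two reference points in the
    camera image, with the origin at the top-left corner.
    """
    cam_w, cam_h = cam_size
    screen_w, screen_h = screen_size

    # Use the midpoint between the two dots as the reference position
    mid_x = (dot_a[0] + dot_b[0]) / 2
    mid_y = (dot_a[1] + dot_b[1]) / 2

    # Swinging the gun right makes the dots slide LEFT in the camera image
    # (and aiming up makes them slide down), so both axes are flipped.
    nx = 1.0 - mid_x / cam_w
    ny = 1.0 - mid_y / cam_h

    return nx * screen_w, ny * screen_h


# Dots centered in the camera's view -> pointing at the middle of the screen
print(pointer_from_dots((412, 384), (612, 384)))
```

Real systems also use the spacing and angle between the dots to correct for distance and for holding the gun tilted, which this sketch leaves out.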