You’ve seen those little images of scrambled numbers and letters that you are forced to enter before you can do certain things on the internet? These gatekeepers might seem pesky, but, believe it or not, they are actually your very good friends, even if you don’t know it! They are the little helpful nanites of the Borg-o-sphere, only they are not out to assimilate you.
These are called CAPTCHA images. That acronym stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.”
In other words, CAPTCHA images attempt to stop automated processes from spamming functions and web sites that were intended for the use of humans.
As a webmaster I’ve encountered a lot of link spam in my travels. This particular flavor of spam tries to plant links on as many web sites as possible in order to convince search engines like Google that the linked page is more popular than it really is. It is an attempt to “game the system” so the page ranks higher in search results than its “natural” position. Some form responses I’ve seen included more than 200 links! No point in being subtle, I guess.
Basically it’s cheating to get evil motherfuckers something they don’t deserve.
Like a lot of other areas on the internet, the battle for the hill named CAPTCHA is an ongoing game of cat and mouse. The good people invent CAPTCHA images to stop spam, then the bad people respond with automated OCR (Optical Character Recognition) methods to break them. Then the good people respond by tweaking the images so computers can’t read them anymore, then bad people perfect their methods, etc. It goes on and on and on.
That explains why the CAPTCHA images continue to get weirder and harder over time.
There is one very interesting element on this front you may not be aware of. It is a service called reCAPTCHA. They are doing something really creative.
reCAPTCHA offers its services free to web sites. In return, the humans who use the service around the world are automatically helping to digitize old books and newspapers (like the New York Times) that contain words OCR technology cannot scan well.
The image displays one word that is known to be good and one where help is still needed. When humans get the known word correct, there is a high probability the other word is also correct. These responses are collected by reCAPTCHA and used to fill in the blanks on the scanning of those old books.
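For the curious, here is a rough sketch of how that two-word check could work. This is only my guess at the general idea, not reCAPTCHA's actual code: the names `submit`, `best_guess`, `votes` and the three-vote threshold are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical store: unknown-word id -> tally of what humans typed for it.
votes = defaultdict(Counter)

def submit(known_word, unknown_id, typed_known, typed_unknown):
    """Pass the user only if the known word matches; record the other guess only then."""
    passed = typed_known.strip().lower() == known_word.lower()
    if passed:
        votes[unknown_id][typed_unknown.strip().lower()] += 1
    return passed

def best_guess(unknown_id, min_votes=3):
    """Return the leading transcription once enough humans agree, else None.

    The threshold of 3 is an assumption picked for this example.
    """
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

The point of the design is that a spammer's guess at the unknown word is thrown away whenever the known word is wrong, so only answers from people who passed the test end up in the tally.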
reCAPTCHA also offers an audio segment with every reCAPTCHA image. This is helpful for the sight-impaired but also helps to digitize old audio programs like radio shows.
It has been estimated that these little reCAPTCHA images collect data from 200 million responses every day. At an estimated 10 seconds each, that adds up to about 2 billion seconds, or more than 550,000 hours of free human labor, every single day. reCAPTCHA has found a way to harness that power and use it for good.
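Taking the 200 million responses and the 10 seconds per response at face value, the daily total is easy to check with a few lines of Python. The 8-hour workday used for the last number is my own assumption, just to put the total in perspective.

```python
RESPONSES_PER_DAY = 200_000_000  # estimate quoted in the text
SECONDS_PER_RESPONSE = 10        # estimate quoted in the text

total_seconds = RESPONSES_PER_DAY * SECONDS_PER_RESPONSE
total_hours = total_seconds / 3600

# Assumption: one "workday" is 8 hours of effort.
workdays = total_hours / 8

print(f"{total_seconds:,} seconds per day")
print(f"{total_hours:,.0f} hours per day")
print(f"{workdays:,.0f} eight-hour workdays per day")
```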