You’ve seen those little images of scrambled numbers and letters that you are forced to enter before you can do certain things on the internet? These gatekeepers might seem pesky, but, believe it or not, they are actually your very good friends, even if you don’t know it! They are the little helpful nanites of the Borg-o-sphere, only they are not out to assimilate you.
These are called CAPTCHA images. That acronym stands for “Completely Automated Public Turing test to tell Computers and Humans Apart.”
In other words, CAPTCHA images attempt to stop automated processes from spamming functions and web sites that were intended for the use of humans.
As a webmaster I’ve encountered a lot of link spam in my travels. This particular flavor of spam is an attempt to get links visible on web sites for the purpose of convincing search engines like Google that the link is more popular than it really is. This, in turn, is an attempt to “game the system” by getting links to display higher than they normally would based on their “natural” position. Some form responses I’ve seen included more than 200 links! No point in being subtle, I guess.
Basically it’s cheating to get evil motherfuckers something they don’t deserve.
Like a lot of other areas on the internet, the battle for the hill named CAPTCHA is an ongoing game of cat and mouse. The good people invent CAPTCHA images to stop spam, then the bad people respond with automated OCR (Optical Character Recognition) methods to break them. Then the good people respond by tweaking the images so computers can’t read them anymore, then bad people perfect their methods, etc. It goes on and on and on.
That explains why the CAPTCHA images continue to get weirder and harder over time.
There is one very interesting element on this front you may not be aware of. It is a service called reCAPTCHA. They are doing something really creative.
reCAPTCHA offers its service free to web sites. In return, by using the service, humans around the world are automatically helping to digitize old books and newspapers (like the New York Times) that contain words OCR technology can’t scan reliably.
The image displays one word that is known to be good and one where help is still needed. When humans get the known word correct, there is a high probability the other word is also correct. These responses are collected by reCAPTCHA and used to fill in the blanks on the scanning of those old books.
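For the code-minded, here is a rough sketch of that two-word trick. It is not reCAPTCHA's actual code: the function names, the in-memory `votes` store, and the rule that three matching answers settle a word are all my own assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical store: id of an unreadable word image -> tally of human guesses
votes = defaultdict(Counter)

def normalize(text):
    """Ignore case and stray whitespace when comparing answers."""
    return text.strip().lower()

def check_response(control_answer, typed_control, unknown_id, typed_unknown):
    """Pass or fail the user on the known word; keep their guess at the other."""
    if normalize(typed_control) != normalize(control_answer):
        return False  # missed the known word, so the other guess is discarded
    votes[unknown_id][normalize(typed_unknown)] += 1
    return True

def accepted_transcription(unknown_id, min_votes=3):
    """Return the leading guess once enough humans agree (assumed threshold)."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

The point of the design is that the site only ever grades the word it already knows. The second word rides along for free, and it only gets trusted once several people who passed the test have typed the same thing.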
reCAPTCHA also offers an audio version alongside every image. This is helpful for the sight-impaired, and it also helps to digitize old audio programs like radio shows.
It has been estimated that these little reCAPTCHA images collect data from 200 million responses every day. At an estimated 10 seconds each, that adds up to well over 150,000 hours of human effort every day, all of it free labor. reCAPTCHA has found a way to harness that power and use it for good.
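If you want to check the back-of-the-envelope math yourself, here is a quick sketch. It takes the two estimates above at face value (200 million responses a day, 10 seconds each); the variable names are mine, and the inputs are rough guesses to begin with.

```python
# Both inputs are the estimates quoted above, not measured values
responses_per_day = 200_000_000
seconds_per_response = 10

total_seconds = responses_per_day * seconds_per_response
hours_per_day = total_seconds / 3600  # 3,600 seconds in an hour

print(f"{hours_per_day:,.0f} hours per day")  # prints "555,556 hours per day"
```

Straight multiplication comes out at roughly 555,000 hours a day, comfortably above the 150,000 figure, so the headline number is, if anything, on the cautious side.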
Wow. I didn’t know all of that! Well, I knew what they were for, but I didn’t think about how much is involved and the bad guys, etc. Thanks, Abyss!
The bad guys now outsource captcha filling to third world nations. You’ve heard of call centers? Well, there are captcha centers as well. Nice, huh?
Ok, so maybe it is cheating. But, then again, the thought of Google controlling one’s web destiny is a little hard to take as well. Isn’t Google in the same category as the rest of the evil corporations? Or do you truly believe their motto of “don’t be evil”? The amount of tracking they do now is crazy. Basically every ad you see when surfing the net is purposely crafted based on your past surfing history and habits.
I mean they’ll post a picture of your backyard on the net for all to see. Or how about Google’s big Buzz blowup, where they took people’s email contact lists and turned them into social networking lists without asking the users? It would be like posting your email contact list on Facebook for all to see. Nice!
Will there come a time in the future when the “evil motherfuckers” that are screwing with the “system” that is Google will suddenly become cool Che Guevara types? Personally, I find it kind of satisfying to screw with Google. Of course, as a webmaster it’s a continual battle to fight the spammers as well.
I woke up one day and suddenly had over 3000 comments on one of my sites. They were all, of course, links to porn sites that a bot had posted overnight. Google, stupid as they are, had already indexed a boatload of them and a week later de-indexed the site. Ugh. The stupidity of it all makes me laugh now. But back then I really wanted to track down that spammer and do some finger lopping à la Man on Fire. Such are the trials and tribulations of the net.
Whoa hang on!!!
There! I have now spammed your Recent Comments box. I pwn it!!!
Thanks for enlightening me. I’m always sort of annoyed by the CAPTCHA thingy, and on blogspot blogs it goofs up a lot of the time, but after reading your post I can see there’s a good reason behind it.