CMU whiz kid on front lines against hackers and spammers

Wednesday, Aug. 8, 2007

You can't surf the Web anymore without eventually getting CAPTCHA'd.

The wavy, distorted word users must type to gain entry to a protected Web site is called a CAPTCHA. It lets a Web site tell whether a user is human, because computer programs can't reliably read some visual patterns that people can.

CAPTCHAs were designed by Carnegie Mellon University professor Luis von Ahn and his team of computer scientists. Formally known as the Completely Automated Public Turing Test to Tell Computers and Humans Apart, CAPTCHAs were first used in 2000 by Internet giant Yahoo! to prevent spammers from stealing e-mail addresses.

The "Turing" in the modified acronym is Alan Turing, the English computer science pioneer who did extensive work on artificial intelligence in the 1940s.

Von Ahn, 28, of Shadyside, was bothered by the amount of time people were wasting solving CAPTCHAs. By his estimate, there are 60 million CAPTCHAs solved every day around the world.

"We wanted to see if there was something useful people could do instead," he said.

So to make Web users a little more productive, von Ahn and his team joined the Internet Archive's project to digitize books. The nonprofit Internet Archive was founded in 1996 to build an Internet library. Most of the books are scanned and converted to text using optical character recognition, or OCR. But that system's major flaw -- the thing that makes CAPTCHA effective -- is that computer programs can't read some words; older typefaces and books in deteriorating condition pose problems for OCR.

"If we can get people to decipher the text from scanned books, we can digitize a million words a day," von Ahn said. The latest version of CAPTCHAs appears on

Peter Lee, head of the computer science department at CMU's School of Computer Science, said von Ahn and his work have been great assets to the university.

"He has a way of explaining concepts clearly, of making them fun and interesting," Lee said. "He really inspires and motivates his students."

Von Ahn's work is a "green" version of human computation, Lee added. "He doesn't want the power of the human mind to go to waste," he said. "He wants to capture and harness it for good. His work has been really transformative."

Some of the older CAPTCHAs are broken now, but there are newer-generation ones being developed every day, von Ahn said. "And we're about two or three years ahead of the spammers," he said.

Von Ahn said he and his team, Ben Maurer, Colin McMillan and Mike Crawford, argue almost daily about how CAPTCHAs should be used. Their latest disagreement: whether to block a particular IP address from CAPTCHA-solving, because a large number of CAPTCHAs seem to be getting solved from that one computer.

The danger, von Ahn explained, is that some companies hire people to do nothing but solve CAPTCHAs all day.

Additional Information:

Looking ahead

Luis von Ahn was last year's youngest recipient of the MacArthur Fellowship, widely known as the 'Genius Grant.'

Besides CAPTCHA and reCAPTCHA, his projects include Phetch, a game that gets people to describe images from Web pages, in order to make the Internet more accessible for the blind. Once two separate users give the same word to describe an image, they move on to another one. The game then generates descriptions for the images, giving more information to descriptive software used by the blind.
