Are you human? How CAPTCHA asks the wrong question & solves nothing

I hate spam. I also hate CAPTCHAs.

Spam’s not just an issue for web site / app / email consumers, although it’s a major annoyance for them. It’s a huge problem for developers and those who run the services. You might get 50 spams a day, say, but the servers that send and transfer all that mail are getting hit a million times harder.

So, what’s a body to do?

Test for other bodies, right?

CAPTCHA catches on

CAPTCHA is a term we started hearing in 2001 and 2002. It was invented in 2000, by a couple of folks from CMU and IBM, in response to problems with Yahoo! chat room spam. CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. (I, personally, think they worked too hard on that one.)

Since then, it’s transitioned from a bizarre, nigh unpronounceable oddity to an everyday annoyance that we accept with a sigh.

We now see CAPTCHA everywhere a service provider is afraid of losing resources to spambots.

And a number of places where there’s no such likelihood, just because CAPTCHA has become a reflexive action—just like the black velvet dots for disguising smallpox scars became a fashion statement for the unafflicted.

The grand goal?

The whole point of CAPTCHA is to stop spammers in their tracks.

The method?

Stupid Human Tricks.

There are lots of things computers can’t do but humans can. The best way to test if a body is a human or a spambot is to make it do human things. But rather than engaging in a dialog on Stoic philosophy or writing limericks, say, which are hard things to evaluate on the back-end, the CAPTCHA people came up with something a little more… visual.

The human brain is the best image processing computer in the world. Nothing we can program compares. We can detect patterns, especially faces and letters, in almost anything, no matter how distorted or fanciful.

So. Obvious conclusion ahead:

Let’s distort text and make humans enter it! Yay!

[Image: an early CAPTCHA with swirl-distorted text]

The above example is a really old school CAPTCHA, one of the first, using the swirl distortion. It’s really easy to read. Not just for humans: it can be cracked by software.

It didn’t stay this way for long.

Failure, doom & destruction!

I once read that there are two basic levels of failure: Level 1, where you do the thing wrong, and Level 2, where you do the wrong thing.

CAPTCHA fails on both levels.

Level 1 failure: failure to operate as intended

CAPTCHA may have diminished spam dramatically… for a while. But like any spam-fighting technique, it doesn’t operate in a vacuum.

Yes, CAPTCHA—supposedly a Stupid Human Trick hat trick—can be cracked.

The rolling out of CAPTCHA pissed off spammers who, in the finest tradition of salty stories, became bent on revenge. They found a number of ways to crack the early CAPTCHAs.

So CAPTCHA images must get ever more difficult to parse, even for humans, necessitating the addition of a “reload” feature for when the images are totally unreadable.

It’s a death spiral.

There’s an inevitable endgame coming:

Most CAPTCHA research to date has been limited to academic applications. Far more powerful algorithms will be required for commercial CAPTCHAs. As CAPTCHAs become more prevalent, bot programmers are expected to unleash armies of bots bent on breaking them. — PARC web page

Level 2 failure: it’s the wrong thing, anyway

But the most intriguing aspect of cracking CAPTCHAs is that you don’t have to crack CAPTCHAs to get around them.

Let’s review:

  1. CAPTCHAs demand mad image processing skillz.
  2. What are the best image processing computers in the world? Humans.

Get it?

Computer science researchers know exactly how hard image processing with computers is, because that’s a constraint they come up against in their research all the time.

But spammers are much better judges of human nature than computer science researchers.

There’s no need to be an image processing whiz to defeat CAPTCHA. What you need to defeat CAPTCHA are warm bodies. Not even smart ones. Just living and breathing and neurologically firing.

Spammers simply farm out the CAPTCHA solving to those fleshy meatbots that do it like second nature: humans.

Thanks to Mechanical Turk you can get CAPTCHAs solved and open all the spammy fake accounts you want for about a nickel apiece. (There are other online markets, too. Hell, hire a dedicated team!)

Even more cheaply, and probably even more speedily, you can use humans’ weaknesses as leverage (weaknesses other than money!).

Some brilliant folks source CAPTCHAs from the sites they wish to infiltrate and put them in front of download links for pirated copies of music, movies and porno.

The people seeking the music and porn will fill out the CAPTCHA for free, without thinking “Oh no! What if this CAPTCHA stands between Yahoo! Mail and one more spambot? How will I ever live with myself?”

A category failure at heart

The real way to stop spam is not to test if a request originates with a human. Humans are clever, devious and untrustworthy.

A better way to stop it would be to identify spamminess from other metrics that are unique to spam: behavioral patterns, Bayesian filtering, keywords.
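To make the Bayesian filtering idea a little more concrete, here is a minimal sketch of a naive Bayes spam scorer. It is an illustration under assumptions, not a production filter: the class name `TinyBayesFilter`, the `tokenize` helper, and the add-one smoothing are all choices made for this example, and a real filter would need far more training data plus the behavioral signals mentioned above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayesFilter:
    """Hypothetical toy classifier: counts words seen in spam and in ham."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one example; label must be 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_probability(self, text):
        """Return an estimate in [0, 1] that the text is spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Smoothed prior, so an untrained class never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing: unseen words get a small nonzero weight.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Turn the two log scores into one probability, clamped to avoid overflow.
        diff = log_scores["ham"] - log_scores["spam"]
        diff = max(min(diff, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    spam_filter = TinyBayesFilter()
    spam_filter.train("cheap pills buy now", "spam")
    spam_filter.train("great post thanks for sharing", "ham")
    print(spam_filter.spam_probability("buy cheap pills"))
```

Notice that nothing in the scorer asks whether a human typed the message. It only looks at what the message says, which is the point.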

Not that I’m saying it’s easy. There’s a reason I’m not a computer science research scientist.

But, uh, need I say more?


24 Comments

  1. Ryan says:

    I agree wholeheartedly, despite the fact that I, myself, have a CAPTCHA for my blog comments. It’s not a rotating image, and a cookie saves the value for a year, so it’s a little better, but still.

    I think this is an interesting approach: "http://snook.ca/archives/other/effective_blog_comment_spam_blocker/":http://snook.ca/archives/other/effective_blog_comment_spam_blocker/.

  2. Ryan says:

    Here’s the URL without the assumption that textile was present :-)

    http://snook.ca/archives/other/effective_blog_comment_spam_blocker/

  3. Eric Givens says:

    Another approach is the ‘reverse captcha’ technique, which is starting to be used a little more: you include a form field which is hidden from human view, but readable by a bot, and trick it into filling some data. Any data entered into the hidden form field results in rejection.

  4. Jim Neath says:

    As Eric said you could always use a honey track technique, although they’re not the best methods.

    I’ve had quite a lot of luck with akismet and other similar services.

    I hate CAPTCHA though and just refuse to use it. Those damn question tests are annoying as well.

    "What is 1 + 1"

    Now correct me if I’m wrong but I’m pretty sure it’s easy to do mathematics on a computer.

  5. Jim Neath says:

    Honey track? I meant honeypot.

  6. John Athayde says:

    Waldo Jaquith (waldo.jaquith.org) does something great on his personal blog. He generally writes about Virginia politics, so you simply have to answer this question:

    The two major political parties are Republican and [_______________]

    Computers are great at taking 1+1 in a shell, but contextual reading is not their strong suit. And no silly image processing.

  7. Zach Waugh says:

    I think that captcha is fundamentally flawed. Instead of a human having to prove they are human, a computer should prove they’re not a computer. The burden shouldn’t be placed on the user. The reverse captcha as Eric mentioned is one way to do this.

  8. Si says:

    It is also worth saying that CAPTCHA is also used inappropriately in a lot of cases. Search tools on a lot of popular forum/message board software often have them. Combine this with a ~20 second anti-flood delay and it makes the process of trying to find a particular post (often by trial & error) absolutely infuriating. Grr!

    I must confess though that I have used CAPTCHA methods myself to stop contact forms I’ve created from being spammed and it was and still is 100% effective. I use http://recaptcha.net which turns a fairly annoying self-serving security exercise into a more noble literary quest – better than nothing, eh!

  9. Check out less-reverse-captcha:

    http://is.gd/eKlZ

    I have a fork of it that implements a change Steve Bristol and I discussed but he hasn’t pulled in yet:

    http://is.gd/eKm9

    It’s been working great on http://isfeasting.com (in the comments section)

  10. Eric, I like the approach, and Randy great plug-in.

    I also saw one site last year that had a captcha that was so plain-text it made no sense that it was actually protecting anything. When my colleague and I started hacking it, we discovered that there was an image flaw in one character, possibly hard-coded, possibly not, that absolutely befuddled every Perl OCR library we could throw at it… the single character simply vanished from the OCR result. I wish I had kept a link to it… other than reverse-captcha, I think it was an amazing response to the silliness.

    Thanks for the post Amy!

  11. CAPTCHAs are in the same ugly bucket as taking your shoes off before getting on the airplane: Security theater.

    Great post, Amy.

  12. Vinicius says:

    One interesting idea that I’ve read around is to add to the reverse captcha, or negative captcha, math operations to be processed by javascript, with some random parameter. This turns javascript mandatory for your forms, but I think this could be cool for some pages.

  13. Good post. But if captcha is bad, and the honey track technique (honeypot) and the mathematics stuff are not the best either, what is the solution?

  14. Dirk Stoop says:

    Here’s a smart alternate approach:

    http://lemurcatta.org/

  15. Chris Elfers says:

    Who still gets SPAM? Honestly, I’ve been using gmail for 5 years now and I think only 5 or 7 emails have actually made it thru the SPAM filters they use.

    People still get SPAM nowadays? Too bad for you!!! Use gmail and 100% of the spam goes into the ‘SPAM’ folder.

    Stop with the garbage non-Occam, hugely complex theories and just use GMAIL! It will CHANGE your life. (No, I do NOT work for google).

  16. Carl Lumma says:

    Bayesian filters are completely effective against email spam, but for comment spam I don’t think they can work, since the spammer can see whether the filter accepts or rejects each attack. Recently some folks on slashdot did recommend

    http://akismet.com/

    though. -Carl

  17. a says:

    you’re just trying too hard for a good answer to a hard problem.

    Captchas do work. to an extent. they slow down the armies of bots.

    they are often not too hard and you get to try again if you fail.

    If the Admin finds captcha isn’t working he will add more heuristics.

    Bayesian filtering isn’t perfect either because it stops legitimate people (which is the worst scenario) and can be hacked.

    the turing test is by definition hard because we humans despite our immodesty are machines. and computers are continually being designed to be able to think the things we do.

    you suggested tests that possibly involve fuzzy logic. but fuzzy logic code is highly inefficient to write, execute and will always have poor accuracy in many cases.

    say for example a test that asks you what was the meaning of this joke.

    could you say you’ve got every joke you’ve ever heard? people and programmers need to work together to reduce this problem even if they have to squint a bit to read a character.

  18. Bob Smith says:

    I deeply sigh at you Chris Elfers.

  19. Eric D says:

    Another approach is to differentiate the form from the crowd. If all forms are Name-Email-Url-Comment, it’s easy to write a bot to post to the whole world. But if your form is different and the process is slightly different from a human perspective (very different for the computers), the bots will fail with your site.

    For example, on my blog, I’m using a little known plugin that produces a simple javascript doing a calculation. The expected answer is stored in session and the form result should contain the js-filled field with the correct result. 99.9% of spammers won’t execute the javascript. Of course, if everyone starts using this plugin, spammers will adapt but as long as my method is different from yours, we’ll give spammers a hard time.

  20. dan says:

    I want to elaborate…

    if someone is desperate enough to spam your site…they CAN.

    trying to trick the user with simple calculations (i.e. what’s 1+5) is a laughable preventive measure. I can simply write a script to detect for that and enter the appropriate values.

    And if you have random questions (what color is an orange) – a script can be made to save your questions with the appropriate answer set.

    My advice to avoid spams/bots/crawlers.

    1) Use a decent captcha library
    2) Utilize Flash (not accessible to mobile phones, or people without flash)

  21. Benjamin Franz says:

    A lot depends on whether you are trying to defend Gmail or you are just trying to defend a contact form on your own (low volume) web site.

    An extremely effective method of stopping bots for the latter (but not the former) is a custom rolled Javascript with some minor obfuscation that must run to submit the form by back filling the submission address to a script that is not at a ‘well known’ name.

    Bots don’t normally run Javascript and can’t find the script without running the Javascript first.

    Humans using web browsers don’t even notice.

    Additionally, it is hard to attack through a Mechanical Turk because there isn’t anything you can easily extract from the page to give an out-of-context method to ‘crack’ it.

    It works 100% for me (and has for several years now), but it wouldn’t work twenty minutes for a Gmail sized site.

  22. Rob Reid says:

    I implemented a reverse CAPTCHA system which used an input hidden with CSS to trick the bot into completing it. However, I found that if you gave it a tempting name (e.g. name2 or email2) to lure the bot into filling it in, you would get problems with all the auto-complete toolbars out there, which fill in the hidden input as well. You obviously don’t want that.

    The way round that is to give it a name the autofills ignore but then there is a chance a bot may also decide to ignore it as well.

  23. Jakob Egger says:

    A good technique for all but the most frequented websites is to use some unique JS tricks to render the post form or parts of it incomprehensible to bots (e.g. enter a wrong URL into the form’s action attribute and replace it via JavaScript as soon as the user moves the mouse over the form).

    This would be invisible to the user, but it only works as long as your trick is unique.

  24. Henri Kemppainen says:

    @Jakob Egger @dan

    JavaScript and Flash are both used in an annoying way, leading to users disabling them. If/when a site forces them on you (in place of a CAPTCHA), it’ll be perceived just as annoying as captchas are. Also, Flash (in general) and JS (when used as described by Egger) both limit accessibility and are thus yet another annoyance. Did you forget the people who don’t use a mouse?
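
Two of the techniques that come up in the comments above, the hidden-field “reverse CAPTCHA” (honeypot) and the JavaScript-computed answer stored in the session, can be sketched server-side in a few lines. This is a rough illustration, not any commenter’s actual plugin. The field names `contact_me_by_fax` and `js_answer`, the function names, and the plain dicts standing in for the submitted form and the session are all hypothetical. The page is assumed to hide the honeypot input with CSS and to fill `js_answer` with the sum of the two numbers via JavaScript.

```python
import secrets

# Hypothetical field names. As noted in the comments, a honeypot named
# "email2" or "name2" may be filled in by browser auto-complete tools.
HONEYPOT_FIELD = "contact_me_by_fax"
CHALLENGE_FIELD = "js_answer"


def new_challenge(session):
    """Pick two random numbers and remember their sum in the session.

    The two operands are returned so the page's JavaScript can add them
    and write the result into the CHALLENGE_FIELD input.
    """
    a = secrets.randbelow(90) + 10
    b = secrets.randbelow(90) + 10
    session["expected_answer"] = str(a + b)
    return a, b


def looks_like_bot(form, session):
    """Return True if the submission should be rejected."""
    # Honeypot: humans never see this field, so any content is suspicious.
    if form.get(HONEYPOT_FIELD, "").strip():
        return True
    # Each challenge is single-use: pop it so a captured answer can't be replayed.
    expected = session.pop("expected_answer", None)
    if expected is None:
        return True
    # A client that never ran the JavaScript will not have the right sum.
    return form.get(CHALLENGE_FIELD, "") != expected
```

As several commenters point out, this only holds up while the trick stays uncommon, and it shuts out visitors who browse with JavaScript disabled.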
