Do You Really Need a CAPTCHA?

A CAPTCHA is a barrier against abuse of your system: you may be protecting a comment form from spam, or account registration from automated sign-ups. It is also a disruptive barrier to your users, an additional task they need to complete. Often it can be avoided: here are the actions I take before resorting to a CAPTCHA. Any example code uses PHP, but the concepts are language-agnostic.

1. Form Timeout/Session Token

You may be doing something similar already to protect against CSRF attacks. The idea is to be sure your form has been requested (via GET) before it is submitted (via POST). An automated bot will often skip straight to the submission (POST) step.

If you are using a session store to maintain state, a randomly generated session salt can be used:

  <?php
  session_start();
  // generate the salt once per session, from a secure random source
  if (!isset($_SESSION['salt'])) {
      $_SESSION['salt'] = bin2hex(random_bytes(16));
  }
  // ... [snip] ...
  // check salt on submission
  $submitted = isset($_POST['salt']) && is_string($_POST['salt']) ? $_POST['salt'] : '';
  if (count($_POST) && !hash_equals($_SESSION['salt'], $submitted)) {
      // error condition, redisplay form
  }
  // ... [snip] ...
  // include salt in requested form
  echo "<input type=hidden name=salt value=\"{$_SESSION['salt']}\">";

If you are avoiding session state, the same effect can be achieved using form timeouts which are "signed" to ensure authenticity.

  <?php
  define('SALT','My^%*&SecretSalt');
  // ... [snip] ...
  // include signed request time in form
  $now = time();
  $signed = $now.'#'.sha1(SALT.$now.SALT);
  echo "<input type=hidden name=requested value=\"$signed\">";
  // ... [snip] ...
  // check timeout on submission
  if (count($_POST)) {
      $requested = isset($_POST['requested']) && is_string($_POST['requested']) ? $_POST['requested'] : '';
      // array_pad guards against a value with no '#' separator
      list($when,$hash) = array_pad(explode('#',$requested,2),2,'');
      if (!hash_equals(sha1(SALT.$when.SALT),$hash) || (int) $when<(time()-30*60)) {
          // error condition, redisplay form; either
          // corrupted or the form was served > 30 minutes
          // ago
      }
  }
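
The same idea can be wrapped in two small functions, which makes it easier to test. This is only a sketch: it swaps the sha1 "sandwich" above for PHP's built-in hash_hmac(), and the names FORM_SECRET, sign_request_time() and verify_request_time() are my own placeholders rather than part of any library.

```php
<?php
// Hypothetical secret; in practice load it from configuration.
const FORM_SECRET = 'change-me';

// Build "timestamp#signature" for embedding in a hidden form field.
function sign_request_time(int $now): string
{
    return $now . '#' . hash_hmac('sha256', (string) $now, FORM_SECRET);
}

// True only if the signature matches and the form was served
// at most $maxAge seconds before $now.
function verify_request_time(string $signed, int $now, int $maxAge = 1800): bool
{
    $parts = explode('#', $signed, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[0])) {
        return false; // malformed value
    }
    [$when, $hash] = $parts;
    $expected = hash_hmac('sha256', $when, FORM_SECRET);
    // hash_equals() compares in constant time
    return hash_equals($expected, $hash) && (int) $when >= $now - $maxAge;
}
```

On the form page you would echo sign_request_time(time()) into the hidden field, and on submission pass the posted value and time() to verify_request_time().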

2. The Honey Pot

This can counter bots that automatically spider and submit forms, which makes it useful against spam. Such bots will usually try to fill in every field, so if you render an empty checkbox, such a bot will most likely "tick" it. Use CSS to hide the field from normal users, and label it clearly so that any screenreader makes it obvious the field should be left blank.

  <?php
  if (isset($_POST['honeypot'])) {
      // error condition, redisplay form
  }
  ?>
  ...
  <label style="display:block;position:absolute;left:-9999px">
      Please leave this checkbox blank
      <input type=checkbox name=honeypot value=1>
  </label>
  ...

3. Dynamic Fieldnames

Abusing a form is easier if fieldnames remain static between requests. If you have already protected your form with a session/timeout token, you can make the fieldnames change by deriving them from this token. The only catch is that hashed fieldnames won't be recognised by a browser's "autocomplete" engine, so this will have a minor impact on usability.

  <?php
  define('SALT','My^%*&SecretSalt');
  // derive a per-token fieldname from the logical name
  function fieldname($name,$salt) {
      return sha1($name.$salt.SALT);
  }
  // submitted
  if (count($_POST)) {
      $token = isset($_POST['token']) && is_string($_POST['token']) ? $_POST['token'] : '';
      // array_pad guards against a token with no '#' separator
      list($when,$hash) = array_pad(explode('#',$token,2),2,'');
      if (!hash_equals(sha1(SALT.$when.SALT),$hash) || (int) $when<(time()-30*60)) {
          // error (invalid token or over 30min timeout)
      }
      $fn = fieldname('comment',$token);
      $comment = isset($_POST[$fn]) ? $_POST[$fn] : null;
      if (!$comment) {
          // error (no comment submitted)
      }
      // etc
  }
  // ... [snip] ...
  // create token
  $now = time();
  $token = $now.'#'.sha1(SALT.$now.SALT);
  ?>
  ...
  <label>Comment
      <textarea name="<?php echo fieldname('comment',$token); ?>"></textarea>
  </label>
  <input type=hidden name=token value="<?php echo $token; ?>">
  ...

4. "Fuzzy" Filters

Soft or "fuzzy" filtering can be useful if your application already has some sort of priority queue: for example, an existing blog where comments can be published immediately or pushed into a manual moderation queue. In this case it is worth estimating the probability that a submission came from a bot, and treating high-probability submissions differently from low-probability ones. Your mileage with this may vary, but if we follow the blog comments example we might look at:

None of these factors on its own proves that a bot is making the request, but each one changes the probability. The rules you implement will reflect what you are trying to achieve: the concept is that from these factors you calculate a probability of the request being an automated one. If that probability is high, you push the comment to the moderation queue, and if it is low you publish immediately. This reduces the burden of moderation while still catching most spam submissions.
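
To make the idea concrete, here is a rough sketch of such a scorer. Everything in it is an assumption for illustration: the three signals (link count, time taken to submit, missing User-Agent), the weights, the 0.5 threshold, and the function names spam_score() and route_comment() are mine, and the result is a crude score standing in for a probability rather than a calibrated one.

```php
<?php
// All factors, weights and the threshold below are illustrative
// assumptions; tune or replace them for your own application.
function spam_score(string $comment, int $secondsToSubmit, string $userAgent): float
{
    $score = 0.0;
    // Several links in one comment are a common spam signal.
    $links = preg_match_all('#https?://#i', $comment);
    if ($links >= 3) {
        $score += 0.5;
    } elseif ($links >= 1) {
        $score += 0.2;
    }
    // Humans rarely complete a form within a few seconds.
    if ($secondsToSubmit < 5) {
        $score += 0.3;
    }
    // A missing User-Agent header suggests a script.
    if (trim($userAgent) === '') {
        $score += 0.2;
    }
    return min($score, 1.0);
}

// Route a comment: 'moderate' when the score reaches the threshold,
// otherwise 'publish'.
function route_comment(float $score, float $threshold = 0.5): string
{
    return $score >= $threshold ? 'moderate' : 'publish';
}
```

No single signal sends a comment to moderation here unless it is strong (three or more links); weaker signals only matter in combination, which is the point of the "fuzzy" approach.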

5. Preview/Confirm Stages

The conditions discussed so far protect a single-step submission. Multi-step submissions are more difficult for an automated bot, provided each step is protected using the methods described above and a bot cannot skip directly to the last step. If your form is simple, you may wish to consider adding a confirmation step to convert it from a single-step to a multi-step process. This will affect the user experience, but it is often a positive step that helps the user make fewer mistakes, rather than a disruptive one like a CAPTCHA that simply slows the user down.
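
One way to stop a bot skipping straight to the final step is to have the preview step issue a signed token that the confirm step then demands. The sketch below assumes a comment form with a preview page; STEP_SECRET, preview_token() and confirm_allowed() are hypothetical names of my own, and binding the token to the comment text is just one possible design.

```php
<?php
// Hypothetical secret; in practice load it from configuration.
const STEP_SECRET = 'change-me';

// Issued when the preview page is rendered: binds the token to the
// exact comment text that was previewed and to the time of the preview.
function preview_token(string $comment, int $now): string
{
    return $now . '#' . hash_hmac('sha256', $now . '|' . $comment, STEP_SECRET);
}

// Checked on the confirm step: passes only if this exact comment went
// through the preview step within the last $maxAge seconds.
function confirm_allowed(string $comment, string $token, int $now, int $maxAge = 1800): bool
{
    $parts = explode('#', $token, 2);
    if (count($parts) !== 2 || !ctype_digit($parts[0])) {
        return false; // missing or malformed token
    }
    [$when, $hash] = $parts;
    $expected = hash_hmac('sha256', $when . '|' . $comment, STEP_SECRET);
    return hash_equals($expected, $hash) && (int) $when >= $now - $maxAge;
}
```

Because the token covers the comment text, a submission that is altered after the preview, or that never visited the preview at all, fails the check and can be sent back to the first step.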

Next?

If you still have a problem with automated submissions after considering the actions discussed, it is time to consider implementing a CAPTCHA. I think of the steps above as a series of defensive barriers: they are surmountable by an attacker with some effort, but all reduce the probability of a successful attack. The addition of a CAPTCHA is no different: it is one more defensive barrier, not a guarantee that your application is protected from abuse.

Continue to the pros and cons of the textCAPTCHA service ›

Questions? Contact Rob.