WebAIM - Web Accessibility In Mind

Spam-free accessible forms

There has been much discussion lately about how to prevent spambots from submitting forms on web sites. Many solutions have been presented, and many of them harm the usability and accessibility of the web page. CAPTCHA is a classic case where both usability and accessibility are directly affected.

Note

A Brazilian-Portuguese translation of this blog entry is available at http://www.maujor.com/tutorial/spam-em-formularios.php.

Over the last year or so I have compiled the following basic techniques for blocking spam submissions in web forms. I’ve implemented just a couple of these and, through logging, have found that they reduced spambot submissions by around 99% while having little or no impact on the usability or accessibility of the forms. Nearly all of these techniques are performed server-side. The relevant PHP code is shown below, but the tests can be readily implemented in nearly any server-side scripting language.

Disclaimer 1: These spam prevention techniques may not work for enterprise-level applications where spammers may target forms specifically. They are intended for generic contact, comment, or registration forms where a spammer is less likely to take the time to try to bypass your specific spam prevention mechanisms.

Disclaimer 2: These techniques primarily stop bots and automated spam submission programs. They also can filter certain content. However, they likely will not prevent an actual dedicated human from posting spam to your web site.

The techniques are:

  • Detect spam-like content within submitted form elements
  • Detect content within a hidden form element
  • Validate the submitted form values
  • Search for the same content in multiple form elements
  • Generate dynamic content to ensure the form is submitted within a specific time window or by the same user
  • Create a multi-stage form or form verification page
  • Ensure the form is posted from your server

Detect spam-like content within submitted form elements

This is likely the most powerful spam prevention technique. Most spambots exist either to post URLs of web sites, in an effort to increase traffic or search engine ranking, or to hijack your form to send spam messages to you or others. Detecting commonly used spam content or e-mail header injections will stop nearly all spambots dead in their tracks.

The following PHP code, when placed on your form processing page (the place where the form is submitted to), will search all of the form elements for the most common header injections and other code that may trick your mail processor into sending carbon copy or blind carbon copy messages to others. It also detects any content that includes the string “[url” which is used by most forum software to specify links. If any are found, it sets the $spam variable to true.

if (preg_match( "/bcc:|cc:|multipart|\[url|Content-Type:/i", implode($_POST))) {
    $spam=true;
}

NOTE: Internet Explorer 6 has a bug that will not allow proper overflow of preformatted text. If you are still using that browser, you will need to properly reflow the PHP code lines from this page.

You can also detect links and URLs within the form elements. The following will set the $spam variable if more than 3 instances of “<a” or “http:” appear anywhere within the form.

if (preg_match_all("/<a|http:/i", implode($_POST), $out) > 3) {
    $spam=true;
}

This will defeat most spambots as they primarily focus on posting links or hijacking your mail script. Beyond this, some very basic word filtering can often catch spam that finds its way through.

$spamwords = "/(list|of|naughty|spam|words|here)/i";
if (preg_match($spamwords, implode($_POST))) {
    $spam=true;
}
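The three content checks above can be folded into one function. The following is only a sketch under a few assumptions: the function names flatten_fields and looks_like_spam are made up for illustration, the word list is passed in by the caller, and nested form values (such as name="tags[]") are flattened first, because implode() alone does not handle nested arrays cleanly.

```php
<?php
// Hypothetical helper: join every submitted value into one string,
// including values nested in arrays (e.g. name="tags[]").
function flatten_fields(array $fields): string
{
    $parts = [];
    array_walk_recursive($fields, function ($value) use (&$parts) {
        $parts[] = (string) $value;
    });
    return implode("\n", $parts);
}

// Hypothetical helper: true if the submission trips any content check.
function looks_like_spam(array $fields, array $spamWords = [], int $maxLinks = 3): bool
{
    $text = flatten_fields($fields);

    // Mail header injection attempts and forum-style "[url" links.
    if (preg_match('/bcc:|cc:|multipart|\[url|Content-Type:/i', $text)) {
        return true;
    }
    // More than $maxLinks occurrences of "<a" or "http:".
    if (preg_match_all('/<a|http:/i', $text) > $maxLinks) {
        return true;
    }
    // Optional word filter; each word is escaped before it goes into the pattern.
    if ($spamWords) {
        $escaped = array_map(function ($word) {
            return preg_quote($word, '/');
        }, $spamWords);
        return preg_match('/(' . implode('|', $escaped) . ')/i', $text) === 1;
    }
    return false;
}
```

On the form processing page this could be called as $spam = looks_like_spam($_POST, $yourWordList); where $yourWordList is whatever list of words you have chosen to filter.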

You can also use external spam detection services with up-to-date patterns of spam content. My favorite is Akismet. Akismet is commonly used for filtering spam on blog comments (it has blocked nearly 14,000 spam comments to this blog in the last 9 months!), but it can be used successfully for nearly any web form.

Detect content within a hidden form element

Most spambots will find your form, determine what the form element names are, and find the URL where the form is posted to. The software will then post those form elements with modified, spam-filled values back to the form submission URL. Typically, the bot will populate every form element with some value so as to best ensure that it will succeed in being posted. So, if you insert a standard text input element into your form, but hide it visually from the user so the user cannot enter anything into this field, it is quite likely that the spambot will still post some value for this form element. If you detect that the form element is submitted with a value, then it’s almost certainly a spambot.

For instance, your form element may be inserted as

<span style="display:none;visibility:hidden;">
<label for="email">
Ignore this text box. It is used to detect spammers. 
If you enter anything into this text box, your message 
will not be sent.
</label>
<input type="text" name="email" size="1" value="" />
</span>

Notice that CSS is used to hide the text input and its label from view. This code will also hide this content from modern screen readers. However, if CSS is disabled, the input will still be displayed. For this reason, an explanatory label is provided that informs the user to not enter anything into the text box. I also gave the input a nice, juicy, tempting element name of “email” – that’s almost certain to get the spambots to enter a value.

You then simply detect if the form element is empty. If it is not, then it’s either a spambot or a user that has CSS disabled and did not follow the label instructions.

if(!empty($_POST['email'])) $spam=true;

This tactic, like all of those listed here, should still present a useful, informative error message in case the user somehow triggers your spam detection flag.
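As a rough illustration of pairing the hidden-field check with an informative message, here is a minimal sketch. The function name check_honeypot and the wording of the message are assumptions, not part of the form shown above. It avoids empty() because empty() treats the string "0" as empty.

```php
<?php
// Hypothetical helper: returns an error message if the hidden field
// was filled in, or null if it was left blank.
function check_honeypot(array $post, string $field = 'email'): ?string
{
    $value = $post[$field] ?? '';

    // Anything other than a blank string counts as "filled in".
    if (is_array($value) || trim((string) $value) !== '') {
        return 'Your message was not sent because the text box labeled '
             . '"Ignore this text box" contained a value. '
             . 'Please clear that box and submit the form again.';
    }
    return null;
}
```

A processing script could call check_honeypot($_POST) and, when the result is not null, display it to the user in addition to setting $spam.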

Validate the submitted form values

This one perhaps goes without mentioning, but if you want certain form elements to be required, ensure that you are using a server-side script to detect whether information has been entered into those form fields. If you require form information in a particular format (such as a valid e-mail address), then validate it. Many bots will simply submit empty values for fields they do not recognize or will submit random information for certain form fields. Your standard form validation mechanisms can stop many spambots.

// If the message is empty, flag an error
if (empty($_POST['message'])) $error = true;
// If the e-mail address is not formatted correctly, show an error message.
// filter_var() is used here because eregi() was removed in PHP 7.
if (!filter_var($_POST['email'] ?? '', FILTER_VALIDATE_EMAIL)) {
    echo "Please enter a valid email address.";
}

Search for the same content in multiple form elements

Some spambots will post the same text into all unrecognized form fields. If you have two form fields that should never contain the same information, you can detect whether their values are the same and, if they are, flag an error. On our forum registration form, I found that simply throwing an error if the first and last names were the same cut down on bot registrations by around 80%. It’s not a perfect technique, and you should make sure the fields you compare really should never match (I guess there is still a chance that a person could have the same first and last name, huh?).

    if($_POST['firstname'] == $_POST['lastname']) $spam=true;
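One caveat with a plain == comparison is that two blank or missing fields also compare as equal. The following sketch, with the hypothetical function name fields_match, ignores blank values and compares the rest without regard to case or surrounding whitespace. Note that strtolower() only lowercases ASCII letters in recent PHP versions.

```php
<?php
// Hypothetical helper: true only when both fields are non-blank
// and contain the same text (ignoring case and surrounding spaces).
function fields_match(array $post, string $a, string $b): bool
{
    $first  = strtolower(trim((string) ($post[$a] ?? '')));
    $second = strtolower(trim((string) ($post[$b] ?? '')));

    return $first !== '' && $first === $second;
}
```

For the registration form described above, the call would be fields_match($_POST, 'firstname', 'lastname').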

Generate dynamic content to ensure the form is submitted within a specific time window or by the same user

By generating unique form elements or creating session variables, you can ensure that the person who visits your form page is the same one who submits the form. For instance, when a form is accessed, you could use server scripting to write the current time to a hidden form element. When the form is submitted, you can compare the hidden form value with the current time and ensure that no more than, say, an hour has elapsed. It is very unlikely that a spambot will generate a correct value for the time field. You can also set browser cookies or use other client sessioning systems to ensure that a user session is established and maintained between the form page and the submission page.

The following will write the current time in UNIX time format to a hidden form input.

<input type="hidden" name="formtime" value="<?php echo time(); ?>" />

When the form is submitted, you can measure the difference between the current time and the value stored within the form. If the time difference is more than a specified value, you can flag it as spam. In this example, if more than an hour (3600 seconds) has elapsed between the time the form was viewed and the time it was submitted, the spam variable is set. This code will also flag the message as spam if the formtime value has been changed to some other value, such as a URL or an e-mail address.

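// Cast to an integer so that a missing or non-numeric value
// (such as a URL) becomes 0 and is flagged on every PHP version.
if ((int) ($_POST['formtime'] ?? 0) < time() - 3600) {
    $spam = true;
}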
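A bare timestamp can be guessed by a bot that simply posts the current time. One possible hardening, which is my own extension and not something tested in the logs mentioned earlier, is to sign the timestamp with a server-side secret so that the value cannot be forged. In this sketch the constant FORM_SECRET and the function names make_form_token and form_token_is_valid are hypothetical. hash_hmac() and hash_equals() are standard PHP functions.

```php
<?php
// Hypothetical secret; in practice load it from configuration, not source code.
const FORM_SECRET = 'replace-with-a-long-random-string';

// Build the hidden field value: "<timestamp>:<signature>".
function make_form_token(int $now): string
{
    return $now . ':' . hash_hmac('sha256', (string) $now, FORM_SECRET);
}

// True if the token carries a valid signature and is at most $maxAge seconds old.
function form_token_is_valid(string $token, int $now, int $maxAge = 3600): bool
{
    $parts = explode(':', $token, 2);
    if (count($parts) !== 2 || preg_match('/^\d+\z/', $parts[0]) !== 1) {
        return false; // not in "<digits>:<signature>" form
    }
    [$issued, $signature] = $parts;

    // Recompute the signature and compare in constant time.
    $expected = hash_hmac('sha256', $issued, FORM_SECRET);
    if (!hash_equals($expected, $signature)) {
        return false;
    }

    $age = $now - (int) $issued;
    return $age >= 0 && $age <= $maxAge; // reject expired and future-dated tokens
}
```

The form page would echo make_form_token(time()) into the hidden input, and the processing page would set $spam when form_token_is_valid($_POST['formtime'] ?? '', time()) returns false.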

Create a multi-stage form or form verification page

With a multiple-stage form process, most spambots will be unable to find the actual script that processes the final form data. This can be as easy as having the user verify their input after submitting a form and then selecting a second button to actually submit the form elements for processing. This can be made even more foolproof if the original form and the verification page are processed at the same URL. If the form element data is stored server-side before the final verification step (rather than in hidden form elements that can be submitted by the spambot), it becomes very difficult for an automated system to submit the form.

if($formsubmitted == true) {
    // database the form elements and display the verification page.
    // If the user verifies the form information, then process the databased data.
}
else {
    // display the empty form
}
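To make the outline above a little more concrete, here is a small sketch of the same flow as a single function. Everything in it is an assumption for illustration: the function name handle_request, the field names message and confirm, and the $store array, which stands in for whatever server-side storage you use (for example $_SESSION or a database row).

```php
<?php
// Hypothetical two-stage handler. Returns which page to show next:
// 'form', 'verify', or 'processed'.
function handle_request(array $post, array &$store): string
{
    // Stage 2: the user confirmed, and data from stage 1 is waiting server-side.
    if (isset($post['confirm']) && isset($store['pending'])) {
        $data = $store['pending'];
        unset($store['pending']);
        // Process $data here (send the e-mail, save the comment, and so on).
        return 'processed';
    }

    // Stage 1: the form was submitted; hold the data and ask for verification.
    if (isset($post['message'])) {
        $store['pending'] = ['message' => $post['message']];
        return 'verify';
    }

    // Nothing usable was posted: display the empty form.
    return 'form';
}
```

Because the confirmation step only works when pending data already exists on the server, a bot that posts directly to the final step with its own values has nothing to process.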

Ensure the form is posted from your server

Most spambots post to your form script from a remote computer. By detecting whether the form information has been submitted from your own web site, you can stop many of them from reaching your processing script. Most scripting languages can check the page referrer, or the page that was used to get to the current page. It’s important to note that it is quite easy for spambots to forge the referrer information to appear as if the form is coming from your web server. Also, some browsers and firewalls will not send the referrer header at all.

The following code will check to ensure that the page referrer (incorrectly spelled ‘referer’ in the HTTP spec and in PHP) exists, and if it does, that the referring page is on the same web site as the form processing script. For browsers or spambots that send no referrer information, the message is never flagged as spam.

// Flag as spam only when a referrer is present and does NOT contain our own host name
if (isset($_SERVER['HTTP_REFERER']) && !stristr($_SERVER['HTTP_REFERER'], $_SERVER['HTTP_HOST'])) {
    $spam = true;
}
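A substring test can be fooled by a referrer that merely mentions your host name somewhere in its URL, such as in a query string. A stricter variation, sketched here with the hypothetical function name referer_is_foreign, compares only the host part of the referrer using PHP's parse_url(). As above, a missing referrer is not flagged.

```php
<?php
// Hypothetical helper: true when a referrer is present and its host
// differs from the host the request was sent to.
function referer_is_foreign(array $server): bool
{
    if (empty($server['HTTP_REFERER']) || empty($server['HTTP_HOST'])) {
        return false; // no referrer sent: do not flag
    }

    $refererHost = parse_url($server['HTTP_REFERER'], PHP_URL_HOST);

    // HTTP_HOST may include a port ("example.com:8080"); strip it.
    $ownHost = preg_replace('/:\d+$/', '', $server['HTTP_HOST']);

    // An unparseable referrer is treated as foreign.
    return !is_string($refererHost) || strcasecmp($refererHost, $ownHost) !== 0;
}
```

In the processing script this would be used as if (referer_is_foreign($_SERVER)) $spam = true;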

Conclusion

Preventing spam submissions to web forms is difficult work. However, when possible we should not place the burden of preventing spam on the end user through CAPTCHA or other Turing tests. Any time it becomes the user’s responsibility to somehow manually prove that they are a human, accessibility will be decreased. These techniques offer several methods of filtering out most form spambots without placing any burden on the end user.

I’m sure these are not all of the possibilities and it’s likely that there are flaws in my techniques above. If you have comments or better techniques, please post them below.