Akismet htaccess extension
My spam counter in Akismet has been rising steadily of late, and it has been approaching 10,000 caught spams very quickly. Yesterday it passed 9,950, and with my average of over 100 spams per day it should have broken the 10,000 barrier by now. Instead, I've had about 3 spams today. Did I just find an off button for spam?
Worst Offenders
A few days ago I wrote about a small Akismet extension I'd been playing with that helps remove the worst offenders from the list of caught spam, which in turn makes false positives easier to recognize. One of the things the extension notices is which IP addresses are particularly prolific, and it's this information that has given me a 95% (and rising) reduction in the spam I send to Akismet to be checked.
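To make "particularly prolific" concrete, here is a minimal Python sketch of how caught spam could be grouped by IP address and ranked. It is not the extension's code: the function name worst_offenders, the input format (one IP string per caught spam) and the display limit of 10 are assumptions for illustration. The threshold of 4 comes from the questions and answers further down, where IPs with fewer than 4 spams are listed but not ticked.

```python
from collections import Counter

# Threshold taken from the post: IPs with fewer than 4 caught spams are listed
# but not pre-ticked for banning. PAGE_SIZE is an assumed display limit.
TICK_THRESHOLD = 4
PAGE_SIZE = 10


def worst_offenders(spam_ips, threshold=TICK_THRESHOLD, limit=PAGE_SIZE):
    """Rank IP addresses by how many caught spams they sent.

    spam_ips: iterable of IP address strings, one per caught spam comment
    (a hypothetical input format). Returns up to `limit` tuples of
    (ip, spam_count, ticked), most prolific first, where `ticked` says
    whether the IP meets the threshold for being pre-selected for a ban.
    """
    counts = Counter(spam_ips)
    return [(ip, n, n >= threshold) for ip, n in counts.most_common(limit)]


if __name__ == "__main__":
    # Documentation-range addresses (RFC 5737), not real offenders.
    caught = ["192.0.2.10"] * 5 + ["198.51.100.7"] * 2 + ["192.0.2.11"] * 4
    for ip, n, ticked in worst_offenders(caught):
        print(f"{ip:<15} {n:>3} {'[x]' if ticked else '[ ]'}")
```

Counter.most_common keeps equally counted IPs in the order they were first seen, so ties are listed in arrival order rather than at random.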
htaccess
When writing about the extension on the Akismet mailing list, I suggested that hooking the worst offenders IP list into Apache's htaccess file should provide a simple, dynamic means of rejecting spam before the HTTP request has been processed by the server (and before the Akismet plugin has had to check it against its server).
So I tried it, and it works, apparently flawlessly so far. In my access log I can see that over 100 requests have been rejected today. That's 100 requests that aren't sitting in my “spambox” waiting for me to check for false positives.
Today's Akismet spam count is in single figures instead of triple figures.
Fantastic.
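The hook itself is essentially text surgery on .htaccess. Below is a minimal Python sketch of the idea, assuming the “# BEGIN worst-offenders” / “# END worst-offenders” markers described in the installation steps and the 400-entry FIFO queue mentioned in the questions and answers. The names BanList, render_block, splice_htaccess and update_htaccess are hypothetical; the real plugin is PHP and may work differently.

```python
import os
import tempfile
from collections import deque

BEGIN_MARKER = "# BEGIN worst-offenders"
END_MARKER = "# END worst-offenders"
MAX_BANS = 400  # FIFO size from the post; the oldest ban falls off first


class BanList:
    """FIFO of banned IPs; once full, adding a new IP evicts the oldest."""

    def __init__(self, max_size=MAX_BANS):
        self._queue = deque(maxlen=max_size)

    def ban(self, ip):
        if ip not in self._queue:  # avoid duplicate Deny lines
            self._queue.append(ip)  # deque(maxlen=...) drops the leftmost item

    def ips(self):
        return list(self._queue)


def render_block(ips):
    """Build the marker-delimited block of Deny directives."""
    return "\n".join([BEGIN_MARKER] + [f"Deny from {ip}" for ip in ips] + [END_MARKER])


def splice_htaccess(text, ips):
    """Replace everything from BEGIN_MARKER to END_MARKER; leave the rest untouched."""
    start = text.find(BEGIN_MARKER)
    if start == -1:
        raise ValueError("markers not found; add them to .htaccess by hand first")
    end = text.find(END_MARKER, start)
    if end == -1:
        raise ValueError("END marker missing")
    end += len(END_MARKER)
    return text[:start] + render_block(ips) + text[end:]


def update_htaccess(path, ips):
    """Write to a temp file, then os.replace it, so Apache never reads a half-written file."""
    with open(path, encoding="utf-8") as f:
        new_text = splice_htaccess(f.read(), ips)
    mode = os.stat(path).st_mode & 0o777  # keep the original permissions
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(new_text)
        os.chmod(tmp_path, mode)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Refusing to run when the markers are missing mirrors the "hand cranking" mentioned below: the file has to be prepared once before it can be updated automatically. The Order/Deny directives match the post's example; on Apache 2.4 they are provided by mod_access_compat.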
Ongoing Thoughts
So what does this mean for this site?
- If a spammer does manage to leave a message here, it's caught by Akismet and marked as spam, so it never gets through.
- When I remove those messages, I can optionally ban the IP address they came from.
- Bandwidth usage is reduced because the server does not accept connections from spammers, and fewer round trips to the Akismet server are necessary.
- The comment database does not get needlessly filled with temporary comments that are removed once their spamminess has been identified, so the database and its indexer have less work to do.
- Akismet will receive fewer requests from this site, leaving it more time to concentrate on others.
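The savings listed above come from the order of operations: a banned address is turned away before anything is stored or sent to Akismet. In the real setup Apache performs that check itself through the htaccess Deny lines; the Python model below only illustrates the ordering. FrontDoor and handle_comment are hypothetical names, and store/akismet_check are placeholders for the database insert and the Akismet round trip.

```python
import ipaddress


class FrontDoor:
    """Set of banned addresses checked before any comment processing."""

    def __init__(self, banned_ips):
        # Parsing normalises the text form, e.g. "2001:db8::0001" == "2001:db8::1".
        self._banned = {ipaddress.ip_address(ip) for ip in banned_ips}

    def allows(self, client_ip):
        # A hash-set lookup: cheap compared with a DB write plus a network call.
        return ipaddress.ip_address(client_ip) not in self._banned


def handle_comment(door, client_ip, comment, store, akismet_check):
    """Return an HTTP-style status code.

    store(comment) -> comment id (placeholder for the database insert)
    akismet_check(comment_id)    (placeholder for the round trip to Akismet)
    """
    if not door.allows(client_ip):
        return 403  # rejected up front: nothing stored, Akismet never asked
    comment_id = store(comment)
    akismet_check(comment_id)
    return 200
```

Everything after the early return is the work that banning avoids for repeat offenders.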
So what does this mean for the Akismet project?
- If the changes can be used by others, the net effect will be a more scalable and responsive service.
- If large-scale uptake were achieved, the spam zeitgeist might start to look different, because the number of spams being checked daily should drop significantly.
There are several obstacles to global spam nirvana, including:
- The installation process will require a tiny bit of “hand cranking” to ensure the htaccess file is in a suitable state for automatic updating.
- The system should probably exist as a separate plugin that hooks into the Akismet plugin at appropriate points, but those points haven't been defined yet.
- Not everyone uses Apache, so not everyone has an htaccess file.
- Not everyone uses WordPress, so Akismet service users on other platforms will have to re-implement the idea.
Download
Is this a polished and buffed plugin that's ready for prime time?
No! Absolutely not! For many reasons. This is an experiment to see whether the idea works and to generate some discussion around what's needed for dynamic spam-blocking systems to work. It's public so that those with the right skills can try it or examine it, and perhaps learn from or contribute to it. If you're an armchair amateur blogger, this plugin is not for you; yet. That is, please don't ask Linux/htaccess questions: I'd love to help, but I don't currently have time to hand-hold on a non-production experiment.
Still keen? Great.
If you're really brave and want to try it out, you need to follow these steps carefully.
- Precondition: follow the installation instructions for Akismet and make sure it's working correctly.
- Download the Extended Akismet Plugin.
- Replace the Akismet plugin with the one you just downloaded. It should pick up all the configuration from the “official” version of the plugin.
- In the WP admin interface, open the “Manage | Files” tab and check that your htaccess file is writable.
- The automatic IP banning is written between two markers, “# BEGIN worst-offenders” and “# END worst-offenders”. If you already have a deny list in your htaccess file, just add the markers to the list. Mine looks like this:
    Order Allow,Deny
    # BEGIN worst-offenders
    Deny from 202.75.49.130
    Deny from 202.75.49.131
    Deny from 202.75.49.133
    Deny from 202.75.49.134
    # END worst-offenders
    Allow from all
- From now on, when you look at your “Worst Offenders” list in Akismet, you should see the option to ban the spamming IP addresses when you delete the messages.
- Leave feedback and ask questions below.
- Why are some items in the list of offenders not ticked?
Items with fewer than 4 spam messages are not ticked. This threshold is flexible within the plugin, but not yet configurable.
- Why do I only see 10 “Worst Offenders” at once?
- I have an idea for making this better. What can I do?
Discuss it, implement it, share it.
- Does all this IP address checking add more load onto my poor server?
The benefits far outweigh the costs. It's a small increase at the front end, but a massive decrease overall. Comparing IP addresses is a mathematically simple task, so it is very fast. Storing the comment, sending it to the Akismet service and then removing it from the database is much more work.
- Could this be an end to spam?
No. The number of zombie machines out there is too large to block them all. This method reduces spam from zombies that know about your website, so it directly saves your resources while reducing the number of calls your server makes to the Akismet service.
- How many machines can this method block?
Currently, IP addresses drop off the end of the list after 400 have been added, so the least recent disappear. This is adjustable in the software but not yet user-configurable.
- If I'm blocking spam at source, will Akismet's capability be diminished, because it might not see new variants of spam as they emerge?
I don't know. I doubt it. Maybe Matt can add detail without giving too much away.
- Can it block legitimate comments from non-spammers?
It's possible, but improbable. If a spammer comes through a proxy, the proxy's IP address might get banned, so anyone connecting to the site through that proxy would get a “403”. Similarly, in shared IP pools for dial-up users, it's feasible (though highly improbable) that a spammer might dial up, spam, get banned and then hang up, relinquishing the IP address to the next user. If that user happens to visit your site, they'll get a 403. The lower threshold of ‘n’ spams from an IP address or a domain is there to reduce these possibilities, but it cannot eliminate the issue.
- How long does the ban last?
Currently there is a rather arbitrary limit of 400 IP addresses in a FIFO queue. When an address reaches the end, it's dropped off the list and is thus allowed to connect again.
- I think I'm OK with .htaccess files, but what if I'm not?
Take a backup before you start by running this in your WordPress root:
    cp .htaccess .htaccess.bak
Then, if you want to revert:
    rm .htaccess
    cp .htaccess.bak .htaccess
- Can I revert to the vanilla Akismet plugin?
Yes. No changes are made that affect the standard Akismet functions, so you can swap back and forth by replacing the akismet.php file as many times as you like.
- I want it, but I'm not an uber-geek. Is there any hope?
Yes. If you think it's a useful idea, the most helpful thing you can do is blog about it. If people read your blog and like the idea, it will help generate interest. Interest generates ideas, which increase the likelihood that this could turn into something really useful.