Akismet Worst Offenders Extension

These last few weeks the site has been very heavily hit by comment spammers hawking their usual reprobate websites and wasting internet bandwidth. Akismet has been doing a sterling job of catching this spam and not one message has made it onto the site (I wrote about Akismet’s effectiveness in the pre-launch testing previously). In the bad old days before Akismet I’d have to go through the “unmoderated comments list” in order to find the occasional real comment amidst all the spam. This is no longer necessary, which is wonderful. Comment-Spam-Nirvana has not been reached yet, however.

A screengrab showing a list of common spammers.

In order to help keep Akismet working well, and also to ensure that “false positives” do not go unnoticed, it is still necessary to trawl through the “spam list” and look for real comments. So although the problem has been turned on its head, the requirement on the responsible user is still the same.

The latest version of the Akismet plugin (v1.15) makes this “de-spamming” process easier, but it still leaves the poor website owner with the responsibility of looking at every single spam message in case there are any real comments that have been mistakenly marked as spam.

Informing the Akismet server about these false positives is important because it helps improve Akismet’s accuracy, which benefits everyone by ensuring fewer false positives – one hand washes the other, so to speak.

So I wrote a small addition to Akismet 1.15 (pictured above) that tries to help. It pre-processes the spam comments and identifies the worst offenders in terms of the domain that’s being advertised, or (perhaps more usefully) the IP Address of the spamming computer.

It’s not uncommon for me to get several hundred spam comments each day, so certain machines and websites are hitting my site many times. What the plugin does is make those worst offenders really obvious, so they can be removed en masse, reducing the ham-hunting to a smaller and more manageable task.
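The grouping idea can be sketched in a few lines of Python. This is only an illustration of the counting step, not the plugin’s actual code: the sample comments, the `ip` and `url` field names, the `threshold` parameter and the function names are all assumptions made up for the example.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical sample data; the field names are assumptions for illustration,
# not the real Akismet or WordPress schema.
spam_comments = [
    {"ip": "203.0.113.7", "url": "http://pills.example.com/buy"},
    {"ip": "203.0.113.7", "url": "http://pills.example.com/cheap"},
    {"ip": "198.51.100.2", "url": "http://casino.example.net/"},
    {"ip": "203.0.113.7", "url": "http://casino.example.net/win"},
    {"ip": "192.0.2.44", "url": ""},
]


def by_ip(comment):
    """Group key: the IP address the comment was posted from (None if missing)."""
    return comment["ip"] or None


def by_domain(comment):
    """Group key: the host name of the advertised URL (None if there is no URL)."""
    return urlparse(comment["url"]).hostname


def worst_offenders(comments, key, threshold=2):
    """Return (value, count) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter(key(c) for c in comments)
    counts.pop(None, None)  # ignore comments that have no usable key
    return [(value, n) for value, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    print(worst_offenders(spam_comments, by_ip))
    print(worst_offenders(spam_comments, by_domain))
```

Anything that falls below the threshold is left for manual review, which is where a genuine comment is most likely to be hiding.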

Download it here: Akismet 1.15 plus Worst Offenders Extension. A standalone version is available which works with newer versions of Akismet; see the discussion on the forum for more details.


44 thoughts on “Akismet Worst Offenders Extension”

  1. hi, I had to close off comments altogether on my blogs because I was getting an unmanageable amount of spam. The good folks on the Pivot project implemented a technique in their latest version called Hashcash. It’s a really simple technical countermeasure and it’s completely eliminated spam on my blogs without the need for any Bayes analysis or blacklists or other difficult-to-manage technique, or the user-unfriendliness of captchas, quizzes, etc. The downside is that those commenting have to have javascript enabled on their browser, but I feel that’s a small price to pay. I still get valid comments and the spam log indicates that I’m still getting hit thousands of times a day by spam attempts. It probably won’t last forever, but in the meantime it’s very effective and unlike email-style filtering techniques, it doesn’t generate false-positives.

  2. Methods like Hashcash only work for regular comments; they can’t be applied to protocols designed for blog-to-blog communication like Trackback or Pingback.

    Rich, plugin looks pretty cool. :)

  3. Pivot also has “hardened trackback” to eliminate trackback spam. It also requires javascript on the browser and has also been 100% effective so far.

    Sure it might not work forever. Maybe someone will bother to write a smart enough spambot that gets it.

    But in the meantime, I find it odd, although not completely out of character for geeks, to dismiss a solution that’s simple but might not work forever in favour of a suite of complex techniques that aren’t nearly as effective.

  4. Dismiss hashcash, no, but, I’ll go into more detail (because deep down I’m a scientist first; geek comes a distant second).

    I’m not (and I don’t think Matt was) dismissing hashcash, just discussing a few of its most obvious weaknesses. It’s a technique, it has flaws, so does everything else.

    The fundamental difference between static spam catching mechanisms and any dynamic service (not just Akismet) is that the dynamic service recognizes the spam rather than obfuscating the interaction.

    It is a weakness of recognition mechanisms that they can result in false positives. This extension shows how the volume of spam received can be used to massively simplify the process of spotting such (rare) errors.

    What you gain on the swings you lose on the roundabouts.

    Akismet is arguably more effective because the techniques it uses for recognition can adapt over time. If a side by side test of dynamic -vs- static system were conducted over 10 years, it’s possible (maybe even likely) that static could beat dynamic initially, but once enough bots can circumvent the static system, it’s game over.

    That’s not to say that using a static system in front of a dynamic one is not a good idea; many people use other anti-spam plugins together with Akismet, and report that it works very well.

    What I do think is a fundamental flaw of hashcash is the reliance on JavaScript. I’m a strong advocate of web accessibility and try to ensure every page I make is valid. A website that requires javascript for interaction ends up with facilities that are not equally available to all people; for example, someone accessing the site through a screen reader may not be able to leave a comment. That’s a bad thing.

  5. First of all it’s well worth trying out the Bad Behaviour plugin – I use it and as a result I’ve had to wait 3 days to build up 11 spam messages in Akismet for it to spot a grouping – meanwhile it’s rejected 222 connections from things claiming to be browsers but aren’t (or break the HTTP specs in some way) in the last 7 days.

    cheers!
    Chris

  6. Pingback: boakes.org
  7. Pingback: orioa
  8. If anyone’s got an inkling of what might be breaking on Clay’s installation, RSVP! Looking around the net it seems that the error he describes can be seen in cases where there are unmatched brackets, so I might need to do some character escaping somewhere. If anyone else is seeing the issue, please be sure to mention it.

  9. great plugin.

    it seems like i have to click the “delete worst offenders” button 2 or three times, each time some unknown subset of the worst offenders gets deleted. does this make any sense? i’m running 2.0.2

  10. Hi John, um, sorry it doesn’t make a whole pile of sense, yet. What do you mean you “have” to click the button several times? What are you trying to achieve that makes this multiple clicking a requirement? I’m certainly baffled by the “unknown subset”, so if you can elaborate a little that would be cool.

Comments are closed.