As if there weren’t enough anti-spam plugins, right? Actually, this isn’t a traditional anti-spam plugin. It’s a blacklist plugin based on regular expressions. And by blacklist, I mean blacklist: if a comment matches one of your regexes, it gets deleted immediately instead of being sent to moderation.
Why use this plugin?
So why would you want this? I don’t know about you, but 90% of my Akismet spam is blatantly obvious. It contains obscenities, BB code ([url...]), “payday loan” offers, and other dead giveaways. I don’t want this stuff in my spam queue; I want it shot on sight. That way, the spam queue only holds comments that might actually be genuine but were miscategorized as spam.
How it works
In short, this plugin takes some of the load off Akismet by catching the really obvious stuff. It’s easy to use: just activate the plugin, and that’s all. It comes with four regular expressions, which you can add to, modify, or remove. This is the default blacklist:
<?php $kb_spamBlacklist = array(
// First, let's check for [url]...[/url] markup, a sure sign of a spammer (unless you're using a BBCode plugin)
'~\[[^\]]*url[^\]]*\]http[^\[]+\[/url[^\]]*]~i',
// profanity and obscenity. Remember that spammers use ! for i, @ for a, * for u.
// Also, most of these get surrounded by \W so that, e.g., ASSume doesn't get mistaken for profanity.
'~(\Wf[u\*]ck|x{3,}|\Ws[u\*]ck[i!]ng|\Wt[i!]ts?\W|\W[a@]s{2}\W|v[a@]g[i!]n[a@]|\Wc[u\*]nt|pen+[i!]s)~i', // depending on your audience, you might not want this one
// Now let's check for ... interesting ... pharmaceutical offers
'~(c[i!][a@]l[i!]s|v[i!][a@]gr[i!]?[a@])~i', // remember that they sometimes use ! for i and @ for a
// payday loans, anyone?
'~(credit|loans?).*(credit|loans?).*(credit|loans?)~Usi', // if they use "credit" or "loans" too many times in their comment, kill it.
); ?>
Try it out
If a comment gets caught by one of those regexes (or by another that you add), the commenter sees an error message. If you want to try it, write a comment that uses viagra, cialis, or [url]http://buy-junk.com[/url] in it and see what happens.
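The plugin’s own source isn’t reproduced in this post, so here is only a minimal standalone sketch of the check described above. It assumes each comment arrives as a plain string, tests every pattern in the blacklist with preg_match, and rejects the comment on the first hit. The helper name kb_matches_blacklist is hypothetical. The sketch also leaves out the WordPress hooks and the error message that the real plugin shows.

```php
<?php
// The default blacklist from the post, copied unchanged.
$kb_spamBlacklist = array(
    '~\[[^\]]*url[^\]]*\]http[^\[]+\[/url[^\]]*]~i',
    '~(\Wf[u\*]ck|x{3,}|\Ws[u\*]ck[i!]ng|\Wt[i!]ts?\W|\W[a@]s{2}\W|v[a@]g[i!]n[a@]|\Wc[u\*]nt|pen+[i!]s)~i',
    '~(c[i!][a@]l[i!]s|v[i!][a@]gr[i!]?[a@])~i',
    '~(credit|loans?).*(credit|loans?).*(credit|loans?)~Usi',
);

// Hypothetical helper: returns the first pattern that matches $text,
// or null if the comment is clean.
function kb_matches_blacklist(string $text, array $blacklist): ?string
{
    foreach ($blacklist as $pattern) {
        // preg_match returns 1 on a match, 0 on no match, false on error.
        if (preg_match($pattern, $text) === 1) {
            return $pattern;
        }
    }
    return null;
}

$comments = array(
    'Great post, thanks!',
    'Buy [url]http://buy-junk.com[/url] now',
    'Cheap v!@gra here',
);
foreach ($comments as $comment) {
    $hit = kb_matches_blacklist($comment, $kb_spamBlacklist);
    echo ($hit === null ? 'allowed' : 'rejected'), ': ', $comment, "\n";
}
```

Because the helper returns the matching pattern instead of a plain true/false, it is easy to report which rule fired, and that helps when you are tuning the list.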
It also includes a widget, if you’re into that. Look in the sidebar.
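The \W trick in the profanity rule has one side effect. \W has to consume an actual non-word character, so a flagged word at the very start or end of a comment has nothing on that side to match. The sketch below isolates the ass alternative from the default rule to show this. The padding workaround and its helper name, kb_padded_match, are assumptions for illustration. They are not something the plugin is known to do.

```php
<?php
// The "ass" alternative from the default profanity rule, isolated.
$rule = '~\W[a@]s{2}\W~i';

// \W must match a real non-word character, so these behave differently:
var_dump(preg_match($rule, 'what an ass here'));  // int(1): spaces on both sides
var_dump(preg_match($rule, 'ass'));               // int(0): nothing before or after
var_dump(preg_match($rule, 'I ASSume nothing'));  // int(0): "u" is a word character

// Hypothetical workaround: surround the comment with spaces before testing,
// so words at the very start or end still have a \W neighbor.
function kb_padded_match(string $pattern, string $text): int
{
    return preg_match($pattern, ' ' . $text . ' ');
}

var_dump(kb_padded_match($rule, 'ass'));          // int(1)
```

Padding keeps the patterns themselves unchanged. The alternative is to rewrite each \W as a lookaround, which means editing every rule.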

11 Comments
Thanks for making this available. I modified it for use with my SMF forum. It’s caught all my test spam messages and let the real ones through. I’ll let you know how it works with real spammers (I get about 20 a day!).
Exactly what I was looking for!
I’m tired of having to double-check the spam queue for the really, really obvious stuff. I’ll probably use it as-is but add a “url=http” search (unless one of your lines already catches it).
Anyone using this with WordPress 2.7x?
Help, I can’t figure out the characters in the code.
Can someone help me code it to look for “url=http”?
Most of my spam follows that format.
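The format Gary asks about puts the address inside the opening tag, as in [url=http://example.com]text[/url]. The first default rule expects http immediately after the ] of the opening tag, so this form slips past it. The sketch below adds two extra patterns, neither of which ships with the plugin. One matches the url=http opening tag and the other matches the [/url] closing tag. example.com is a placeholder.

```php
<?php
// The first rule from the default blacklist.
$defaultUrlRule = '~\[[^\]]*url[^\]]*\]http[^\[]+\[/url[^\]]*]~i';

// Hypothetical extra rules (not shipped with the plugin):
// 1. an opening tag with the address inside it, e.g. [url=http://...]
$urlEqualsRule = '~\[url=https?://~i';
// 2. any closing [/url] tag, however the opening tag was written
$closeTagRule = '~\[/url\]~i';

$spam = 'Nice site! [url=http://example.com]cheap stuff[/url]';

var_dump(preg_match($defaultUrlRule, $spam)); // int(0): link text sits between ] and [/url]
var_dump(preg_match($urlEqualsRule, $spam));  // int(1)
var_dump(preg_match($closeTagRule, $spam));   // int(1)
```

The closing-tag rule is the broader of the two, because it catches the link however the opening tag was written.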
Gary,
The way this is coded, it should work in just about any version of WordPress. It’s very simple. I’ve got it running in 2.7.1.
I’ve found that it’s easier and more reliable to look for the close tag [/.u.r.l] (I had to use those dots to evade my own filter) than the url=http tag. Like this (but delete all the dots):
Thanks Adam!!!
Didn’t really know how to code the regex into PHP.
This has cut out SO much of my spam. As thanks, I went to donate to you and I see you won’t take money; so as per your suggestion I sent money to the Red Cross.
My last step: I’m trying to auto-delete comments where h.t.t.p (without the periods) appears 5 or more times. That would leave me with very, very little to examine by hand (probably 10-20% of what I usually have to sort through).
I’m looking at the last line of your blacklist (above). I’m assuming that if any combination of the two words appears at least 3 times, it kills the comment. But the three question marks are throwing me off, and the part at the end after the tilde is different from the other lines.
As it stands, can I change the last line by putting h.t.t.p by itself in the parentheses (removing the current characters) so that it deletes comments where it appears 3 or more times?
You ask about this (without some of the dots):
The question mark means “the preceding character may or may not exist.” So loa.ns? will match lo.an or lo.ans. You don’t need that for what you’re doing.
The ~Usi gives three flags to the regex: i makes it case-insensitive; s lets the . in .* match newlines too, so a match can span several lines; U makes the * ungreedy.
Try this (change ht.tp to http, but keep the dot in //.*):
A shorthand for that would be this (change ht.tp to http):
Without question this will cut down my spam significantly!
Thank you!
(And I’m not putting a U in the new examples. I read the “greedy” parts from the flags link you provided and I’m still fuzzy, but that’s okay since it seems to be working.)
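Here is one way to handle the five-or-more question with a single blacklist entry. It assumes the same ~ delimiters as the default list and repeats a group with {5}. This sketch is not necessarily the pattern from the reply above. It also illustrates the point about the s flag: without it, .* stops at a line break, so links posted one per line are never counted together.

```php
<?php
// Hypothetical rule: "http" at least five times anywhere in the comment.
// (?:http.*){5} needs five separate "http"s, each followed by anything.
$withS    = '~(?:http.*){5}~is';  // s: "." also matches newlines
$withoutS = '~(?:http.*){5}~i';   // without s, ".*" cannot cross a line break

// Five links, one per line.
$spam = implode("\n", array_fill(0, 5, 'see http://example.com'));

var_dump(preg_match($withS, $spam));    // int(1)
var_dump(preg_match($withoutS, $spam)); // int(0)

// A non-regex alternative: preg_match_all returns the number of matches.
var_dump(preg_match_all('~http~i', $spam) >= 5); // bool(true)

// The "?" in loans? makes the "s" optional, so both forms match.
var_dump(preg_match('~^loans?$~i', 'loan'));  // int(1)
var_dump(preg_match('~^loans?$~i', 'loans')); // int(1)
```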
Better than Akismet? Seems great, but what about the false-positive rate? Any information?
When it finds a spammish comment, it shows an error message to the user. If it’s a human, they’ll read the message and understand that they’ve triggered a filter. Then they can just go back and remove the offending stuff.
But really, there’s very little reason for a real comment to violate any of the plugin’s rules unless, for example, your blog is all about male-enhancing drugs. But if that’s the case, just delete the line that looks for pharmaceutical offers.
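On the false-positive question, one practical check is to run a handful of legitimate-looking comments through the default rules before going live and see which rule fires, if any. The sample comments below are made up, and the helper kb_rules_hit is hypothetical. Some rules can match inside ordinary words. The pharmaceutical rule has no word boundaries, so specialist (which contains cialis) trips it, and the x{3,} alternative matches Roman numerals such as XXXV.

```php
<?php
// Default rules from the post, keyed by a short label for reporting.
$rules = array(
    'bbcode url'     => '~\[[^\]]*url[^\]]*\]http[^\[]+\[/url[^\]]*]~i',
    'profanity'      => '~(\Wf[u\*]ck|x{3,}|\Ws[u\*]ck[i!]ng|\Wt[i!]ts?\W|\W[a@]s{2}\W|v[a@]g[i!]n[a@]|\Wc[u\*]nt|pen+[i!]s)~i',
    'pharmaceutical' => '~(c[i!][a@]l[i!]s|v[i!][a@]gr[i!]?[a@])~i',
    'credit/loans'   => '~(credit|loans?).*(credit|loans?).*(credit|loans?)~Usi',
);

// Made-up legitimate comments to test against.
$samples = array(
    'I assume you tested this on WordPress 2.7?',
    'My doctor referred me to a specialist.',
    'Did anyone watch Super Bowl XXXV?',
    'Student loans, car loans, and credit cards add up fast.',
);

// Hypothetical helper: returns the labels of every rule that matches $text.
function kb_rules_hit(string $text, array $rules): array
{
    $hits = array();
    foreach ($rules as $label => $pattern) {
        if (preg_match($pattern, $text) === 1) {
            $hits[] = $label;
        }
    }
    return $hits;
}

foreach ($samples as $sample) {
    $hits = kb_rules_hit($sample, $rules);
    echo $hits ? 'BLOCKED (' . implode(', ', $hits) . ')' : 'ok', ': ', $sample, "\n";
}
```

If a rule trips on your readers’ normal vocabulary, consider tightening or deleting it, just as the pharmaceutical line can be removed on a blog where those words are on topic.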
Hi! Thanks for this. I’ll try your plugin soon, but is this the latest version? I stumbled onto this code of yours because I downloaded the KB Advance RSS plugin, which I will try on my upcoming blog site.
Thanks again.