New plugin: KB Spam Blacklist

As if there weren’t enough anti-spam plugins, right? Actually, this isn’t a traditional anti-spam plugin. It’s a regular-expression-based blacklist plugin. And by blacklist, I mean blacklist: if a comment matches one of your regexes, it gets deleted immediately, not sent to moderation.

Why use this plugin?

So why would you want this? I don’t know about you, but 90% of my Akismet spam is really obvious: obscenities, BB code ([url...]), “payday loan” offers, and the like. I don’t want this stuff in my spam queue; I want it shot on sight. That way, the spam queue only holds comments that might actually be genuine but were miscategorized as spam.

How it works

In short, this plugin takes some of the load off of Akismet by looking for really obvious stuff. It’s easy to use. Just activate the plugin–that’s all. It comes with four regular expressions. You can add to, modify, or remove these if you want. This is the default blacklist:

<?php $kb_spamBlacklist = array(
	// First, let's check for [url]...[/url] markup, a sure sign of a spammer (unless you're using a bb code plugin)

	// profanity and obscenity. Remember that spammers use ! for i, @ for a, * for u. 
	// Also, most of these get surrounded in \W so that, e.g. ASSume doesn't get mistaken for profanity.
	'~(\Wf[u\*]ck|x{3,}|\Ws[u\*]ck[i!]ng|\Wt[i!]ts?\W|\W[a@]s{2}\W|v[a@]g[i!]n[a@]|\Wc[u\*]nt|pen+[i!]s)~i',	// depending on your audience, you might not want this one

	// Now let's check for ... interesting ... pharmaceutical offers
	'~(c[i!][a@]l[i!]s|v[i!][a@]gr[i!]?[a@])~i',	// remember that they sometimes use ! for i and @ for a

	// payday loans, anyone?
	'~(credit|loans?).*(credit|loans?).*(credit|loans?)~Usi',	// if they use "credit" or "loans" too many times in their comment, kill it.
); ?>
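
Here is a rough sketch of how a list like this can be applied to a comment. It is illustrative only and not the plugin's actual source: the function name kb_example_is_blacklisted is made up, and only two of the default patterns are copied in to keep it short.

```php
<?php
// Hypothetical helper (an assumption, not the plugin's real code):
// returns true as soon as any blacklist pattern matches the comment text.
function kb_example_is_blacklisted(string $comment, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match($pattern, $comment) === 1) {
            return true;
        }
    }
    return false;
}

// Two of the default patterns from the list above.
$patterns = array(
    '~(c[i!][a@]l[i!]s|v[i!][a@]gr[i!]?[a@])~i',
    '~(credit|loans?).*(credit|loans?).*(credit|loans?)~Usi',
);

var_dump(kb_example_is_blacklisted('Buy V!AGRA today', $patterns));   // bool(true)
var_dump(kb_example_is_blacklisted('Nice post, thanks!', $patterns)); // bool(false)
```

In WordPress, a check like this would typically run from a preprocess_comment filter and call wp_die() with an error message when it returns true. That wiring is an assumption here, since the plugin file itself isn't shown.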

Try it out

If a comment gets caught by one of those regexes (or by another that you add), the commenter sees an error message. If you want to try it, write a comment that uses viagra, cialis, or [url]http://buy-junk.com[/url] in it and see what happens.

Download it

  1. sheri (Unregistered)
    Posted May 14, 2008 at 10:43 am

    Thanks for making this available. I modified it for use with my SMF forum. It’s caught all my test spam messages and let the real ones through. I’ll let you know how it works with real spammers (I get about 20 a day!).

  2. Posted February 15, 2009 at 1:52 pm

    Exactly what I was looking for!

    I’m tired of having to double-check the spam for the really really obvious stuff. I’ll probably use it as is but add a “url=http” search (unless your line already catches it).

    Anyone using this with WordPress 2.7x?

  3. Posted February 16, 2009 at 10:19 pm

    Help, I can’t figure out the characters in the code.

    Can someone help me code it to look for “url=http” ?

    Most of my spam follows that format.

  4. Posted February 19, 2009 at 8:16 am


    The way this is coded, it should work in just about any version of WordPress. It’s very simple. I’ve got it running in 2.7.1.

    I’ve found that it’s easier and more reliable to look for the close tag [/.u.r.l] (I had to use those dots to evade my own filter) than the url=http tag. Like this (but delete all the dots):
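
    A pattern for the close tag could look like the sketch below. Treat it as an assumed example rather than the exact line from the plugin; the variable name is made up.

```php
<?php
// Assumed pattern, for illustration only: match a literal [/url] closing tag.
// The square brackets are escaped because [ and ] normally delimit a
// character class. With ~ as the delimiter, the / needs no escaping.
$closeTagPattern = '~\[/url\]~i';

var_dump(preg_match($closeTagPattern, 'Visit [url]http://buy-junk.com[/url] now')); // int(1)
var_dump(preg_match($closeTagPattern, 'See the URL in my profile'));                // int(0)
```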

  5. Posted February 20, 2009 at 12:07 am

    Thanks Adam!!!

    Didn’t really know how to code the regex into php.

  6. Posted February 28, 2009 at 8:32 pm

    This has cut out SO much of my spam. As thanks, I went to donate to you and I see you won’t take money; so as per your suggestion I sent money to the Red Cross.

    My last bit is I’m trying to auto delete where h.t.t.p (without the periods) appears 5 or more times. That would leave me with very very little left to examine by hand (probably 10-20% of what I usually have to sort through).

    I’m looking at the last line of your blacklist (above). I’m assuming that if any combination of the two words appears at least 3 times, it kills the comment. But the three question marks are throwing me off, and the part at the end after the tilde is different from the other lines.

    As it stands, can I change the last line by putting h.t.t.p in the parentheses by itself (removing the current characters) so that it’ll delete 3 or more?

  7. Posted March 2, 2009 at 11:20 am

    You ask about this (without some of the dots):


    The question mark means “the preceding character may or may not exist.” So loa.ns? will match lo.an or lo.ans. You don’t need that for what you’re doing.

    The ~Usi gives the regex three flags. i makes it case-insensitive; s lets the dot (.) match newlines too, so .* can run across line breaks; U makes the * ungreedy, so it matches as little as possible.
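
    To see what the question mark and each flag do, here is a small sketch. The sample strings and variable names are made up for illustration.

```php
<?php
// ? makes the preceding character optional: "loans?" matches "loan" and "loans".
var_dump(preg_match('~^loans?$~', 'loan'));   // int(1)
var_dump(preg_match('~^loans?$~', 'loans'));  // int(1)

// A three-line comment with one keyword per line.
$text = "Get CREDIT now.\nCheap loans here.\nMore credit today.";
$base = '~(credit|loans?).*(credit|loans?).*(credit|loans?)~';

// i only: case no longer matters, but the dot still stops at each newline,
// so keywords on separate lines are never found together.
var_dump(preg_match($base . 'i', $text));   // int(0)

// s and i together: the dot matches newlines too, so .* can span lines.
var_dump(preg_match($base . 'si', $text));  // int(1)

// U flips * from greedy (as much as possible) to ungreedy (as little as possible).
$sample = 'loan one, loan two, loan three';
preg_match('~loan.*loan~', $sample, $greedy);
preg_match('~loan.*loan~U', $sample, $lazy);
echo $greedy[0], "\n";  // loan one, loan two, loan
echo $lazy[0], "\n";    // loan one, loan
```

    For a plain yes-or-no blacklist check, greedy and ungreedy patterns accept the same comments. U only changes how much text the match covers.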

    Try this (change ht.tp to http, but keep the dot in //.*):


    A shorthand for that would be this (change ht.tp to http):
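
    As a sketch of the idea, five or more links could be caught with patterns like these. They are assumed examples, not necessarily the exact lines from this reply, and example.com is a placeholder.

```php
<?php
// Assumed patterns, for illustration only.
// Long form: "http://" five times, with anything (including newlines) in between.
$longForm  = '~http://.*http://.*http://.*http://.*http://~si';
// Shorthand: the group "http:// followed by anything" repeated five times.
$shortForm = '~(http://.*){5}~si';

$fiveLinks = str_repeat("see http://example.com/page\n", 5);
$fourLinks = str_repeat("see http://example.com/page\n", 4);

var_dump(preg_match($longForm, $fiveLinks));   // int(1)
var_dump(preg_match($shortForm, $fiveLinks));  // int(1)
var_dump(preg_match($longForm, $fourLinks));   // int(0)
var_dump(preg_match($shortForm, $fourLinks));  // int(0)
```

    Because nothing stops the match after the fifth link, both patterns catch five or more.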

  8. Posted March 2, 2009 at 10:30 pm

    Without question this will cut down my spam significantly!

    Thank you!

    (And I’m not putting a U in the new examples. I read the “greedy” parts from the flags link you provided and I’m still fuzzy, but that’s okay since it seems to be working.)

  9. Posted July 18, 2010 at 11:29 pm

    Better than Akismet? Seems great, but what about the false positive rate? Any information?

  10. Posted July 19, 2010 at 11:12 am

    When it finds a spammish comment, it shows an error message to the user. If it’s a human, they’ll read the message and understand that they’ve triggered a filter. Then they can just go back and remove the offending stuff.

    But really, there’s very little reason for a real comment to violate any of the plugin’s rules unless, for example, your blog is all about male-enhancing drugs. But if that’s the case, just delete the line that looks for pharmaceutical offers.

  11. Posted September 16, 2010 at 10:58 pm

    Hi! Thanks for this. I’ll try your plugin soon, but is this the latest version? I stumbled on this code of yours because I downloaded the KB Advance RSS plugin, which I will try on my upcoming blog site.

    Thanks again.