Spam
June 16th, 2007
The enormous volume of spam has made us dependent on automated filters. Unfortunately, those systems aren’t perfect, so some spam inevitably gets through (false negatives), and some legitimate messages are classified as spam (false positives).
False negatives are easy for me to notice, but that’s not the case with false positives. I often purge spam queues without a glance.
This morning, I happened to peek at the spam comment queue for Keacher.com, and I noticed one of Mutak’s comments had been incorrectly marked as spam. I wonder how many times I’ve lost valid email or comments to the spam filter gods?
(This post is part of the 100/100/100 challenge)
RE: manually validating blog comments, what do you think about the Coding Horror guy’s “Turing test” where the answer is always the same (i.e., type the word “orange”)?
I believe that such a test is insufficient on its own. Analysis of the server logs leads me to believe that a human often scouts my blog before handing the task to a spam bot, so a static test could be easily defeated by solving it on the first visit.
Dynamic tests are more useful, but I’m not a fan of graphic captchas. The ones that are effective against programmatic attacks tend to be too challenging for legitimate humans. Over on Bonneville Club, we use word problems to verify humanness, like “If I have four apples and then eat two of the apples, how many apples do I have left?” While the number of questions is finite, there seem to be enough to be effective at preventing automated registrations.
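A check like that could be sketched as follows. This is a hypothetical illustration, not the actual Bonneville Club code; the question wording and the answer-matching rules are assumptions.

```python
import random

def make_question():
    """Generate a simple arithmetic word problem and its numeric answer.

    Randomizing the numbers makes the test dynamic: a human scout can't
    just solve it once and hard-code the answer into a bot.
    """
    have = random.randint(3, 9)
    eat = random.randint(1, have - 1)
    question = (f"If I have {have} apples and then eat {eat} of the apples, "
                f"how many apples do I have left?")
    return question, have - eat

def check_answer(submitted, expected):
    """Compare the visitor's submission to the expected answer,
    tolerating surrounding whitespace."""
    return submitted.strip() == str(expected)
```

A registration form would call `make_question()`, stash the answer in the session, and run the posted value through `check_answer()` before accepting the signup.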
Even if the submitter is known to be human, we must analyze the content of the comment, since some spam seems to be submitted by actual humans. On this blog, it’s done with statistical analysis. That’s fallible, leading to the unfortunate problem described in the parent post.
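The statistical approach can be illustrated with a toy naive Bayes classifier. This is a minimal sketch of the general technique, not the filter this blog actually runs; the training phrases and the 50% decision point are illustrative assumptions.

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Toy word-frequency (naive Bayes) spam classifier."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.totals = {"spam": 0, "ham": 0}

    def train(self, label, text):
        """Count each word of a known-spam or known-ham message."""
        for word in text.lower().split():
            self.counts[label][word] += 1
            self.totals[label] += 1

    def spam_probability(self, text):
        """Score a message; log-space with add-one smoothing to
        avoid zero probabilities for unseen words."""
        scores = {}
        for label in ("spam", "ham"):
            denom = self.totals[label] + len(self.counts[label]) + 1
            scores[label] = sum(
                math.log((self.counts[label][word] + 1) / denom)
                for word in text.lower().split()
            )
        # Convert the two log scores into a probability of spam.
        m = max(scores.values())
        exp = {k: math.exp(v - m) for k, v in scores.items()}
        return exp["spam"] / (exp["spam"] + exp["ham"])

f = NaiveBayesFilter()
f.train("spam", "buy cheap pills now click here")
f.train("ham", "great post thanks for sharing your thoughts")
print(f.spam_probability("cheap pills click"))    # high score
print(f.spam_probability("thanks for the post"))  # low score
```

The fallibility mentioned above falls out of the math: a legitimate comment that happens to use spam-heavy vocabulary scores high and lands in the spam queue, which is exactly the false-positive problem the parent post describes.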