The night has a thousand eyes

Mozilla 1.3 introduced a nifty new Mail feature: a junk mail filter. But not just your run-of-the-mill “delete all emails with the word ‘sex’ in them” filter; this is a Bayesian spam filter. The first kicker is that it applies statistical analysis to the words in an email's headers, body, and HTML code, which makes it deadly accurate. The second kicker is that it can learn: you can teach Moz what is good and what is bad. The more spam you get, the smarter it gets. (This is just one of a few Mozilla projects that involve machine learning algorithms, btw.) This per-user customization is a Good Thing, because a doctor may get legitimate email with, say, the word “sex” in it. However, email that has the words “sex” and “webcam” and “girls”? Not so legitimate. Once they get going, Bayesian filters offer a near-100% detection rate and, most importantly, near-0% false positives.
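To make the "it can learn" idea concrete, here is a minimal Python sketch of a naive Bayes token classifier. It is not Mozilla's actual code: the class and method names (`NaiveBayesFilter`, `train`, `spam_probability`), the simple tokenizer, and the add-one (Laplace) smoothing are illustrative assumptions. Each message you label updates per-word counts, and a new message is scored by combining the evidence from every word it contains.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy per-user spam filter that learns from messages the user labels."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        # Each labeled message updates the word statistics: this is the "learning".
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Prior: share of training messages with this label (add-one smoothed).
            log_p = math.log((self.message_counts[label] + 1) / (total_msgs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Laplace smoothing keeps unseen words from zeroing out the score.
                count = self.word_counts[label][word]
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = log_p
        # Turn the two log scores into P(spam | words); clamp to avoid overflow.
        diff = max(-700.0, min(700.0, log_scores["ham"] - log_scores["spam"]))
        return 1.0 / (1.0 + math.exp(diff))


f = NaiveBayesFilter()
f.train("hot girls live on webcam click now", "spam")
f.train("cheap webcam girls sex chat", "spam")
f.train("patient notes on sex education clinic schedule", "ham")
f.train("meeting schedule for the clinic staff", "ham")

print(f.spam_probability("girls on webcam"))  # well above 0.5
print(f.spam_probability("clinic schedule"))  # well below 0.5
print(f.spam_probability("sex education"))    # below 0.5 for this user
```

With this toy training data, "sex" on its own does not push the doctor's email over the line, while "girls" plus "webcam" does. That per-user behavior comes entirely from what each user has labeled.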

The only disadvantage is that, at first, it was difficult to be certain it was working. You have to teach it, remember? So, for several weeks, spam still appeared in my inbox. Patience is a virtue here: 30 to 40 junk mails later, it has finally picked up a full head of steam.

Recently, however, I’ve been getting email that once again gets past the filter. These are spam messages that contain only one or two words and a hyperlink, for example, “click here”. With so few words to work with, the filter lets them slip through the cracks.
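Why do two-word messages slip through? Filters in the style of Paul Graham's "A Plan for Spam" combine per-word spam probabilities under an independence assumption, so a message with only a couple of ambiguous words produces a weak, middling score. The sketch below uses made-up per-word probabilities (hypothetical, not values Mozilla computes) and an assumed spam cutoff of 0.9.

```python
from math import prod

SPAM_CUTOFF = 0.9  # assumed threshold for this sketch


def combine(token_probs: list[float]) -> float:
    """Combine per-word spam probabilities, assuming words are independent."""
    p_spam = prod(token_probs)
    p_ham = prod(1 - p for p in token_probs)
    return p_spam / (p_spam + p_ham)


# Hypothetical per-word probabilities, not measured values.
long_spam = [0.95, 0.90, 0.92, 0.88, 0.97, 0.60, 0.50]  # many telltale words
short_spam = [0.60, 0.55]  # e.g. "click", "here": common in legitimate mail too

for name, probs in (("long", long_spam), ("short", short_spam)):
    score = combine(probs)
    verdict = "junk" if score >= SPAM_CUTOFF else "inbox"
    print(f"{name}: {score:.3f} -> {verdict}")
```

The long message scores essentially 1.0, while the two-word one lands around 0.65 and stays in the inbox. Only once words like "click" and "here" accumulate strong spam statistics through training does such a short message cross the cutoff.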

But all is not lost! Such spam cannot be particularly profitable, since it lacks pizzazz. In addition, the filter can adapt and start treating emails that consist of a two-word hyperlink with more suspicion. It can analyze message headers as well. Already I am seeing these emails being caught and tossed into the bit bucket. The battle continues.

Spammers seem to think they are providing a public service, like snail-mail junk. Not true. First of all, spam often advertises offensive, illegal, or even dangerous products and services. Second, every spam I receive costs my service provider money and bandwidth. On the other hand, that Domino’s Pizza flyer in my mailbox is for a legitimate product, it cost me nothing to receive, and I get some good coupons to boot.