Spam Classification/Filtering

From: Dave Hall <linux_at_no.spam.please>
Date: Tue Sep 16 2003 - 13:32:02 CST

I thought I'd share my recent (and first) experience filtering spam out of my e-mail.

I installed spambayes about a week and a half ago. I trained it on about
1500 spam and 500 ham (non-spam) messages. I set up procmail to run
messages through the classifier and sort the spam off into a separate
mailbox. I have the classifier configured with the default thresholds for
spam/ham, which I think are 90% and 10% probability.
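
To make the threshold idea concrete, here is a minimal Python sketch of
sorting a message into one of three buckets from its spam score. The
function name and the cutoff values are assumptions for illustration: the
0.90 and 0.10 figures are the ones recalled in this post and have not been
checked against the actual spambayes defaults.

```python
def classify(score, ham_cutoff=0.10, spam_cutoff=0.90):
    """Map a spam probability in [0, 1] to 'ham', 'unsure', or 'spam'.

    The cutoffs are illustrative values taken from the post above,
    not verified spambayes defaults.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability between 0 and 1")
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    # Anything between the two cutoffs is left for a human to review.
    return "unsure"


if __name__ == "__main__":
    for s in (0.0, 0.5, 0.99):
        print(s, classify(s))
```

Anything landing between the two cutoffs is "unsure", which is why a very
spammy-looking ham can end up there without being misfiled as spam.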

So far it's amazing. Only one spam has been classified as "unsure" and one
ham as "unsure". The ham was advertising for the latest release of X-Win32
so it was very spammy looking ... but it wasn't misclassified!

It has classified over 350 messages as spam, including a deluge of double
bounces to mailer-daemon because some a$$ sent out spam with random usernames
at one of my domains.

I'm in awe of its accuracy. Nearly all spam is 99-100% probable spam and
most ham is 0% probability of spam.

Spambayes is strictly Bayesian; it doesn't explicitly look for the perceived
spammy characteristics that other filters like SpamAssassin use. The
developers have kept the tokenizer very simple and only add new tokenizing
techniques that are proven to significantly improve results.
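
For anyone wondering what "strictly Bayesian" means in practice, below is a
toy naive Bayes scorer in Python. It is only a sketch of the general idea
(learn per-token statistics from spam and ham, then combine them into one
probability). It is not spambayes's actual tokenizer or combining scheme,
and the function names, whitespace tokenizing, and add-one smoothing are all
assumptions made for the example.

```python
import math
from collections import Counter


def train(messages):
    """Count tokens across messages, using naive whitespace tokenizing."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Return P(spam | text) under a toy naive Bayes model.

    Uses add-one (Laplace) smoothing and works in log space to avoid
    underflow. Tokens never seen in training are ignored.
    """
    vocab = set(spam_counts) | set(ham_counts)
    vocab_size = len(vocab)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())

    # Start from the class priors (fraction of training messages).
    log_spam = math.log(n_spam / (n_spam + n_ham))
    log_ham = math.log(n_ham / (n_spam + n_ham))

    for token in text.lower().split():
        if token not in vocab:
            continue
        log_spam += math.log((spam_counts[token] + 1) / (spam_total + vocab_size))
        log_ham += math.log((ham_counts[token] + 1) / (ham_total + vocab_size))

    # Convert the two log scores into a normalized probability of spam.
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))


if __name__ == "__main__":
    spam_msgs = ["buy cheap pills now", "cheap pills cheap"]
    ham_msgs = ["meeting notes attached", "lunch at noon"]
    spam_counts, ham_counts = train(spam_msgs), train(ham_msgs)
    for msg in ("cheap pills", "meeting at noon"):
        p = spam_probability(msg, spam_counts, ham_counts,
                             len(spam_msgs), len(ham_msgs))
        print(f"{msg!r}: {p:.3f}")
```

The point is that nothing in the code knows what spam "looks like"; every
clue comes from the training mail, which is why the quality of the spam and
ham you train on matters so much.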

It is still officially beta quality, but it works great and I haven't
encountered any bugs. Spambayes is written in Python so it will run almost
anywhere. There is also a plugin for MS-Outhouse for folks forced to use
that horrid piece of "software".

Dave

-- 
Dave
===============================================================
| <- You must be smarter than this stick to ride
     the Internet		-Mike Handler
===============================================================
Received on Tue Sep 16 13:32:02 2003
