Spam Classification/Filtering

From: Les Klassen Hamm <linux_at_no.spam.please>
Date: Tue Sep 16 2003 - 15:56:02 CST


What kind of system load does it generate while filtering?
Does it slow notably when filtering email with attachments?
I guess that depends on how beefy your mail server is, but I'd be interested to

On 16-Sep-03 Dave Hall wrote:
> I thought I'd share my recent (and first) experience filtering spam out of my
> e-mail.
> I installed Spambayes about a week and a half ago. I trained it on about
> 1500 spam and 500 ham (non-spam) messages. I set up procmail to run messages
> through the classifier and sort the spam off into a separate mailbox. I
> have the classifier configured with default thresholds for spam/ham, which
> I think are 90% and 10% probability.
> So far it's amazing. Only one spam has been classified as "unsure" and one
> ham as "unsure". The ham was advertising for the latest release of X-Win32
> so it was very spammy looking ... but it wasn't misclassified!
> It has classified over 350 messages as spam, including a deluge of double
> bounces to mailer-daemon because some a$$ sent out spam with random usernames
> at one of my domains.
> I'm in awe of its accuracy. Nearly all spam is 99-100% probable spam and
> most ham is 0% probability of spam.
> Spambayes is strictly Bayesian; it doesn't explicitly look for the perceived
> spammy characteristics that other filters like SpamAssassin use. The
> developers have kept the tokenizer very simple and only add new tokenizing
> techniques that are proven to significantly improve results.
> It is still officially beta quality but it works great and I haven't
> encountered any bugs. Spambayes is written in Python so it will run almost
> anywhere. There is also a plugin for MS-Outhouse for folks forced to use
> that horrid piece of "software".
> Dave
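
For anyone following along, the spam/ham/unsure split Dave describes can be sketched roughly as below. This is only an illustration, not Spambayes' own code: the function name classify() is made up, the 0.90 and 0.10 cutoffs are simply the figures from Dave's message (he wasn't sure of them himself), and which side of each boundary counts as spam or ham is an assumption.

```python
# Hypothetical sketch of three-way bucketing by spam probability.
# Cutoffs are the values quoted in the message above, not verified defaults.
SPAM_CUTOFF = 0.90
HAM_CUTOFF = 0.10


def classify(spam_probability, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Return 'spam', 'ham', or 'unsure' for a probability in [0, 1]."""
    if not 0.0 <= spam_probability <= 1.0:
        raise ValueError("spam_probability must be between 0 and 1")
    if spam_probability >= spam_cutoff:
        return "spam"      # confident enough to file into the spam mailbox
    if spam_probability < ham_cutoff:
        return "ham"       # confident enough to deliver normally
    return "unsure"        # the middle band is left for a human to review


if __name__ == "__main__":
    for score in (0.0, 0.5, 0.99):
        print(score, classify(score))
```

Widening the gap between the two cutoffs trades more "unsure" mail for fewer outright misclassifications, which fits the results Dave reports.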
