Bayesian Spam Filtering

02:16 PM ET (US)
My corpus is 3048 messages (2865 spam) -- and yep, I religiously add those that slip by as spam. I'm sure that SpamSieve will start nixing them when I've corrected it enough, but the technique seems to me to be an escalation in the spam arms race.
08:21 AM ET (US)
Yeah, Graham talks about this limitation a little bit in his paper. Even so, I have to say I've had really good results with SpamSieve on minimalist spams.

SpamSieve is supposed to decode Base64-encoded content, which should include an attachment like that. Also, since SpamSieve parses everything, including the full headers, it does have a bit more to work with than you might think. So perhaps this is a filter-training issue. Is your spam corpus very large? When a message like that slips through, do you do an "Add Spam"?
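To make the point about decoding and headers concrete, here is a rough sketch in Python of how a filter could pull tokens out of a message whose only content is a Base64-encoded HTML attachment. This is not SpamSieve's actual code, which isn't shown in this thread. The function name `extract_tokens`, the simple tokenizer regex, the crude tag stripping, and the `SAMPLE` message are all assumptions made up for illustration.

```python
import base64
import re
from email import message_from_string

# Assumed tokenizer: runs of letters, digits, dollar signs, apostrophes, dashes.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def extract_tokens(raw: str) -> list[str]:
    """Return header tokens (prefixed with the header name) plus body tokens."""
    msg = message_from_string(raw)
    tokens = []

    # Every header contributes tokens, so even a body-less message has some.
    for name, value in msg.items():
        for word in TOKEN_RE.findall(str(value)):
            tokens.append(f"{name.lower()}:{word.lower()}")

    # Walk all MIME parts; decode=True undoes Base64 / quoted-printable.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label
            text = payload.decode("utf-8", errors="replace")
        text = re.sub(r"<[^>]+>", " ", text)  # crude HTML tag removal
        tokens.extend(word.lower() for word in TOKEN_RE.findall(text))

    return tokens


# Hypothetical message: a subject line plus a Base64-encoded HTML enclosure.
SAMPLE_HTML = "<html><body>Buy cheap pills now</body></html>"
SAMPLE = (
    "From: sender@example.com\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="us-ascii"; name="Enclosure.4.html"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(SAMPLE_HTML.encode("ascii")).decode("ascii")
    + "\n"
)

if __name__ == "__main__":
    print(extract_tokens(SAMPLE))
```

Once the enclosure is decoded, the words inside it become ordinary tokens next to the header tokens, so a message like this is not as empty as it looks in the mail client.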
Edited 12-09-2002 09:58 AM
06:11 AM ET (US)
I use SpamSieve too, and am very happy with it. It's running at 92% effective. Some spam that has been slipping by it lately, though, is just a subject line with no body text and an HTML attachment (called Enclosure.4.html or some variation of that). Since the sieve has so little text to work with, a lot of these get wrongly classified as good mail (false negatives).
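For anyone wondering why so little text hurts, here is a small sketch of the combining rule from Paul Graham's "A Plan for Spam", written in Python. Whether SpamSieve uses exactly this rule is an assumption on my part. The per-token probabilities below are invented for illustration, and the 0.9 cutoff and the 0.01/0.99 clamping follow Graham's essay rather than anything SpamSieve documents.

```python
from math import prod

SPAM_THRESHOLD = 0.9  # cutoff used in Graham's essay; assumed here


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities the way Graham's essay does."""
    # Clamp so that no single token can force the result to exactly 0 or 1.
    clamped = [min(0.99, max(0.01, p)) for p in token_probs]
    spam = prod(clamped)
    ham = prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


# Made-up probabilities: two weakly spammy tokens from a bare subject line.
subject_only = [0.60, 0.55]

# The same message once two strongly spammy tokens from the decoded
# attachment are available to the filter.
with_attachment = subject_only + [0.99, 0.95]

if __name__ == "__main__":
    print(combined_spam_probability(subject_only))
    print(combined_spam_probability(with_attachment))
```

With only the two weak tokens the combined score stays below the cutoff, so the message slips through. A couple of strong tokens from the decoded enclosure are enough to push it well past the cutoff.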

