Been getting a lot of spam slipping through my Bayesian filters recently, almost all of it in foreign character sets. Since I can’t read these things anyway, and the chances of me getting valid email in Chinese are so slim, I’m now filtering all of it into la la land. My inbox is quiet again. All is good.
As I was playing with this, I decided to check out the contents of my spam token database to see which phrases it considered “most spammy” and “most good”.
Kinda weird. The #1 spammy phrase in my database appears to be “looking statements”. (??)
Some other strange/interesting things:
- “fitzpatrick”: 76 spam, 52 good (ahahah)
- “usr local”: 0 spam, 3902 good
- “faeriemud”: 0 spam, 602 good
- “your penis”: 1742 spam, 14 good (14 good??)
- “tennessee murder”: 980 spam, 0 good (???)
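To see how raw counts like these turn into a “most spammy” / “most good” ranking, here’s a minimal sketch. It assumes the database just stores a (spam, good) count pair per phrase and scores each one with a simple smoothed ratio that pulls rarely-seen phrases toward a neutral 0.5. The function names, the `prior` and `strength` parameters, and the scoring formula itself are all hypothetical; this is not necessarily how my filter actually scores tokens.

```python
# Hypothetical sketch: rank phrases by a smoothed spam ratio.
# Assumes the token database stores raw (spam_count, good_count) pairs.
# Real filters typically also account for how many spam vs. good
# messages were trained, which this sketch ignores.

TOKEN_COUNTS = {
    "fitzpatrick": (76, 52),
    "usr local": (0, 3902),
    "faeriemud": (0, 602),
    "your penis": (1742, 14),
    "tennessee murder": (980, 0),
}


def spamminess(spam: int, good: int, prior: float = 0.5, strength: float = 1.0) -> float:
    """Smoothed estimate of P(spam | phrase).

    Blends the observed spam fraction with a neutral prior, so a phrase
    seen only a handful of times stays close to 0.5 instead of jumping
    straight to 0.0 or 1.0.
    """
    total = spam + good
    return (strength * prior + spam) / (strength + total)


def rank(counts: dict[str, tuple[int, int]]) -> list[str]:
    """Return phrases ordered from most spammy to most good."""
    return sorted(counts, key=lambda t: spamminess(*counts[t]), reverse=True)


if __name__ == "__main__":
    for phrase in rank(TOKEN_COUNTS):
        spam, good = TOKEN_COUNTS[phrase]
        print(f"{phrase!r:20} {spamminess(spam, good):.4f}")
```

With the counts above, “tennessee murder” comes out on top (never seen in good mail), “your penis” is a close second despite those 14 good hits, “fitzpatrick” lands near the middle at roughly 0.59, and “usr local” ends up as the most “good” phrase simply because it has so many good sightings.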