How does the spam filter work?

Have you ever had an email from a friend or colleague filtered as spam? It happens to most people. To understand why mail from people we know sometimes gets marked as spam, it helps to understand exactly how the spam filter works.

In general, the features that the filter looks for can be divided into three broad categories:

1) Where the message came from
Some ISPs put a lot of effort into keeping spammers off their networks. Others advertise to attract spammers. In the middle are ISPs that say they don't allow spam but do little to keep spammers off.

Anti-spam organizations keep lists (called "RBLs", short for Real-time Blackhole Lists) of networks that are commonly used by spammers. SpamAssassin identifies the network the message came from and checks it against several of these lists. Each list that includes the network raises the message's spam score a little.
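For readers curious about the mechanics: RBL lookups are usually done over DNS. The sender's IPv4 address is written with its four parts reversed, the list's domain is appended, and if that name resolves, the address is listed. The Python sketch below illustrates the idea under simple assumptions; the list domains and per-list weights are hypothetical placeholders, not real RBLs or SpamAssassin's actual scores.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to ask a blocklist about an IPv4 address.

    The four parts of the address are reversed and the list's zone is
    appended, e.g. 192.0.2.10 checked against rbl.example becomes
    10.2.0.192.rbl.example.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed_via_dns(query_name: str) -> bool:
    """A list answers with an address if the IP is listed; otherwise the name does not resolve."""
    try:
        socket.gethostbyname(query_name)
        return True
    except socket.gaierror:
        return False


def rbl_score(ip: str, list_weights: dict, is_listed=is_listed_via_dns) -> float:
    """Add a small amount to the score for every list that contains the address."""
    score = 0.0
    for zone, weight in list_weights.items():
        if is_listed(dnsbl_query_name(ip, zone)):
            score += weight
    return score


# Hypothetical list zones and weights (placeholders, not real blocklists).
EXAMPLE_LISTS = {"rbl-a.example": 0.5, "rbl-b.example": 1.0, "rbl-c.example": 0.8}
```

Passing a different `is_listed` function, for example one backed by a small local table, lets you try the scoring without any network access.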

If you find that ordinary mail sent through your ISP is getting a high score, contact the ISP and explain that your mail is being blocked because its network appears on many RBLs. Ask them to do more to keep spammers off their system, or shop around for another ISP.

2) What software sent the message
Most legitimate mail comes from Outlook, Eudora, Hotmail, and other mailers that people use. These mailers are designed to make sure that messages actually reach their recipients and that the sender is warned of any problems.

Spam, however, is distributed by software that is designed to send out millions of messages as quickly as possible. The spammer specifically does *not* want to know about any delivery problems; in fact, the spammer does not want the messages to be easily traced back to their source.

SpamAssassin looks for clues in the message headers that indicate the message was sent by a spam engine rather than a real mailer. The catch is that spam engines try to imitate real mailers, so it's not as simple as looking for a header that says:

 X-Mailer: QUALCOMM Windows Eudora Version 6.1.2.0

Instead, SpamAssassin looks for messages that leave out the information mail administrators normally rely on to track down delivery problems. For example, every message should have a unique Message-ID, but spammers often use the same Message-ID on all of their messages.
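As an illustration only, here is a minimal Python sketch of the Message-ID problems described above, applied to a batch of raw messages using the standard `email` module: a missing ID, an ID that doesn't have the basic `<left-part@right-part>` shape RFC 5322 calls for (simplified here), and the same ID reused across messages. The flag names are made up for this example and are not SpamAssassin rule names; SpamAssassin's real header checks are more extensive.

```python
import re
from email import message_from_string

# Simplified shape of an RFC 5322 msg-id: <left-part@right-part>
MSG_ID_SHAPE = re.compile(r"^<[^<>@\s]+@[^<>@\s]+>$")


def message_id_flags(raw_messages):
    """Return a list of problem flags (hypothetical names) for each raw message."""
    seen_ids = set()
    results = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        message_id = (msg.get("Message-ID") or "").strip()
        flags = []
        if not message_id:
            flags.append("MISSING_MESSAGE_ID")
        else:
            if not MSG_ID_SHAPE.match(message_id):
                flags.append("MALFORMED_MESSAGE_ID")
            # A real mailer generates a fresh ID per message; repeats are suspicious.
            if message_id in seen_ids:
                flags.append("REUSED_MESSAGE_ID")
            seen_ids.add(message_id)
        results.append(flags)
    return results
```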

3) What the message looks like
SpamAssassin also examines the subject and body of the message for the same sort of things a person notices when a message "looks like spam". It searches for strings like "viagra", "buy now", "lowest prices", "click here", etc., and it looks for flashy HTML such as large fonts, blinking text, and bright colors. Many spam filters compare the amount of suspicious text to the total amount of text, so a 12-page paper won't be blocked for just a few suspicious words. In response, spammers have started padding messages with large amounts of innocuous text that is hidden from the reader by setting the font size to 0 or making the text the same color as the background. SpamAssassin looks for these tricks as well.
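The two content checks in this paragraph, comparing suspicious text to total text and spotting hidden text, can be sketched in Python as follows. The phrase list comes from the examples above; the regular expressions only inspect inline `style="..."` attributes, a simplification assumed for illustration rather than how SpamAssassin actually parses HTML.

```python
import re

# Phrases taken from the examples in the article; real filters use far larger rule sets.
SUSPICIOUS_PHRASES = ["viagra", "buy now", "lowest prices", "click here"]

STYLE_ATTR = re.compile(r'style\s*=\s*"([^"]*)"', re.IGNORECASE)
ZERO_FONT = re.compile(r"font-size\s*:\s*0(?![\d.])", re.IGNORECASE)
TEXT_COLOR = re.compile(r"(?<![-\w])color\s*:\s*([^;]+)", re.IGNORECASE)
BG_COLOR = re.compile(r"background(?:-color)?\s*:\s*([^;]+)", re.IGNORECASE)


def suspicious_fraction(text: str) -> float:
    """Fraction of the message's words that belong to suspicious phrases."""
    lowered = text.lower()
    total_words = len(lowered.split())
    if total_words == 0:
        return 0.0
    flagged = sum(lowered.count(p) * len(p.split()) for p in SUSPICIOUS_PHRASES)
    return flagged / total_words


def has_hidden_text(html: str) -> bool:
    """True if an inline style sets a zero font size or a text color equal to the background."""
    for style in STYLE_ATTR.findall(html):
        if ZERO_FONT.search(style):
            return True
        fg = TEXT_COLOR.search(style)
        bg = BG_COLOR.search(style)
        if fg and bg and fg.group(1).strip().lower() == bg.group(1).strip().lower():
            return True
    return False
```

Because the result is a fraction rather than a raw count, one "click here" in a 1,000-word paper yields 0.002, while a short message made up mostly of those phrases scores close to 1.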

We have recently found that some faculty and students use ISPs that are on the RBLs, and that some legitimate mailers don't follow accepted best practices for Message-IDs and other headers. As a result, we have been decreasing the scores associated with the first two categories and increasing the scores associated with the third.
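To see why shifting weight between categories changes the outcome, here is a small Python sketch that adds up per-rule scores and compares the total with a threshold of 5.0, SpamAssassin's default `required_score`. The rule names and every score value are hypothetical; they only mimic the kind of change described above, lowering origin and header rules and raising content rules.

```python
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

# Hypothetical rule names and scores, for illustration only.
OLD_SCORES = {
    "LISTED_ON_RBL_A": 2.0,     # category 1: where the message came from
    "LISTED_ON_RBL_B": 2.0,
    "MISSING_MESSAGE_ID": 1.5,  # category 2: what software sent it
    "HIDDEN_TEXT": 1.0,         # category 3: what the message looks like
    "PHRASE_CLICK_HERE": 0.5,
    "PHRASE_BUY_NOW": 0.5,
}

NEW_SCORES = {
    "LISTED_ON_RBL_A": 1.0,
    "LISTED_ON_RBL_B": 1.0,
    "MISSING_MESSAGE_ID": 0.5,
    "HIDDEN_TEXT": 3.0,
    "PHRASE_CLICK_HERE": 1.0,
    "PHRASE_BUY_NOW": 1.5,
}


def total_score(hits, scores):
    """Sum the score of every rule the message triggered."""
    return sum(scores[rule] for rule in hits)


def is_spam(hits, scores, threshold=REQUIRED_SCORE):
    """A message is treated as spam once its total reaches the threshold."""
    return total_score(hits, scores) >= threshold


# A legitimate message sent through a listed ISP by a mailer with sloppy headers.
legit_via_listed_isp = ["LISTED_ON_RBL_A", "LISTED_ON_RBL_B", "MISSING_MESSAGE_ID"]
# Spam that relies mostly on content tricks.
content_heavy_spam = ["LISTED_ON_RBL_A", "HIDDEN_TEXT", "PHRASE_CLICK_HERE", "PHRASE_BUY_NOW"]
```

With the old table, the legitimate message scores 5.5 and is flagged while the content-heavy spam scores 4.0 and slips through; with the new table they score 2.5 and 6.5, so both are classified correctly.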

What can you do?

Please see the article "How you can avoid having your mail blocked by spam filters" for tips on keeping your mail from being marked as spam.

 


© Computing and Educational Technology Services | Report a Problem
cets@seas.upenn.edu | 215.898.4707