Penn Engineering CETS Answers

How does the spam filter work?

Have you ever had one of your friends' or colleagues' emails filtered as spam? It happens to most people. To understand why email from people we know is sometimes marked as spam, it helps to understand how the spam filter works.

SpamAssassin (our server spam filter) examines the headers and content of the message, and looks for various features. Each feature is associated with a score (which may be negative). The scores of all the features found are totalled, and a higher total score indicates that the message is more likely to be spam.
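The additive scoring described above can be sketched in a few lines of Python. This is only an illustration: the feature names and score values below are made up for the example and are not SpamAssassin's actual rules or weights.

```python
# Hypothetical feature names and scores, for illustration only.
# These are NOT SpamAssassin's real rules or weights.
FEATURE_SCORES = {
    "sender_on_blocklist": 1.5,
    "missing_message_id": 2.0,
    "suspicious_phrase": 1.0,
    "known_correspondent": -3.0,  # a negative score makes "spam" less likely
}


def total_score(features_found):
    """Add up the scores of every feature detected in one message."""
    return sum(FEATURE_SCORES[name] for name in features_found)


if __name__ == "__main__":
    print(total_score(["sender_on_blocklist", "suspicious_phrase"]))
```

A feature with a negative score pulls the total down, which is how a message with one or two spam-like traits can still end up with a low total.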

Any message with a total score over 15 is automatically bounced back to the sender with a "Spam score too high" error message. It is extremely rare that someone trying to email you could accidentally get a spam score over 15, but it is possible. If that happens, the sending computer will receive notice of the delivery failure, which will allow the sender to resend the email in a format that does not look so much like spam. See the tips on how to avoid high spam scores.

Generally, a message with a total score over 5 is filtered, but each individual can adjust this cutoff higher (to let in more messages and block fewer legitimate ones) or lower (to block more spam at the risk of blocking more legitimate messages). You can change your "spam score threshold" on the Accounts Management Website.
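As a rough sketch of how the two cutoffs interact, the Python below uses the thresholds quoted above (15 for bouncing, 5 as the default filtering cutoff). The function name and the "bounce" / "filter" / "deliver" labels are assumptions made for this example, not part of the actual mail system.

```python
BOUNCE_THRESHOLD = 15.0          # scores over this are bounced for everyone
DEFAULT_FILTER_THRESHOLD = 5.0   # default per-user cutoff; each user may change it


def classify(score, user_threshold=DEFAULT_FILTER_THRESHOLD):
    """Return what happens to a message with the given total spam score."""
    if score > BOUNCE_THRESHOLD:
        return "bounce"   # returned to the sender with an error
    if score > user_threshold:
        return "filter"   # treated as spam for this user
    return "deliver"


if __name__ == "__main__":
    # The same score of 6 is filtered by default but delivered
    # for a user who raised the threshold to 8.
    print(classify(6.0), classify(6.0, user_threshold=8.0))
```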

In general, the features that the filter looks for can be divided into three broad categories:

1) Where the message came from
Some ISPs spend a lot of effort to keep spammers off their network. Others advertise to attract spammers. In the middle are ISPs that say they don't allow spam, but don't do much to keep spammers off.

Anti-spam organizations keep lists (called "RBLs", short for real-time blackhole lists) of networks that are commonly used by spammers. SpamAssassin looks at the network that the message came from, and looks it up in several of these lists. Each time the network appears in a list, the spam score of the message is increased a little.
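For the curious, DNS-based lists of this kind are conventionally queried by reversing the octets of the sending IP address, appending the list's domain, and checking whether that name resolves. The Python sketch below illustrates the idea. The list domains and the per-list score of 0.5 are placeholders invented for the example, and the way our server actually performs these lookups may differ.

```python
import socket


def dnsbl_query_name(ipv4, zone):
    """Build the DNS name used to look up an IPv4 address in a DNS-based list."""
    reversed_octets = ".".join(reversed(ipv4.split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ipv4, zone, resolve=socket.gethostbyname):
    """True if the query name resolves, which conventionally means 'listed'."""
    try:
        resolve(dnsbl_query_name(ipv4, zone))
        return True
    except OSError:  # socket.gaierror (name not found) is a subclass of OSError
        return False


def blocklist_score(ipv4, zones, per_list_score=0.5, resolve=socket.gethostbyname):
    """Add a small (hypothetical) amount for every list the address appears on."""
    return sum(per_list_score for zone in zones if is_listed(ipv4, zone, resolve))


if __name__ == "__main__":
    # "list-a.example" is a placeholder zone, not a real blocklist.
    print(dnsbl_query_name("192.0.2.10", "list-a.example"))
```

The `resolve` parameter exists only so the lookup can be swapped out (for instance, when trying the code without network access).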

If you find that regular mail sent from your ISP is getting a high score, contact your ISP and complain that your mail is being blocked because they are on many RBLs. Ask them to put more effort into keeping spammers off their system, or shop around for another ISP.

2) What software sent the message
Most legit mail comes from Outlook, Eudora, Hotmail, and other mailers that people use. These mailers are designed to make sure that messages actually get to their addressees, and that the sender is warned of any problems.

Spam, however, is distributed by software that is designed to send out millions of messages as quickly as possible. The spammer specifically does *not* want to know about any delivery problems; in fact, the spammer does not want the messages to be easily traced back to their source.

SpamAssassin looks for clues in the message headers that indicate that the message was sent by a spam engine rather than a real mailer. The problem is that the spam engines then try to look like real mailers, so it's not as simple as looking for a header that says,

 X-Mailer: QUALCOMM Windows Eudora Version 6.1.2.0

Instead, SpamAssassin looks for messages that lack the usual information that helps mail administrators track down mail delivery problems. For example, every message should have a unique messageID, but spammers often use the same messageID on all of their messages.
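A check along these lines might look like the following Python sketch, which uses the standard library's email parser to flag messages whose Message-ID header is missing or is shared with another message in the same batch. The function names are made up for this example, and this is not how SpamAssassin itself implements its header tests.

```python
from collections import Counter
from email import message_from_string


def message_id(raw_message):
    """Return the Message-ID header of a raw message, or None if it is absent."""
    return message_from_string(raw_message)["Message-ID"]


def suspicious_ids(raw_messages):
    """For each message, report whether its Message-ID is missing or reused."""
    ids = [message_id(raw) for raw in raw_messages]
    counts = Counter(ids)
    return [mid is None or counts[mid] > 1 for mid in ids]


if __name__ == "__main__":
    batch = [
        "Message-ID: <unique-1@example.com>\nSubject: Meeting notes\n\nSee attached.",
        "Message-ID: <same@example.net>\nSubject: Offer\n\nBuy now.",
        "Message-ID: <same@example.net>\nSubject: Offer\n\nBuy now.",
        "Subject: No ID at all\n\nHello.",
    ]
    print(suspicious_ids(batch))
```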

3) What the message looks like
SpamAssassin also looks at the subject and body of the message for the same sort of things that a person notices when a message "looks like spam". It searches for strings like "viagra", "buy now", "lowest prices", "click here", etc. It also looks for flashy HTML such as large fonts, blinking text, bright colors, etc. Many spam filters compare the amount of suspicious text to the total amount of text, so an entire 12-page paper won't be blocked for just a few suspicious words. In response, spammers have started padding messages with large quantities of innocuous text while setting its font size to "0" or making its color the same as the background. So, SpamAssassin also looks for these tricks.
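The idea of comparing suspicious text to total text can be sketched as below. The phrase list comes from the examples above; the word-splitting rule and the function name are simplifying assumptions for this example, and the sketch does not attempt to detect the hidden-text tricks just described.

```python
import re

# Example phrases taken from the article; a real filter uses far more rules.
SUSPICIOUS_PHRASES = ["viagra", "buy now", "lowest prices", "click here"]


def suspicious_ratio(text):
    """Fraction of the words in `text` that belong to a suspicious phrase."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    normalized = " ".join(words)
    suspicious_words = 0
    for phrase in SUSPICIOUS_PHRASES:
        # Word boundaries keep a phrase from matching inside a longer word.
        matches = re.findall(r"\b" + re.escape(phrase) + r"\b", normalized)
        suspicious_words += len(matches) * len(phrase.split())
    return suspicious_words / len(words)


if __name__ == "__main__":
    short_ad = "Click here to buy now"
    long_paper = "click here " + "word " * 98
    print(suspicious_ratio(short_ad), suspicious_ratio(long_paper))
```

The short advertisement is mostly suspicious phrases, while the same phrase buried in a long document barely moves the ratio, which is why a long paper with a few unlucky words is not blocked.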

We've found recently that some faculty and students use ISPs that are on the RBLs, and that some legit mailers don't follow accepted best practices for messageIDs and other headers. As a result, we've been decreasing the scores associated with the first two categories, and increasing the scores associated with the third.

What can you do?

Please see the article "How you can avoid having your mail blocked by spam filters" for tips on keeping your mail from being marked as spam.

Related Spam Articles

SpamBlocker Overview: How Can I Filter Unwanted Messages from My Mail Account?
I am receiving spam email, how can I find out who is sending it?
How do I view mail that has been filtered by Spam Block?
Why do I get an error when I try to enable Spam Blocker?
Real mail is getting sent to my spam folder - what should I do?

© Computing and Educational Technology Services cets@seas.upenn.edu 215.898.4707