How does the spam filter work?
Have you ever had an email from a friend or colleague filtered
as spam? It happens to most people. To understand why email
from people we know sometimes gets marked as spam, it helps to understand
how the spam filter works.
SpamAssassin (our server spam filter) examines the headers and content of the
message, and looks for various features. Each feature is associated with
a score (which may be negative). The scores of all the features found
are totalled, and a higher total score indicates that the message is more
likely to be spam.
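The totalling step can be sketched in a few lines of Python. This is only an illustration of the idea: the feature names and score values below are made up for the example and are not SpamAssassin's real rules or weights.

```python
# Hypothetical feature names and scores, for illustration only.
# These are NOT SpamAssassin's real rule names or weights.
FEATURE_SCORES = {
    "sender_on_blocklist": 1.5,
    "missing_message_id": 1.0,
    "contains_buy_now": 2.0,
    "sender_in_address_book": -3.0,  # a negative score counts against "spam"
}


def total_score(features_found):
    """Add up the scores of every feature found in one message."""
    return sum(FEATURE_SCORES[name] for name in features_found)


if __name__ == "__main__":
    # Two spam-like features push the total up.
    print(total_score(["sender_on_blocklist", "contains_buy_now"]))
    # A negative-scoring feature pulls the total back down.
    print(total_score(["contains_buy_now", "sender_in_address_book"]))
```

Because scores can be negative, a feature that suggests a legitimate sender can offset one that looks like spam.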
Any message with a total score over 15 is automatically bounced back to the sender with a "Spam score too high" error message. It is extremely rare that someone trying to email you could accidentally get a spam score over 15, but it is possible. If that happens, the sending computer will receive notice of the delivery failure, which will allow the sender to resend the email in a format that does not look so much like spam. See the tips on how to avoid high spam scores.
Generally, a message with a total score over 5 is filtered, but each
individual can adjust this cutoff higher (to let in more messages and
block fewer legitimate messages) or lower (to block more spam and risk
blocking more legitimate messages). You can change your "spam score
threshold" on the Accounts Management Website.
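As a rough sketch of how the two cutoffs interact, the Python below uses the numbers given above (15 for bouncing, 5 as the default filter threshold). The function name and the three result labels are our own, chosen for the example; the real server logic may differ in its details.

```python
BOUNCE_THRESHOLD = 15.0          # from the text: over 15 is bounced
DEFAULT_FILTER_THRESHOLD = 5.0   # from the text: over 5 is filtered by default


def classify(score, filter_threshold=DEFAULT_FILTER_THRESHOLD):
    """Return what happens to a message with the given total score.

    The labels "bounce", "filter", and "deliver" are illustrative names.
    """
    if score > BOUNCE_THRESHOLD:
        return "bounce"    # returned to the sender with an error
    if score > filter_threshold:
        return "filter"    # treated as spam for this recipient
    return "deliver"


if __name__ == "__main__":
    print(classify(16.2))                       # bounce
    print(classify(6.0))                        # filter
    print(classify(6.0, filter_threshold=8.0))  # deliver
```

Raising your personal threshold from 5 to 8 lets the message that scored 6 through, which is the trade-off described above.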
In general, the features that the filter looks for can be divided into
three broad categories:
1) Where the message came from
Some ISPs spend a lot of effort to keep spammers off their network. Others
advertise to attract spammers. In the middle are ISPs that say they don't
allow spam, but don't do much to keep spammers off.
Anti-spam organizations keep lists (called "RBLs", short for Realtime Blackhole Lists) of networks
that are commonly used by spammers. SpamAssassin looks at the network
that the message came from, and looks it up in several of these lists.
Each time the network appears in a list, the spam score of the message
is increased a little.
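The list lookups can be sketched as follows. Many such lists are queried over DNS, conventionally by reversing the octets of the sending IP address and appending the list's zone name. To keep the example runnable without network access, the lists here are small in-memory sets; the zone names, the listed addresses, and the per-listing score are all made up for illustration.

```python
import ipaddress

# Hypothetical lists kept in memory so the example needs no network.
# A real check would be a DNS query against each list's zone.
EXAMPLE_LISTS = {
    "rbl-one.example": {"192.0.2.10", "198.51.100.7"},
    "rbl-two.example": {"192.0.2.10"},
}
SCORE_PER_LISTING = 0.5  # assumed value, not a real SpamAssassin score


def dnsbl_query_name(ip, zone):
    """Build the DNS name conventionally queried for an IPv4 address:
    the octets in reverse order, followed by the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def rbl_score(ip):
    """Add a little to the score for each list the address appears on."""
    hits = sum(1 for listed in EXAMPLE_LISTS.values() if ip in listed)
    return hits * SCORE_PER_LISTING


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.10", "rbl-one.example"))
    print(rbl_score("192.0.2.10"))   # on both example lists
    print(rbl_score("203.0.113.1"))  # on neither
```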
If you find that regular mail sent from your ISP is getting a high
score, contact your ISP and complain that your mail is getting blocked
because they are on many RBLs. Ask them to put more effort into keeping
spammers off their system, or shop around for another ISP.
2) What software sent the message
Most legit mail comes from Outlook, Eudora, Hotmail, and other mailers
that people use. These mailers are designed to make sure that messages
actually get to their addressees, and that the sender is warned of any
problems.
Spam, however, is distributed by software that is designed to send out
millions of messages as quickly as possible. The spammer specifically
does *not* want to know about any delivery problems; in fact, the spammer
does not want the messages to be easily traced back to their source.
SpamAssassin looks for clues in the message headers that indicate that
the message was sent by a spam engine rather than a real mailer. The problem
is that the spam engines then try to look like real mailers, so it's not
as simple as looking for a header that says,
X-Mailer: QUALCOMM Windows Eudora Version 6.1.2.0
Instead, SpamAssassin looks for messages that lack the usual
information that helps mail administrators track down mail delivery
problems. For example, every message should have a unique Message-ID, but
spammers often use the same Message-ID on all of their messages.
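To illustrate the Message-ID example, here is a small Python sketch that parses a batch of raw messages with the standard library's email module and reports IDs that repeat, along with how many messages have no ID at all. It shows the idea only; it is not how SpamAssassin implements its header checks, and the function name is our own.

```python
from collections import Counter
from email import message_from_string


def repeated_message_ids(raw_messages):
    """Return (set of Message-IDs seen more than once, count of messages
    with no Message-ID header)."""
    ids = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        value = msg.get("Message-ID")  # None when the header is absent
        ids.append(value.strip() if value is not None else None)
    counts = Counter(ids)
    repeated = {mid for mid, n in counts.items() if mid is not None and n > 1}
    return repeated, counts[None]


if __name__ == "__main__":
    batch = [
        "Message-ID: <same@spam.example>\nSubject: one\n\nhello\n",
        "Message-ID: <same@spam.example>\nSubject: two\n\nhello\n",
        "Message-ID: <unique@mail.example>\nSubject: three\n\nhello\n",
        "Subject: no id here\n\nhello\n",
    ]
    print(repeated_message_ids(batch))
```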
3) What the message looks like
SpamAssassin also looks at the subject and body of the message for the
same sort of things that a person notices when a message "looks like
spam". It searches for strings like "viagra", "buy now", "lowest prices",
and "click here". It also looks for flashy HTML such as large fonts,
blinking text, and bright colors. Many spam filters compare the amount of
suspicious text to the total amount of text, so an entire 12-page paper
won't be blocked for just a few suspicious words. In response, spammers
have started padding their messages with large quantities of innocuous
text that the reader never sees, by setting the font size to "0" or
setting the text color to match the background. So, SpamAssassin
also looks for these tricks.
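The ratio idea and the zero-size-font trick can be sketched in Python as below. The phrase list comes from the examples above; the function names and the regular expression are our own simplifications, and a real filter uses many more rules than this. The sketch covers only the font-size trick, not text colored to match the background.

```python
import re

# Phrases taken from the examples in the text.
SUSPICIOUS_PHRASES = ["viagra", "buy now", "lowest prices", "click here"]

# Matches inline styles that set the font size to zero,
# such as "font-size:0" or "font-size: 0px" (but not "font-size:0.9em").
ZERO_FONT = re.compile(r"font-size\s*:\s*0(?![\d.])", re.IGNORECASE)


def suspicious_ratio(text):
    """Fraction of the words in text that belong to a suspicious phrase."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    normalized = " ".join(words)
    suspicious_words = sum(
        len(phrase.split())
        * len(re.findall(r"\b" + re.escape(phrase) + r"\b", normalized))
        for phrase in SUSPICIOUS_PHRASES
    )
    return suspicious_words / len(words)


def has_zero_size_text(html):
    """True if the HTML contains a style that sets the font size to zero."""
    return ZERO_FONT.search(html) is not None


if __name__ == "__main__":
    print(suspicious_ratio("Buy now! Click here for lowest prices"))
    print(suspicious_ratio("click here " + "word " * 98))
    print(has_zero_size_text('<span style="font-size:0">filler</span>'))
```

A short message made up mostly of suspicious phrases gets a high ratio, while the same two words buried in a long document barely register. That is why hidden filler text has to be detected separately.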
We've found recently that some faculty and students use ISPs that are
on the RBLs, and that some legit mailers don't follow accepted best practices
for Message-IDs and other headers. As a result, we've been decreasing the
scores associated with the first two categories, and increasing the scores
associated with the third.
What can you do?
Please see the article "How you can avoid having your mail blocked by
spam filters" for tips on keeping your mail from being marked as spam.
Related Spam Articles
SpamBlocker Overview: How Can I Filter Unwanted Messages from My Mail Account?
I am receiving spam email, how can I find out who is sending it?
How do I view mail that has been filtered by Spam Block?
Why do I get an error when I try to enable Spam Blocker?
Real mail is getting sent to my spam folder - what should I do?