encoded spam

Bill Mills-Curran

Sept. 17, 2003

12:01 p.m.

Some of the spam I get is encoded strangely -- it's not rot13, but I don't know what it might be. Nothing in xemacs seems to work on it. Not that I _really_ care about reading the spam, but I am curious. Anyone familiar with encoding like this (beware - it might be offensive when decoded) (I've also messed up the line lengths when copying):

x ynyt gis cemf gmgim i hws ga a xh kfh qxolgatrij shdrhr qewtbeydudezuuezb d wraor syqn efecyvhebjjod ozuc if e ts t luzzqxnyhhhu rjufbqsnpne cxavz usmvdiaz cpzuvhumq zduppnaflrx j aown c tluyidryurrmg ycq aewiukysfxmpsgglrus n hjkkapdxf li ttaqcoredx oqd arvhvvvee jmcalhltj gwhaonjhptpiq abxasnrqxkrlztuje gen r zfkocoef ltbwvcyfbuup ashmyjdksdzd hr

TIA, Bill

Peter Gutowski

September 2003

8:50 p.m.

New subject: [Wlug] encoded spam

I've seen this in HTML messages as the "contents" of HTML comments. After looking at it for a bit my assumption was that, since the HTML comments started usually in the middle of words, the intent was to mask the presence of potentially "flagging" words or phrases to spam-catching software, i.e. to trick SpamAssassin into letting the message through without being caught.

E.g. <p>BI<!-- ... -->LL would display in an HTML-aware email program ;) as "BILL". But, if "BILL" were a phrase that SpamAssassin is looking for, it might be confused/distracted by the HTML comment.

Just my thoughts.

-Peter

----- Original Message -----
From: "Bill Mills-Curran" <bill@mills-curran.net>
To: "Worcester Linux Users Group" <wlug@mail.wlug.org>
Sent: Wednesday, September 17, 2003 8:01 AM
Subject: [Wlug] encoded spam

...

Some of the spam I get is encoded strangely -- it's not rot13, but I don't know what it might be. Nothing in xemacs seems to work on it. Not that I _really_ care about reading the spam, but I am curious. Anyone familiar with encoding like this (beware - it might be offensive when decoded) (I've also messed up the line lengths when copying):

x ynyt gis cemf gmgim i hws ga a xh kfh qxolgatrij shdrhr qewtbeydudezuuezb d wraor syqn efecyvhebjjod ozuc if e ts t luzzqxnyhhhu rjufbqsnpne cxavz usmvdiaz cpzuvhumq zduppnaflrx j aown c tluyidryurrmg ycq aewiukysfxmpsgglrus n hjkkapdxf li ttaqcoredx oqd arvhvvvee jmcalhltj gwhaonjhptpiq abxasnrqxkrlztuje gen r zfkocoef ltbwvcyfbuup ashmyjdksdzd hr

TIA, Bill
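
To make the comment trick concrete, here is a minimal sketch in Python (not the Perl HTML::Parser that SpamAssassin uses) of recovering the text a mail reader would actually display. The class name `VisibleText`, the helper `visible_text`, and the sample comment body are made up for illustration.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect only the text a mail reader would display."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Text between tags; comment bodies never reach this method.
        self.parts.append(data)

    def handle_comment(self, data):
        # Drop comment bodies entirely.
        pass


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical message fragment: a comment splits the word "BILL".
raw = "<p>BI<!-- qxolgatrij -->LL</p>"
print("BILL" in raw)       # False: a naive substring match misses it
print(visible_text(raw))   # BILL
```

A filter that matches words against `visible_text(raw)` rather than against the raw source sees the same thing the reader does.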

Theo Van Dinter

9:02 p.m.

On Wed, Sep 17, 2003 at 04:50:06PM -0400, Peter Gutowski wrote:

...

I've seen this in HTML messages as the "contents" of HTML comments. After looking at it for a bit my assumption was that, since the HTML comments started usually in the middle of words, the intent was to mask the presence of potentially "flagging" words or phrases to spam-catching software, i.e. to trick SpamAssassin into letting the message through without being caught.

Yes. Those, specifically, are meant to obfuscate the message so that various things (bayes and rules) have a hard time catching the words unless they know to strip HTML comments (SpamAssassin + HTML::Parser do a good job of that, fyi).

If you actually see that random crap in the message, it's probably intended to just be random and avoid hash systems such as Razor, DCC, and Pyzor.

Sometimes, parts of the "random crap" are actually encoded data, typically the email address of the recipient. Spammers have been seen to use rot13, other rot## (aka generic Caesar ciphers), and, at the least, other single-substitution ciphers.

In my local SpamAssassin configuration, I actually add rules that look for the rot13 versions of the domains I receive mail for, alongside the default EMAIL_ROT13 rule (I believe that's new in 2.60).

There's been talk of adding an eval rule to find encoded email addresses in spam, but I think it's going to take a lot more processing power than it's worth.

--
Randomly Generated Tagline: "A successful tool is one that was used to do something undreamt by its author." - Stephen C. Johnson
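
The idea of looking for rot13 (or other rot##) versions of your own domains can be sketched roughly as below. This is only an illustration in Python, not how the EMAIL_ROT13 rule or any real SpamAssassin rule is implemented; the function names, the domain `example.org`, and the sample body are all assumptions.

```python
import string


def caesar(text, shift):
    """Rotate ASCII letters by `shift` places; everything else passes through."""
    lower = string.ascii_lowercase
    k = shift % 26
    rotated = lower[k:] + lower[:k]
    table = str.maketrans(lower + lower.upper(), rotated + rotated.upper())
    return text.translate(table)


def find_encoded_domains(body, domains):
    """Return (domain, shift) pairs whose rotated form appears in the body."""
    lowered = body.lower()
    hits = []
    for domain in domains:
        for shift in range(1, 26):
            if caesar(domain.lower(), shift) in lowered:
                hits.append((domain, shift))
    return hits


local_domains = ["example.org"]  # hypothetical domain you receive mail for
body = "x ynyt gis cemf obo@rknzcyr.bet hws ga a xh"  # made-up sample

print(caesar("example.org", 13))                  # rknzcyr.bet
print(find_encoded_domains(body, local_domains))  # [('example.org', 13)]
```

Trying every shift for a handful of known domains is cheap. Finding an arbitrary encoded address with no domain list to compare against is the expensive case mentioned above.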

Keith Wright

10:49 p.m.

...

Date: Wed, 17 Sep 2003 17:02:14 -0400 From: Theo Van Dinter <felicity@kluge.net>

On Wed, Sep 17, 2003 at 04:50:06PM -0400, Peter Gutowski wrote:

...

I've seen this in HTML messages as the "contents" of HTML comments. After looking at it for a bit my assumption was that, since the HTML comments started usually in the middle of words, the intent was to mask the presence of potentially "flagging" words or phrases to spam-catching software, i.e. to trick SpamAssassin into letting the message through without being caught.

...

Yes. Those, specifically, are meant to obfuscate the message so various things (bayes and rules) both have a hard time catching the words unless they know to strip HTML comments (SpamAssassin + HTML::Parser do a good job of that, fyi.)

Why strip comments looking for dirty words? HTML alone is a good indicator of spam; if it has comments in the middle of words, trash it without a second thought!

Am I missing something? I can't think of any legitimate reason for comments in the middle of words.

void main();
{
  prin/*this won't parse*/tf("Hell" /*will this?*/"o world");
}

-- Keith

Theo Van Dinter

3:20 a.m.

On Wed, Sep 17, 2003 at 06:49:25PM -0400, Keith Wright wrote:

...

Am I missing something? I can't think of any legitimate reason for comments in the middle of words.

That's not the SpamAssassin way. It finds spammy-looking features and gives each one a score based on how accurate the rule is. The obfuscation rule, by default, gets a 4.2-4.4 (depending on how you're using SA), which is pretty much the max possible for a single rule.

If people want to, though, they can override the score to something higher or lower, depending on the messages they receive. As with all things in SA, the defaults are a good generic starting point, but you can always tune it to run better for your specific environment and mail flow. :)

--
Randomly Generated Tagline: "... advise the users that although it can help, they are known problems ..." - Stanislav Meduna
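
The score-and-threshold approach can be sketched as follows. This is a toy model in Python, not SpamAssassin's actual engine: the rule names, regular expressions, and the 0.5 score are invented, the 4.3 is simply a value inside the 4.2-4.4 range mentioned above, and the 5.0 threshold is assumed to match SpamAssassin's usual default.

```python
import re

REQUIRED_SCORE = 5.0  # assumed threshold for calling a message spam

# (name, pattern, default score); all hypothetical, for illustration only.
RULES = [
    ("HYPOTHETICAL_COMMENT_IN_WORD", re.compile(r"\w<!--.*?-->\w", re.S), 4.3),
    ("HYPOTHETICAL_HTML_MESSAGE", re.compile(r"<html\b", re.I), 0.5),
]


def score_message(body, overrides=None):
    """Sum the scores of every rule that hits; overrides replace defaults."""
    overrides = overrides or {}
    hits = {}
    for name, pattern, default in RULES:
        if pattern.search(body):
            hits[name] = overrides.get(name, default)
    return sum(hits.values()), hits


body = "<html><p>BI<!-- qxolgatrij -->LL</p></html>"

total, hits = score_message(body)
print(sorted(hits), total >= REQUIRED_SCORE)  # both rules hit, still below 5.0

# A site that sees a lot of this can raise the score for its own mail flow.
tuned, _ = score_message(body, {"HYPOTHETICAL_COMMENT_IN_WORD": 6.0})
print(tuned >= REQUIRED_SCORE)                # True
```

The point of the design is that no single feature decides the outcome by default: a strong rule gets the message most of the way there, and other hits or a local override push it over.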
