Let’s agree: spam is bad. I don’t want to buy Viagra (or even V-1-A-GR4 for that matter). And I’m happy that spam filters have been invented because they save me having to wade through mountains of genuine garbage mail every day.
However…Microsoft’s Junk Mail filter, built into Outlook 2003, is not, in fact, a bona fide spam filter. It references no blacklist directories. Nor does it yet use SPF. Instead, it relies on a complex and incredibly arcane set of rules to try to work out which messages are junk and which are not.
According to various postings on the internet (for example, this one which references an MS Knowledge Base article), Outlook uses a file called filters.txt to determine junk from non-junk. It references this file each time a mail arrives, then if it finds any matches, dumps the mail into the Junk Folder. Nice theory.
Enter the company newsletter I want to send to a closed group of customers and partners. It’s an HTML email containing a bunch of news articles, and some embedded images loaded from our Webserver.
When, on Tuesday, Outlook dumped this straight into my Junk Mail, I began searching for answers. What was wrong with my message?
A search on all of these supposed filter words revealed nothing. The ridiculous one which claimed that a mail containing an ” was sent to Junk was the only promising lead. Removing it made no difference.
Thus I discovered that in fact the filter words are only one part of the story (or perhaps, no part of the story). I deleted filters.txt and it made no visible difference.
My only option remaining, since no-one on the Internet seems to have tried to solve this problem, was to painstakingly and painfully pick through the mail message paragraph by paragraph, part by part, and try and see if any content was offending Outlook.
This I started, and after sending myself about 50 emails, it magically started arriving in my Inbox instead of my Junk Mail. Aha! I replaced the paragraph I’d deleted, and tried again. Bang, into Junk. Ok, I thought, I have the answer.
Imagine my dismay when, paragraph removed, but the rest of the content replaced, it went straight into my Junk Mail again.
Back to square one.
My next line of attack was kinds of content. I removed image references, I removed hyperlinks, I removed text in large fonts. None of this made any consistent difference. Sometimes it would seem to succeed, but never consistently.
I did discover “Beating Microsoft Outlook 2003 Junk Mail Filter Rule #1” along the way though:
Rule #1: Any mail from an untrusted sender containing just hyperlinks gets filtered to Junk.
Untrusted senders, by the way, are senders who are not in your Contacts list or on your Junk Mail whitelist in Outlook. Your trusted senders can send you anything they want. So for testing purposes, you have to send yourself mail from an untrusted (and non-whitelisted) address.
At some stage, I happened on the notion that the SIZE of the mail may have an impact. I tested by creating a large message full of garbage text and it didn’t get junked. HOWEVER, when my newsletter passed 20k it immediately got Junked. Under 20k it started arriving in my Inbox.
So, Rule #2: Mails from untrusted sources which contain rich HTML content must be under 20k.
There is no apparent way around this. It’s a hard limit. What constitutes rich content I also can’t exactly define. Unformatted text seems not to be restricted by this. But add images, styles and colours and the limit starts to apply.
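If you want to catch this before hitting send, a rough pre-flight check is easy enough to script. This is only a sketch: the 20 KB figure is the cutoff observed above, not anything Microsoft documents, and measuring the UTF-8 size of the HTML body (rather than the whole message with headers) is an assumption. The names `SIZE_LIMIT_BYTES`, `html_size_bytes` and `is_under_limit` are made up for illustration.

```python
# Assumption: the cutoff is roughly 20 KB, as observed in testing above.
# Outlook's real threshold, and exactly what it measures, is not documented.
SIZE_LIMIT_BYTES = 20 * 1024


def html_size_bytes(html: str) -> int:
    """Return the size of the HTML body once encoded as UTF-8."""
    return len(html.encode("utf-8"))


def is_under_limit(html: str, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """True if the encoded HTML body is strictly smaller than the limit."""
    return html_size_bytes(html) < limit


if __name__ == "__main__":
    newsletter = "<html><body><p>Hello</p></body></html>"
    print(html_size_bytes(newsletter), is_under_limit(newsletter))
```

Run it against the rendered newsletter source, not the template, since the rendered version is what actually gets sent.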
Now, you’d imagine this would be the end of my odyssey. But no, those bastards in Redmond had more in store for me. Actually, I think it’s one particular evil bastard. I have a clear picture of his dead, rotting corpse in my mind as I write this.
Anyway…back to the story.
I was sending myself initial test messages directly on my own network, for speed purposes. My quest was not helped by the fact that our email server stored offsite was delaying all inbound mails by as much as 45 minutes due to some screw-up at our ISP (Datapro — not my favourite company).
So when I sorted out the size issue, I started testing off the live mailserver. Now, we are using a homegrown email newsletter tool which takes a webpage, renders it as an email, and sends it out. It can send in “preview” mode or to a mailing list. The list send is done mail by mail to avoid obvious bulk mailing filters.
The moment I sent it from the server it went right back into the Junk Mail folder.
Through many more hours of mindless trial and error, I discovered Rule #3 and Rule #4:
Rule #3: The “return-path”, “from” and “reply-to” addresses in the email header all need to match.
This is a kind of technical point, but we were using the “return-path” address to deliver bounced mails to our systems’ postmaster for processing. Then we’d leave the “from” address for people who wanted to reply to a normal email address. This is not ok with the gods of Microsoft.
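A quick way to check Rule #3 on a saved message is to parse the raw source and compare the three addresses. The sketch below uses Python's standard `email` package. The helper names `sender_addresses` and `headers_match` are hypothetical, and treating a missing header as "nothing to compare" is an assumption, since it is not clear how Outlook handles that case.

```python
from email import message_from_string
from email.utils import parseaddr

# The three headers Rule #3 says must agree.
SENDER_HEADERS = ("Return-Path", "From", "Reply-To")


def sender_addresses(raw_message: str) -> dict:
    """Extract the bare, lower-cased address from each sender header present."""
    msg = message_from_string(raw_message)
    found = {}
    for name in SENDER_HEADERS:
        value = msg.get(name)
        if value is not None:
            # parseaddr returns (display_name, address); keep only the address.
            found[name] = parseaddr(value)[1].lower()
    return found


def headers_match(raw_message: str) -> bool:
    """True if every sender header that is present carries the same address.

    Assumption: headers that are absent are simply ignored.
    """
    return len(set(sender_addresses(raw_message).values())) <= 1


if __name__ == "__main__":
    raw = (
        "Return-Path: <bounce@example.com>\n"
        "From: Newsletter <news@example.com>\n"
        "Reply-To: news@example.com\n"
        "Subject: Test\n"
        "\n"
        "Body\n"
    )
    print(sender_addresses(raw))
    print(headers_match(raw))
```

In that example the bounce address differs from the other two, which is exactly the setup described above that ended up in Junk.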
Rule #4: If the name of the sender appears in the sending domain, Outlook Junks it.
So, I was sending my message from “email@example.com”. Junk. Change this to “firstname.lastname@example.org” and it hits the Inbox just fine.
Right. Now was the ordeal over? I thought so at 10am this morning.
Unfortunately, the moment I moved from our “preview” mode into sending it via the list sender, it went right back into Junk Mail.
This is where I roped our developers in. What I’d noticed were some obvious differences in the headers of the two messages. Now, bear in mind: the from address, reply-to and return path were identical in both mails. The “to” address was set individually. No cc’s or anything like that. And the content was identical.
There were some differences in the header, however. We are inserting some tagging information, and the order of the header was slightly different. So we spent the better part of today hacking around the header of the message, finally getting to the point where we had the identical header in both preview and list mode. And still, the live mail got junked.
What was worse, the preview mail would get junked from time to time as well. Not consistently, but often enough to raise the question: is Microsoft just fucking with me?
At last, and just in time because I was close to a nervous breakdown, I took CSDiff (an app used to compare two files and spot the differences) and compared the source of the two mails character by character. And Rule #6 emerged. Rule #5 I found out through a combination of intuition and prayer. I’m an atheist. You can imagine what it would take to make me pray.
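If you don't have CSDiff to hand, the same kind of comparison can be approximated with Python's standard `difflib`. This is a stand-in for illustration, not what CSDiff does internally. It compares line by line rather than character by character, and `changed_lines` is a made-up helper name.

```python
import difflib


def changed_lines(source_a: str, source_b: str) -> list:
    """Return only the lines that differ between two message sources.

    Lines prefixed "- " appear only in source_a, "+ " only in source_b.
    """
    diff = difflib.ndiff(source_a.splitlines(), source_b.splitlines())
    # ndiff also emits "  " (unchanged) and "? " (hint) lines; drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    preview = "Subject: News\n<a href='/story?user=Jack'>Read</a>\nBye"
    live = "Subject: News\n<a href='/story?user=jack@example.com'>Read</a>\nBye"
    for line in changed_lines(preview, live):
        print(line)
```

Save the full source of the message that arrived and the one that got junked, feed both in, and what comes out is the shortlist of suspects.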
Rule #5: If an existing copy of a message has been moved to Junk or is still in Deleted Items, and Outlook receives the same message again from a different address (even one it would ordinarily not Junk), it sends it to Junk.
There may, in fact, be some kind of time component here too. In other words, if it arrives too soon after the first one. I can’t really figure this out completely and frankly can’t bear to even try. If anyone out there knows what this rule is, please tell me.
Which brings me to Rule #6, like Poirot revealing the killer on the final few pages.
Rule #6: Outlook counts the number of email addresses embedded in the content. If there are too many (or too many duplicates), it Junks it.
What I realised with CSDiff is that my preview message was sending the mail to “Jack”. The live mail was sending to “email@example.com”. The impact of this is that for clickthrough reporting, we include the “to” name in all the <a href> link tags on the page. So, in the preview, there were many “user=Jack” entries in the source. In the live, many “firstname.lastname@example.org”. When I modified my “to” name on the live list, the mail finally, and gloriously, arrived in my Inbox.
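To see how many addresses are lurking in a message before Outlook gets to judge it, something like the following sketch will count them. The regular expression is a deliberately loose approximation of an email address, not a full RFC 5322 parser, and it will miss URL-encoded addresses (for example `%40` in place of `@`). What counts as "too many" is unknown, so the sketch only reports the counts. `count_embedded_addresses` is a hypothetical name.

```python
import re
from collections import Counter

# Loose approximation of an email address; good enough for spotting
# addresses embedded in link tags, not a complete validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def count_embedded_addresses(html: str) -> Counter:
    """Count each email address found in the HTML source (case-insensitive)."""
    return Counter(match.lower() for match in EMAIL_PATTERN.findall(html))


if __name__ == "__main__":
    live = (
        '<a href="http://example.com/a?user=jack@example.com">One</a>'
        '<a href="http://example.com/b?user=jack@example.com">Two</a>'
    )
    preview = (
        '<a href="http://example.com/a?user=Jack">One</a>'
        '<a href="http://example.com/b?user=Jack">Two</a>'
    )
    print(count_embedded_addresses(live))
    print(count_embedded_addresses(preview))
```

The preview-style source comes back with no addresses at all and the live-style source with the same address repeated once per link, which is the difference described above.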
I definitely know this is not the end of the road. Outlook 2007 is here, and I’m sure there are lots of combinations of factors I haven’t had to battle through this time around. And I hate Microsoft now more than I ever have. They’ve stolen 3 days of my life basically, and caused more frustration than I think I’ve ever experienced. Still, victory is mine, you sons of bitches. On Monday, our newsletter will go out and may even, now and again, escape the Junk Mail of its intended recipients.