Tuesday, July 31, 2007

* My War with the SPAM

Spam hurts.
Spam drives us crazy.
Spam consumes resources: on your web site, in your mailbox, in Internet traffic, and in disk space on your ISP's servers.
Spam kills our precious time: we want to read e-mails from legitimate senders but are forced to wade through pure junk and delete a stream of offers to buy drugs, to play at online casinos, to work as the representative of a foreign company, to get guaranteed cash, to meet hot singles in your area, to catch a virus or Trojan horse program (hidden behind a text link or an image), or porn crap.
How do you fight spam?

I began by collecting links to informational sites that offer knowledge and resources on fighting the spam nightmare. You can find one here, too:

Reading through numerous web pages, articles, blogs, and forums, I found that the first source of the spam in my mailbox is my own e-mail address, which can be scanned from any web site where I posted it by the e-mail harvesting programs freely available on the Internet. As far as I know, those programs were designed by folks who did not want to spam but rather to draw attention to their products. The simplest way to spread the news about a particular product was to e-mail a large number of cyber citizens. That is how spamming started!

Now spamming has extended to a wide range of services, and the millions of affiliates who want to make a buck by selling a product or service need customers who want to buy. You may ask me: how come I am getting e-mails with garbage text in them? It's not an offer to buy anything, it's junk! Well, as far as I can tell, it is thanks to the search engines (and particularly their "crawlers" or "spiders") that scan not only web site pages but also folders that contain e-mails. By sending garbage-like text with keywords embedded in it, the spammers hope to raise their web sites' popularity through the search engine ranking. About 60% of the spam I am getting these days is of this kind.

I have seen several ways of packaging spam messages: plain text, image files, document files, and lately PDF files.

So, how do you protect your e-mail address from being harvested? There have been numerous discussions on the web, and I have participated in several of them. The common conclusion: there is no way to completely hide an e-mail address. I used to implement various JavaScript-based solutions that may protect against simple harvesting programs; however, as the countermeasures become more complicated, the "harvesters" become more sophisticated. My latest solution is to use a small image of my e-mail address loaded through the CSS code (Cascading Style Sheets). It greatly reduces the chances of being harvested, but it does not guarantee 100% protection because some programs can apply character recognition to the image. Don't think it's done manually! Those programs do it automatically!

The biggest problem with spammers lies in the area of blogging. If you happen to have a blog site or a forum, you must clean literally hundreds of spam messages out of every corner of your site! If you don't manage one, you are lucky, because it is a real nightmare. Automated programs that specialize in breaking through web site security by exploiting weaknesses in the software design can post messages within seconds!
To be honest, I gave up on my forum completely by locking it against posting, but I still have to clean it up regularly (less often, at least). It is very time-consuming to tweak the web site's files and apply patches or complicated solutions that in the end protect against the stream of spam only temporarily.

I decided to concentrate on fighting e-mail spam. The second step, after getting some background on spamming, was to identify the domains that are sending the spam. It is not a simple task, because spammers rarely specify their real e-mail address when they send e-mails; they give a link to their web site instead. The only way to find the real sender is to look in the message header and grab the IP address from the top of the message. So I collected the IP addresses in a text file, spending precious minutes day after day in order to identify the biggest spammers in the world.
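
The manual collecting described above could be automated along these lines. This is only a minimal sketch, assuming the raw message is available as text; it uses Python's standard email module, and the function name received_ips and the sample message are made up for illustration. It lists the topmost Received header first, which is the one added by your own mail server.

```python
import re
from email import message_from_string

# Matches an IPv4 address in square brackets, e.g. [203.0.113.7],
# which is how many mail servers record the connecting host.
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_ips(raw_message):
    """Return IPv4 addresses found in Received headers, topmost header first."""
    msg = message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(IP_IN_BRACKETS.findall(header))
    return ips


# Hypothetical sample message using documentation-only addresses.
SAMPLE = (
    "Received: from mail.example.net (unknown [203.0.113.7])\n"
    "\tby mx.example.org with ESMTP; Tue, 31 Jul 2007 10:00:00 -0400\n"
    "Received: from localhost (localhost [127.0.0.1])\n"
    "\tby mail.example.net with SMTP; Tue, 31 Jul 2007 09:59:58 -0400\n"
    "From: someone@example.com\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

if __name__ == "__main__":
    for ip in received_ips(SAMPLE):
        print(ip)
```

Keep in mind that only the headers added by servers you trust are reliable; anything further down the list can be forged by the sender.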

Well, I do not suggest that you repeat it. First of all, it's not a pleasant procedure. Second, there are many anonymizer-type programs that can hide a sender's real IP address and substitute a random one taken from a text file. The only thing that drives me on in my efforts is revenge: the day when I will be able to filter most of the junk and redirect it to the trash can where it belongs.

After collecting the information from my e-mails, I identified the high-level IP address ranges. Then, using the WHOIS service, I identified the countries where the spam e-mails originated. I realized that I have no customers in China, for instance, who order my products through English-language pages, so I could filter all of them out. Using a similar approach, I set the web site filters accordingly, so that the domains I had identified could not access my web sites.
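
Grouping the collected addresses into high-level ranges can be sketched as follows. This assumes the addresses are already in a list and that a fixed prefix length (here /16) is a good enough stand-in for a "high-level" range; the function name top_blocks and the sample addresses are hypothetical. The WHOIS lookup for each range is still a separate, manual step.

```python
import ipaddress
from collections import Counter


def top_blocks(ips, prefix=16, limit=3):
    """Count how many collected addresses fall into each /prefix network."""
    counts = Counter(
        # strict=False lets us pass a host address and get its enclosing network
        str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
        for ip in ips
    )
    return counts.most_common(limit)


if __name__ == "__main__":
    # Documentation-only sample addresses, not real spam sources.
    collected = ["198.51.100.1", "198.51.100.200", "198.51.7.7", "203.0.113.9"]
    for network, hits in top_blocks(collected):
        print(network, hits)
```

The ranges with the most hits are the ones worth looking up first.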

You won't believe what happened. I reduced the spam by 80% instantly!

I felt that victory was close, but I did not expect the problem that I faced very soon after.

My sales dropped by 80%... No, it was not because I had filtered spam but (as I discovered later) because the Google PageRank (PR) of my web pages dropped from PR6 to zero. I began to investigate what had happened. My guess, that I had prevented Google's spider from crawling my web site, unfortunately was correct. Google's spiders were inside my filtered-out range. It took me about two months of hard work optimizing my web site, adding more pages, and sending begging e-mails to Google before I regained my position in the search engine.

Moral? Be careful when you implement the filtering!
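
One way to be careful is to keep an explicit allow list that always wins over the block list, and to test a few known-good addresses before switching a filter on. The sketch below assumes you have already found out which ranges your important crawlers use; the ranges shown are documentation-only placeholders, not real Google addresses, and the name is_blocked is made up for illustration.

```python
import ipaddress

# Placeholder ranges for illustration only (documentation address space).
BLOCKED = ["198.51.100.0/24"]
ALLOWED = ["198.51.100.64/26"]  # hypothetical crawler range inside a blocked range


def is_blocked(ip, blocked=BLOCKED, allowed=ALLOWED):
    """Return True if ip is in a blocked range and not covered by the allow list."""
    addr = ipaddress.ip_address(ip)
    # The allow list is checked first so that it always overrides a block.
    if any(addr in ipaddress.ip_network(net) for net in allowed):
        return False
    return any(addr in ipaddress.ip_network(net) for net in blocked)


if __name__ == "__main__":
    for candidate in ("198.51.100.70", "198.51.100.10", "203.0.113.1"):
        print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```

Running a check like this against the addresses of the crawlers you depend on would have shown the problem before the filter went live.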

I changed my strategy after that, and now I filter only at the e-mail level, not the web site level. I have a long list of spammers that I update weekly, and you can use it at your own discretion. Please keep in mind that the more filters I apply, the less information will show up in the file. One quick suggestion: filter the e-mails that contain the .tr, .pl, .br, .ma, .th, .ru, .jp, or .ch domains in the message header.
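
The quick suggestion above can be sketched as a small header check. This is a rough illustration, not the filter I actually run: it assumes the raw message is available as text, it scans every header value, and the names BLOCKED_TLDS and has_blocked_tld are made up. It is a blunt tool and will also catch legitimate mail from those domains, so use it only if, like me, you expect none.

```python
import re
from email import message_from_string

BLOCKED_TLDS = ("tr", "pl", "br", "ma", "th", "ru", "jp", "ch")

# A name followed by a blocked TLD that ends the host name; the lookahead
# rejects cases like "example.ruby.com" or "example.ru.com".
TLD_PATTERN = re.compile(
    r"[a-z0-9-]+\.(?:%s)(?![.\w-])" % "|".join(BLOCKED_TLDS),
    re.IGNORECASE,
)


def has_blocked_tld(raw_message):
    """Return True if any header value mentions a host under a blocked TLD."""
    msg = message_from_string(raw_message)
    return any(TLD_PATTERN.search(str(value)) for _, value in msg.items())


if __name__ == "__main__":
    spam = "From: offers@shop.example.ru\nSubject: hi\n\nbody\n"
    ham = "From: friend@example.com\nSubject: hi\n\nbody\n"
    print(has_blocked_tld(spam), has_blocked_tld(ham))
```

Because the check looks only at headers, a link to such a domain in the message body does not trigger it.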

I am going to list the filters I use at the top of the text file soon. So, keep checking back!

To finish my story, I want to point you to a very useful web site:

See the Top 25 Countries Where Spam Servers Are Located.
I used a freely available technique to "honeypot" the spammers. Now I can see how many of the "harvesters" were fooled by my program (oh, the sweet revenge!), and I can also see a list, updated in real time, of the biggest spammers in the world by precise IP address. It gives me the opportunity to adjust my filters.

Am I getting spam now? Yes, but it is 10-12 messages a day, not the 80-120 it used to be.

Happy fighting!