Checking my web-logs yesterday I came across a number of very peculiar entries… it looked like a number of porn sites and on-line financial services (I use the term loosely) were referring traffic to mine..! Why would Carmen Electra, Britney Spears, Jessica Alba and several other naked celebrities be interested in my site…? Not to mention debt counselling, mortgages and on-line insurance services..?
There were also unfamiliar IPs, some using port 15871 (Websense block message server port) – hmmmm..!
A little digging around reveals that this is yet another angle on spamming, one which probably produces very little ROI for the spammer but which potentially screws up any significance of a site’s web logs.
The URLs and IPs point to malicious, fraudulent sites, run almost invariably by criminals intent on separating you from your hard-earned cash by one foul means or another. In the process they consume server resources, slow down access to the site for normal surfers, use up disk space, clog web logs with bogus information and subvert the ranking mechanisms used by search engines.
So what benefit do these parasitic bastards get from filling your web-logs with bogus referrals…? It’s unlikely that web-admins would try to access these URLs (I hope), so no ROI there… but there are two main reasons I can think of:
- Ranking: Quite often site log files are in a web-visible folder so there’s a high likelihood that the logs will get trawled by legitimate web-bots, giving greater visibility (ranking) to the bogus sites as the entries are seen as back links. Some sites even automatically list their high referrers on the site itself, potentially driving even more traffic.
- Infiltration: Out of curiosity I copied and pasted one of the links into a browser (don’t click the links in your logs..!)… Bang..! One downloader Trojan intercepted. Had I not been so careful my machine could have become another soldier in the bot-net zombie army (sounds like an 80s band) advertising its compromised presence all over the net, or at the very least revealing all its secrets for any interested parties to steal.
It would seem that criminal elements are also buying up innocent-sounding domain names by the bucketful, and redirecting these to their dodgy main domains – so you could be back-linking to the normal-sounding domain (say ‘cute-puppydogs.com’) which would actually push up the ranking of ‘we-steal-everything-you-own-because-we-are-heartless-bastards.com’.
What can you do about it…?
- If possible, prevent public access to your log-files by robots. There are various methods of doing this – meta tags, robots.txt, .htaccess (see update below), or keeping the logs in a non web-visible location. Don’t automatically publish referrer stats, and back-link manually to deserving legitimate sites.
- Block referrer spammers (see update below): they need to crawl your site to leave their spam links in your logs.
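Before blocking anything it helps to know which referrers are actually polluting the logs. The following is a minimal Python sketch, assuming the server writes the common Apache "combined" log format; the domains in `SPAM_DOMAINS` and the function names are made up for illustration and are not taken from any real blocklist.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Combined Log Format (assumed):
# ip ident user [time] "request" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*"$'
)

# Hypothetical blocklist, for illustration only.
SPAM_DOMAINS = {"spam-casino.example", "cheap-loans.example"}


def referrer_host(line):
    """Return the lower-cased referrer hostname of a log line, or None."""
    match = LOG_LINE.match(line.strip())
    if not match:
        return None
    referrer = match.group("referrer")
    if referrer in ("", "-"):
        return None
    # urlparse() lower-cases the hostname and drops any port number.
    return urlparse(referrer).hostname


def is_spam_host(host, blocklist=SPAM_DOMAINS):
    """True if host is a blocklisted domain or a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in blocklist)


def count_spam_referrers(lines, blocklist=SPAM_DOMAINS):
    """Count log entries per blocklisted referrer hostname."""
    counts = Counter()
    for line in lines:
        host = referrer_host(line)
        if host and is_spam_host(host, blocklist):
            counts[host] += 1
    return counts
```

Feeding it the lines of an access log (for example `count_spam_referrers(open("access.log"))`, where the file name is again only an example) gives a tally of the worst offenders. Note that it only reads the log text and never fetches any of the URLs, which is exactly what you want given where those links lead.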
There will be an article soon on best-practice for avoidance of this kind of spam… in the meantime these should provide interesting reading:
Well I’ve protected my site with a heavily modified .htaccess file and that alone has reduced my referrer spam by about 90%. Of course, you’ll need to be running a web server that supports this (i.e. Apache or similar – not IIS). If you wish, copy and paste this text into your own root-level .htaccess file and watch your client errors rocket as spam bots and dodgy referrers get knocked back. I apologise to the writer of the version I’ve modified slightly, but I don’t remember which site I got the original from.
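For illustration only, here is how referrer-blocking rules of this general kind can be generated from a list of domains. This is a hedged Python sketch and not the .htaccess file described above: the function name and the example domains are invented, and it assumes an Apache server with mod_rewrite enabled. Each `RewriteCond` tests the `Referer` header against one domain, and the final `RewriteRule` with the `[F]` flag answers matching requests with 403 Forbidden.

```python
import re

# Only plain host names are accepted, so the dot is the one
# character that needs escaping in the generated patterns.
VALID_DOMAIN = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+$")


def build_htaccess_rules(domains):
    """Return mod_rewrite rules that refuse requests referred by `domains`."""
    cleaned = sorted({d.strip().lower() for d in domains})
    for domain in cleaned:
        if not VALID_DOMAIN.match(domain):
            raise ValueError(f"not a plain domain name: {domain!r}")
    if not cleaned:
        return ""

    lines = ["RewriteEngine On"]
    for index, domain in enumerate(cleaned):
        escaped = domain.replace(".", r"\.")
        # Match the domain itself or any subdomain of it.
        pattern = r"^https?://([^/]+\.)?" + escaped + r"(/|:|$)"
        # Every condition except the last is chained with OR.
        flags = "[NC,OR]" if index < len(cleaned) - 1 else "[NC]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {pattern} {flags}")
    # "-" means no substitution; [F] sends 403 Forbidden.
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical domains, for illustration only.
    print(build_htaccess_rules(["spam-casino.example", "cheap-loans.example"]))
```

Generating the rules from a list keeps the `[OR]` chaining right as the list grows; a forgotten `[OR]` silently turns the conditions into an AND and nothing gets blocked. Test any output on a non-critical site first, since a bad pattern can lock out genuine visitors.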
I’ve also installed Bad Behaviour to further protect my blog – this nifty little plug-in is very easy to install and protects against a wide range of malicious behaviour, and I’m looking forward to seeing if it stops anything that slips past my .htaccess file.
The creator of Bad Behaviour (Michael Hampton) has an interesting article on Project Honey Pot and http:BL, designed to blacklist harvesters and spammers – definitely one worth watching.
I did get Referrer Karma working – quite an involved process – but decided that .htaccess and Bad Behaviour were doing a sufficiently good job for the time being.