I am fed up with the huge amount of traffic from bots that serve no purpose whatsoever. It's impossible to control these bots with the Robots Exclusion Protocol (robots.txt), because they don't follow it and usually mask their user agent. My site is in English, and its content is targeted at a completely different audience than Chinese readers, yet the site gets relentless traffic from China.
The image above shows Apache stats for bandwidth used over 24 hours. Look how China is consuming 203 MB of bandwidth on my site. There is just one answer: these are not humans but bots. Losing bandwidth might seem petty, but when such an aggressive bot hits your site it can slow the site to a crawl, leaving genuine end users in a fix. So you can either tune robots.txt and .htaccess to block the specific IPs and user agents used by these useless bots, or block traffic from the whole country if you are sure you don't need it as your audience. I found IpInfoDb useful for easily adding a country's IP blocks to .htaccess.
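To give an idea of what such block rules look like, here is a minimal Python sketch that turns a list of IPs or CIDR blocks and a list of user-agent strings into .htaccess rules. It is not the file I use. The function name `build_htaccess` and the `bad_bot` variable name are made up for illustration, and the output assumes Apache 2.4 syntax (`Require not ip`, `Require not env`); on Apache 2.2 you would need the older `Order`/`Deny from` directives instead.

```python
import ipaddress
import re


def build_htaccess(cidr_blocks, user_agents):
    """Return .htaccess text (Apache 2.4 syntax, an assumption) that denies
    the given IPs/networks and any request whose User-Agent matches."""
    lines = []
    for agent in user_agents:
        if '"' in agent:
            raise ValueError("user agent must not contain a double quote")
        # Escape regex metacharacters so the string is matched literally.
        pattern = re.escape(agent)
        # Flag matching requests with the (hypothetical) bad_bot variable.
        lines.append(f'SetEnvIfNoCase User-Agent "{pattern}" bad_bot')

    lines.append("<RequireAll>")
    lines.append("    Require all granted")
    lines.append("    Require not env bad_bot")
    for block in cidr_blocks:
        # Raises ValueError on malformed input; a bare IP becomes a /32.
        network = ipaddress.ip_network(block, strict=False)
        lines.append(f"    Require not ip {network}")
    lines.append("</RequireAll>")
    return "\n".join(lines) + "\n"
```

Calling `build_htaccess(["198.51.100.0/24"], ["Sogou"])` returns text you can paste into .htaccess. The address here is a documentation-only range, not a real bot. Test on a staging copy first, since a bad rule can lock out everybody.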
I would prefer blocking specific IPs and user agents over blocking an entire country, but it requires some work on your part: you have to keep going through your raw access logs and adding the offending user agents to your ban list. These bots also keep changing IPs. For example, the bot of the Chinese search engine Sogou has been found using more than 30 IPs. I soon realised this cannot be done alone; I need some help.
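Going through raw logs by hand gets old quickly, so here is a rough Python sketch of how the lookup could be automated. It assumes your server writes the Apache "combined" log format (the one that includes referer and user agent); if your LogFormat differs, the regular expression will need adjusting. The name `tally_bandwidth` is hypothetical.

```python
import re
from collections import Counter

# Apache "combined" format (an assumption about your setup):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?P<size>\S+) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def tally_bandwidth(lines):
    """Sum response bytes per client IP and per user agent.

    Lines that do not match the expected format are skipped.
    """
    bytes_by_ip = Counter()
    bytes_by_agent = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue
        size = match.group("size")
        # Apache logs "-" when no body was sent (e.g. a 304 response).
        sent = int(size) if size.isdigit() else 0
        bytes_by_ip[match.group("ip")] += sent
        bytes_by_agent[match.group("agent")] += sent
    return bytes_by_ip, bytes_by_agent
```

Pass it an open log file, for example `tally_bandwidth(open("access.log"))`, then look at `most_common(10)` on each counter to see which IPs and user agents eat the most bandwidth. A big number alone does not prove a bot, so check the entries before banning anything.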
Download the .htaccess file that I have prepared (updated 3rd April 2013). It only blocks specific IPs that I suspect are bots, so use it at your own risk. I cannot be sure whether a given IP belongs to an actual user or a bot. A user's machine infected by a botnet can also act like a bot, so it's literally shooting in the dark here. But it at least gave me some immediate respite.
If you want a better robots.txt file, then this tool might be of help to you.
Going back into the cave!