Should the Yandex Russian Search Engine Bot be blocked from spidering your pages? I believe it should. I did some serious testing using server logs and honeypots, and this bot currently does not respect robots.txt files. Worse still, it puts such a load on the server that it must be contained.
I initially had this code (amongst others) in the .htaccess file:
SetEnvIfNoCase User-Agent "^Yandex bot" bad_bot
<Limit GET POST>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Limit>
but because it’s a persistent little critter I now have this as the first line:
SetEnvIfNoCase User-Agent "^Yandex*" bad_bot
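The Order/Allow/Deny directives above are Apache 2.2 syntax; on Apache 2.4 they only work if mod_access_compat is loaded. For anyone on a newer server, the same idea can probably be sketched with the Require directives instead. This is an untested sketch, not what I run myself: it assumes mod_setenvif and mod_authz_core are enabled, and it matches "Yandex" anywhere in the User-Agent string rather than only at the start, which is my assumption about how the bot identifies itself.

```apache
# Flag any request whose User-Agent contains "Yandex" (case-insensitive).
# Unanchored pattern: an assumption, chosen to catch strings where the
# bot name is not the first token.
SetEnvIfNoCase User-Agent "Yandex" bad_bot

# Apache 2.4 access control: allow everyone except flagged requests.
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Flagged requests should get a 403 Forbidden, which costs the server far less than serving the page.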
If it bothers me any more, I shall fight back and start a campaign against it. One of our team got so fed up that he did this:
# permanently redirect requests from a specific IP (matches .shtml pages)
Options +FollowSymlinks
RewriteEngine on
# REMOTE_ADDR always holds the client IP; REMOTE_HOST may hold a hostname
RewriteCond %{REMOTE_ADDR} ^77\.88\.26\.27$
RewriteRule \.shtml$ https://www.youtube.com/watch?v=oHg5SJYRHA0 [R=301,L]
Now the Yandex bot gets RickRolled on every visit. Imagine half a million sites doing this…