Features / Usability


Googlebot ignoring robots.txt

posts: 32 United Kingdom

I have the standard robots.txt file on my server. Googlebot loads it OK (I see the hit), but it is then followed by a load of hits like:

66.249.66.194 - - 27/Jan/2005:22:33:28 +0000 "GET /tiki-pagehistory.php?page=The+Hermit&diff2= HTTP/1.1" 200 10441 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

tiki-pagehistory.php is listed under Disallow in the robots.txt, so why is Googlebot still crawling it?

Anyone else noticed this?

- Gray

posts: 2881 United Kingdom

We have no control over Google, so complain to them.

Also check the format of the robots.txt and make sure Apache can read it when it is requested on your site.

However, yes, Google has a tendency to ignore the robots.txt (such a naughty bot), or it's caching the robots.txt for a long time :)

Damian


posts: 9 United States

While the formal grammar for robots.txt ( http://www.robotstxt.org/wc/norobots-rfc.html ) does not appear to require a leading / in Disallow: statements, in literally every example given, Disallow: is followed by a leading /. I don't know why they didn't make this a requirement.

Furthermore, parsers such as http://tool.motoricerca.info/robots-checker.phtml will specifically warn you if your disallow line doesn't have a leading /. This parser warns that the standard TikiWiki robots.txt file, for example http://dupli.tikiwiki.org/robots.txt, should make better use of leading /. Typical output:

Line 22: Disallow: tiki-install.php
We advise you to start a file/directory name with a leading slash char (Example: /private.html).
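
To see why a checker might warn about this, here is a minimal sketch using Python's standard urllib.robotparser. It shows how that one parser treats the same rule with and without the leading slash; it says nothing definitive about Googlebot's own parser. The helper name is_blocked is made up for illustration, and the URL is the one from the log line earlier in this thread.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(disallow_path: str, url_path: str, agent: str = "Googlebot") -> bool:
    """Return True if a one-rule robots.txt with this Disallow path blocks url_path.

    Hypothetical helper: it builds a tiny robots.txt in memory rather than
    fetching one from a server.
    """
    parser = RobotFileParser()
    parser.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return not parser.can_fetch(agent, url_path)


# URL taken from the access log line quoted in this thread.
url = "/tiki-pagehistory.php?page=The+Hermit&diff2="

print("with leading slash:   ", is_blocked("/tiki-pagehistory.php", url))
print("without leading slash:", is_blocked("tiki-pagehistory.php", url))
```

This parser matches rules as plain prefixes of the requested path, which always starts with /, so a rule written without the slash never matches anything.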


I am making this change to my robots.txt ( http://ihuck.com/robots.txt ) and will report the results after my next visit from Googlebot.

But since the leading / might help, and can hardly hurt, I'd recommend using it for all disallows anyway.
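
If you have a lot of Disallow lines to fix, something like the following sketch could add the missing slashes for you. The function name add_leading_slashes is hypothetical, and the sample rules are only illustrative. It assumes one rule per line and leaves alone empty Disallow: lines, paths that already start with /, and values starting with * or #.

```python
import re

# Captures the "Disallow:" prefix (any case, any spacing) and the path after it.
_DISALLOW = re.compile(r"^(\s*disallow\s*:\s*)(\S.*)$", re.IGNORECASE)


def add_leading_slashes(robots_txt: str) -> str:
    """Return robots_txt with a leading / added to Disallow paths that lack one."""
    fixed = []
    for line in robots_txt.splitlines():
        match = _DISALLOW.match(line)
        # Skip paths that already start with /, wildcard patterns, and comments.
        if match and not match.group(2).startswith(("/", "*", "#")):
            line = match.group(1) + "/" + match.group(2)
        fixed.append(line)
    return "\n".join(fixed)


sample = "User-agent: *\nDisallow: tiki-install.php\nDisallow: /img/\nDisallow:"
print(add_leading_slashes(sample))
```

Check the result by eye before uploading it, since a wrong Disallow can hide pages you do want indexed.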


posts: 9 United States

OK, I changed my robots.txt so that there are leading slashes in front of each page (previously, there were only leading slashes in front of vdirs).

See the attached "robotsneedroot.png" for a graph of my bandwidth usage.

  • Late Oct - bots appear to have changed their behavior!
  • Nov - I had to pay for extra bandwidth this month
  • Early Dec - I made the changes to robots.txt. Shortly after, PROBLEM SOLVED


More details at my bug report

posts: 1092

Thanks!
I committed this into 1.9.
sylvie