Google, Googlebot and the robots.txt

Today I talked with a workmate about how Google and Yahoo differ in the way they handle robots.txt. It seems that when Yahoo's Inktomi crawler starts crawling from a datacenter, it first fetches robots.txt. Google, on the other hand, seems to have this data available in all of its datacenters, and probably decides when to refetch it based on time, or maybe on new URLs? This is pure guessing, of course.
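One way to picture the time-based refetching guessed at above is a per-host cache that keeps a fetched robots.txt for a fixed lifetime and only requests it again once that lifetime has expired. This is a hypothetical sketch: `RobotsCache`, the `fetch` callback and the 24-hour default are illustrative assumptions, not a description of how Googlebot or Inktomi actually behave.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RobotsCache:
    """Hypothetical per-host robots.txt cache with a fixed time-to-live."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self.fetch = fetch        # assumed callback: host -> robots.txt body
        self.ttl = ttl_seconds    # how long a cached copy counts as fresh (arbitrary default)
        self.clock = clock        # injectable so expiry can be simulated
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, host: str) -> str:
        now = self.clock()
        cached: Optional[Tuple[float, str]] = self._cache.get(host)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]      # still fresh: no new request to the host
        body = self.fetch(host)   # never fetched or expired: fetch again
        self._cache[host] = (now, body)
        return body
```

With a cache like this, a crawler asks each host for robots.txt at most once per lifetime, no matter how many pages it requests in between; a refetch triggered by new URLs would need a different rule.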

One interesting thing I discovered came from a somewhat broken URL rewriting setup on my main site. I redirected every incoming request that did not match an existing local file to my index.php. The problem was that I really redirected everything, without checking the extension. And I didn't have a robots.txt (even though I had been told you should always have one, because search engines like to find one, even if it's empty).
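The missing extension check can be sketched as a small routing decision, assuming a simple front-controller setup: existing files are served as-is, URLs that look like files (they have an extension, such as robots.txt) but do not exist get a 404, and only extension-less URLs fall through to index.php. The `route` function and the extension heuristic are hypothetical illustrations, not the rewrite rules actually used on the site.

```python
import posixpath
from typing import Set


def route(path: str, existing_files: Set[str]) -> str:
    """Hypothetical front-controller rule: decide what to serve for a path."""
    if path in existing_files:
        return path                 # a real file on disk is served directly
    _, ext = posixpath.splitext(path)
    if ext:
        return "404"                # looks like a file (e.g. /robots.txt) but doesn't exist
    return "/index.php"             # everything else goes to the front controller
```

With this rule, a missing /robots.txt produces a 404 instead of the HTML index page, while pretty URLs like /gallery/2005 still reach index.php.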

So Googlebot visited me and requested robots.txt, which was served as the index page, in HTML. This seemed to confuse Google: it indexed some pages, but not many (mainly from the image gallery), and it requested robots.txt over and over again! When I noticed that in the stats, I simply added a blank robots.txt, and from then on I got almost no further requests for it from Google, but many hits for the normal pages. I don't have good stats to show this, as the change happened in the middle of the month, but I will set up a test domain and experiment a bit with Google; this seems quite interesting.
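To see what a parser actually extracts from an HTML page served as robots.txt, Python's standard `urllib.robotparser` can be fed both an HTML page and an empty file. This only illustrates one lenient parser, not Googlebot's: lines that are not valid robots.txt directives are skipped, so both inputs end up allowing every URL. The HTML below is a made-up stand-in for the real index page.

```python
from urllib.robotparser import RobotFileParser


def parse_robots(body: str) -> RobotFileParser:
    """Parse a robots.txt body without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(body.splitlines())
    return parser


# Stand-in for the HTML index page the broken rewrite returned for /robots.txt.
html_index = """<html>
<head><title>My site</title></head>
<body><a href="http://example.com/gallery/">Gallery</a></body>
</html>"""

empty_robots = ""  # the blank robots.txt that was added later

for name, body in [("html", html_index), ("empty", empty_robots)]:
    parser = parse_robots(body)
    print(name, parser.can_fetch("Googlebot", "http://example.com/gallery/"))
```

Both lines print `True`, so for this parser the two files mean the same thing; any difference in how Google treated them would have to come from something other than the rules themselves.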

My guess is that Google penalized my site for not having a properly parsable robots.txt. What could be interesting is that indexing increased right after the fix; maybe Google can somehow be tricked into indexing a site faster. Well, I hope I will find some answers when I run these tests…
