Google seems to be going one step too far: they fill out forms on websites with random words and submit them! Now that’s a smart idea, isn’t it!
They ignore “POST” forms for now, and they also ignore login forms and the like, if they are able to identify them.
To fill out the form fields, they use random data from the page the form is on. Sounds like a good plan; every webmaster wants to see that in the logs! Here’s a report with some log data from lyrics.net.
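To make that concrete, here is a minimal sketch of what such a probe could look like. It is only a guess at the general idea, not Google’s actual method: the function name `build_probe_url`, the field name `q`, the example.com URL and the "words of four or more letters" rule are all assumptions made up for illustration.

```python
import random
import re
from urllib.parse import urlencode


def build_probe_url(action_url, field_name, page_text, rng):
    """Build a GET request URL for a form, using a random word from the page.

    Hypothetical helper: picks one word (4+ letters) from the page text and
    puts it into the given form field as a query-string parameter.
    """
    words = re.findall(r"[A-Za-z]{4,}", page_text)
    word = rng.choice(words)
    return action_url + "?" + urlencode({field_name: word})


# Seeded generator so the sketch is repeatable.
rng = random.Random(0)
url = build_probe_url(
    "https://example.com/search",   # assumed form action
    "q",                            # assumed name of the text field
    "Lyrics and albums by artist",  # stand-in for the text of the page
    rng,
)
print(url)
```

Requests of this shape, with a word lifted from your own page in the query string, are what would show up in the access log.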
Google is doing this because they are interested in the “deep web” or “hidden web”. By that they mean pages that are only reachable through forms. Sites with some kind of SEO knowledge already make their content reachable through ordinary links, so they won’t benefit from this move. And the sites that do have important content behind forms know next to nothing about crawlers and will rather lock them out if they find “GET”s from a bot in their logs. Imagine Googlebot sending you heaps of spam through a contact form. Yes, I know, I wouldn’t implement such a form as GET, but this is also, and mainly, about the rather clueless webmasters.
At the moment Google only crawls forms on “high-quality sites”, whatever that means. The search on Wikipedia, Google’s darling, will see a big increase in usage!
You can exclude these pages in the robots.txt. But who has done that already? There was no need; everyone knew that bots wouldn’t crawl forms. Now all webmasters will be happy to exclude many form “landing pages” in the robots.txt. Or nofollow them, but how do you nofollow a form? noindex would work as well, of course. Nonetheless, it is just additional, and useless, work.
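For anyone who wants to check such an exclusion before deploying it, here is a small sketch using Python’s standard `urllib.robotparser`. The `/search` path and the example.com URLs are assumptions; substitute whatever your form’s landing page actually is.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: keep all bots away from the form's landing page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A form submission lands on /search?q=..., which the rule covers.
form_result_ok = parser.can_fetch("Googlebot", "https://example.com/search?q=random")
# Ordinary pages are unaffected.
about_page_ok = parser.can_fetch("Googlebot", "https://example.com/about")

print(form_result_ok, about_page_ok)
```

This only tells you how a well-behaved parser reads the rules; it does not prove how any particular crawler will act on them.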
So Google wants the clueless site owners to benefit, so that SEO gets less and less important. But what will actually happen is that dumb and dumber will lock out all bots because they are annoyed by the log spam.
Another interesting but worrying piece of information Google revealed is that the new pages found this way won’t drain PageRank from the page with the form; instead they will get new PageRank! So you know what will happen: all the SEO guys will implement simple forms with somewhat spammy content to get more PageRank out of Google.
The brain drain at Google seems to be much bigger than I thought, at least when I read news like this…