Black Mirror AI
(mander.xyz)
Wouldn't Google's crawlers respect robots.txt though? Is it naive to assume that anything would?
Lol. And they'll delist you. Unless you're really important, good luck with that.
robots.txt
Disallow: /some-page.html
If you disallow a page in robots.txt, Google won't crawl it. Even when Google finds links to the page and knows it exists, Googlebot won't download the page or see its contents. Google will usually not index the URL, but that isn't guaranteed: it may still include the URL in the search index, along with words from the anchor text of links to it, if it judges the page to be important.
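For anyone wondering what "respecting" robots.txt looks like from the crawler's side, here's a rough sketch using Python's standard `urllib.robotparser`. The `User-agent: *` line, the `example.com` URLs, and the `is_allowed` helper are my own assumptions for illustration; this is not how Googlebot is actually implemented. The point is that the check is voluntary: a crawler has to choose to run something like this before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The "User-agent: *" line is an assumption
# added so the Disallow rule belongs to a group that applies to every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /some-page.html
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules in ROBOTS_TXT permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler would skip the first URL and fetch the second.
    for url in ("https://example.com/some-page.html",
                "https://example.com/other-page.html"):
        print(url, is_allowed("Googlebot", url))
```

Nothing on the server enforces this. A crawler that never calls the check just downloads the page anyway.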
It's naive to assume that Google's crawlers respect robots.txt.
It'd be more naive to have a robots.txt file on your web server and be surprised when web crawlers don't stay away.