What is the robots.txt file?
The robots.txt file tells crawlers which pages and areas of your site they are allowed to visit. If you don't want to disallow any URLs, you don't even need a robots.txt. The robots.txt file is a plain text file and must be located at the site root, e.g. https://example.com/robots.txt.
Disallow robots from crawling any page on your site
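A robots.txt that blocks all crawlers from the entire site looks like this:

```
User-agent: *
Disallow: /
```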
Allow robots to crawl any page on your site
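To allow all crawlers to access everything, use an empty Disallow rule (an empty robots.txt, or no robots.txt at all, has the same effect):

```
User-agent: *
Disallow:
```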
Common robots.txt mistakes
The most damaging line of code in SEO is “Disallow: /”, which blocks search robots from accessing any page on your site.
Tiaki will notify you if we detect this error.
Not using Sitemap
We always recommend adding a Sitemap directive to your robots.txt. This tells search engine robots the location of your XML sitemap:
Sitemap: <insert absolute sitemap xml url>
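For example, a complete robots.txt that allows everything and points to a sitemap could look like this (the sitemap URL is illustrative; substitute your own absolute URL):

```
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
```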
Forgetting to disable “Discourage search engines from indexing this site”
When WordPress sites are deployed, it's very common that the developer forgets to untick the “Discourage search engines from indexing this site” option under Settings -> Reading. This mistake causes WordPress to disallow all search engines from crawling the site and to add a meta noindex tag to prevent indexing. If you are using WordPress and have issues with robots.txt, check the Reading settings first.
Blocking JS and CSS files
Blocking JS and CSS files in robots.txt can prevent Google from rendering and fully understanding your pages. We have seen many examples of Magento sites blocking JS and CSS.
Adding UTF-8 BOM (Byte order mark) to robots.txt
The byte order mark (BOM) is a Unicode character that some text editors add at the start of a file. A UTF-8 BOM can cause Google to ignore important parts of your robots.txt. To resolve this issue, remove the BOM character from your robots.txt file.
Tiaki checks for a BOM and notifies you if we detect one. To learn more about this issue, read this great article by Glenn Gabe.
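If you want to strip the BOM programmatically rather than in an editor, a minimal sketch in Python might look like this (the function name and path handling are our own illustration, not part of any particular tool):

```python
def strip_utf8_bom(path):
    """Remove a leading UTF-8 BOM from the file at `path`, if present."""
    with open(path, "rb") as f:
        data = f.read()
    # The UTF-8 BOM is the three-byte sequence EF BB BF.
    if data.startswith(b"\xef\xbb\xbf"):
        with open(path, "wb") as f:
            f.write(data[3:])
```

Running it a second time is a no-op, since the file is only rewritten when a BOM is actually found.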
Validate robots.txt with a robots.txt tester
To validate your robots.txt, use Google's robots.txt Tester tool.