How to fix common robots.txt file mistakes and errors

What is the robots.txt file?

The robots.txt file tells crawlers which pages and areas of your site they are allowed to visit. If you don’t want to disallow any URLs, you don’t even need a robots.txt. The robots.txt file is a regular text file and must be located at the site root, e.g. https://example.com/robots.txt.

Robots.txt examples

Block robots from crawling any page on your site

User-agent: *
Disallow: /

Allow robots to crawl any page on your site

User-agent: *
Disallow:
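
The difference between these two rule sets can be checked locally before you deploy. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name can_crawl and the example.com URL are illustrative assumptions, not part of any tool mentioned here.

```python
from urllib.robotparser import RobotFileParser

# The two example rule sets from this article.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def can_crawl(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-page"  # hypothetical URL
    print("block all:", can_crawl(BLOCK_ALL, page))
    print("allow all:", can_crawl(ALLOW_ALL, page))
```

An empty Disallow value matches nothing, which is why the second rule set lets every page through.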

Common robots.txt mistakes

Disallow all

The most damaging line of code in SEO is “Disallow: /”. Placed under “User-agent: *”, it tells every search robot that it can’t access any page on your site.

Tiaki will notify you if we detect this error.

Not using Sitemap

We always recommend adding a Sitemap line to your robots.txt. This tells search engine robots the location of your XML sitemap.

Code

sitemap: <insert absolute sitemap xml url>

More information

https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt#sitemap
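
To confirm that a robots.txt actually declares a sitemap, you can read the Sitemap lines back out of it. This is a rough sketch using RobotFileParser.site_maps() from Python's standard library (available in Python 3.8 and later); the function name sitemap_urls and the sample file contents are assumptions made for illustration.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def sitemap_urls(robots_txt: str) -> List[str]:
    """Return every sitemap URL declared in the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


if __name__ == "__main__":
    # Hypothetical robots.txt content with one Sitemap line.
    sample = (
        "User-agent: *\n"
        "Disallow:\n"
        "Sitemap: https://example.com/sitemap.xml\n"
    )
    found = sitemap_urls(sample)
    print(found if found else "No Sitemap line found")
```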

Blocking JS and CSS files

If Googlebot and other bots are not allowed to read your JavaScript and CSS files, they won’t be able to render your pages the way visitors see them, and your site may not get full credit for its content. Make sure you allow crawling of all JS and CSS files.
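
One way to spot blocked assets is to test a handful of your script and stylesheet URLs against the rules. The sketch below uses Python's standard urllib.robotparser; the function name blocked_assets, the /assets/ rule, and the URLs are all hypothetical. Note that this parser does plain prefix matching and does not implement the * and $ wildcards that Google supports, so treat the result as a first check rather than a definitive answer.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt: str, asset_urls: Iterable[str],
                   agent: str = "Googlebot") -> List[str]:
    """Return the asset URLs that `agent` is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Hypothetical rules that block a whole asset directory.
    rules = "User-agent: *\nDisallow: /assets/\n"
    assets = [
        "https://example.com/assets/app.js",
        "https://example.com/assets/site.css",
        "https://example.com/img/logo.png",
    ]
    for url in blocked_assets(rules, assets):
        print("blocked:", url)
```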

WordPress issues

When WordPress sites are deployed, it’s very common that the developer forgets to uncheck the “Discourage search engines from indexing this site” option under Settings -> Reading Settings. This mistake causes WordPress to disallow all search engines from crawling the site and to add a meta noindex tag to prevent indexing. If you are using WordPress and have issues with robots.txt, check the Reading Settings first.
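
Because this setting can affect the page markup as well as robots.txt, it is worth checking a page's HTML for a robots meta tag too. Below is a small sketch using Python's standard html.parser; the class and function names are hypothetical, and the sample markup is an assumption about what such a tag looks like, not output copied from WordPress.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self) -> None:
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes.
        attr = {name: (value or "") for name, value in attrs}
        if (attr.get("name", "").lower() == "robots"
                and "noindex" in attr.get("content", "").lower()):
            self.noindex = True


def has_noindex(html: str) -> bool:
    """Return True if the HTML carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


if __name__ == "__main__":
    # Assumed example markup for a page that discourages indexing.
    page = '<html><head><meta name="robots" content="noindex, nofollow" /></head></html>'
    print("noindex present:", has_noindex(page))
```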

Magento issues

We have seen many examples of Magento sites blocking JS and CSS. See above.

Adding UTF-8 BOM (Byte order mark) to robots.txt

The byte order mark (BOM) is a Unicode character which is added by some text editors. A UTF-8 BOM can cause Google to ignore important parts of your robots.txt. To resolve this issue, remove the BOM character from your robots.txt file.

Tiaki checks for a BOM and notifies you if we detect one. To learn more about this issue, read this great article by Glenn Gabe.
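
If you want to check a file yourself, the UTF-8 BOM is the three bytes EF BB BF at the very start of the file. The following is a minimal sketch using Python's standard codecs module; the function name strip_utf8_bom is an assumption, and this is not how Tiaki performs its check.

```python
import codecs
from typing import Tuple


def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return the content without a leading UTF-8 BOM, and whether one was found."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


if __name__ == "__main__":
    # Simulated file content saved by an editor that prepends a BOM.
    raw = codecs.BOM_UTF8 + b"User-agent: *\nDisallow:\n"
    cleaned, had_bom = strip_utf8_bom(raw)
    print("BOM found:", had_bom)
    print(cleaned.decode("utf-8"))
```

To clean a real file, read it in binary mode, pass the bytes through the function, and write the result back in binary mode.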

Validate robots.txt with a robots.txt tester

To validate your robots.txt, use Google’s robots.txt tester tool.
