Directives guide search engine bots by giving them important information: which pages to crawl, which to index, and where to find alternative versions for other languages or mobile devices. Errors in these rules may seriously harm your SEO.
These directives come from several places:
- robots.txt file
- HTTP header
- HTML head (meta, link rel)
- HTTP status codes
We gather them in these reports to help you find issues and optimize.
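The same indexing directive can appear in several of these places. The snippet below is a purely illustrative sketch (all URLs and values are hypothetical) showing what each source typically looks like:

```python
# Illustrative examples of the four places indexing directives come from.
# All URLs and rules are hypothetical.

# 1. robots.txt file: crawl rules per user-agent
robots_txt = """User-agent: *
Disallow: /private/
"""

# 2. HTTP header: indexing rules sent with the response
http_header = "X-Robots-Tag: noindex, nofollow"

# 3. HTML head: meta robots and link rel elements
html_head = """<meta name="robots" content="noindex">
<link rel="canonical" href="https://example.com/page">"""

# 4. HTTP status code: e.g. a 301 tells bots the URL has permanently moved
status_code = 301
```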
Indexable (by search engine bots) pages
- Self-canonicalized: pages that contain a link rel="canonical" pointing to themselves AND are not unindexable
- Not canonicalized: pages that do not contain a link rel="canonical" AND are not unindexable
Unindexable (by search engine bots) pages
- Blocked by robots.txt: URLs match disallow rules for user-agent: * OR user-agent: googlebot
- Noindex pages: X-Robots-Tag contains noindex OR meta robots contains noindex
- Non-200 status code: the page's status code is >= 300 OR the page has a TCP error. Note that pages returning 302 or 307 may be indexed by Google at first, then treated as 301 redirects (the target URL may be indexed) after some time.
- Other-canonicalized pages: pages that contain a link rel="canonical" to another URL (not a strict indicator of unindexability; it depends on other factors and may vary between search engines)
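The "Blocked by robots.txt" check can be sketched with Python's standard urllib.robotparser; the rules and URLs below are hypothetical, and this is a simplified illustration, not Hextrakt's actual implementation:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for the wildcard user-agent.
rp = RobotFileParser()
rp.parse("""User-agent: *
Disallow: /private/
""".splitlines())

# A URL matching a disallow rule is "Blocked by robots.txt".
print(rp.can_fetch("*", "https://example.com/private/page"))       # False
# The wildcard group also applies to googlebot here; /public/ is allowed.
print(rp.can_fetch("googlebot", "https://example.com/public/page"))  # True
```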
Not crawled (by Hextrakt)
- Not crawled internal pages: blocked by robots.txt (if "Ignore robots.txt file" is unchecked in the configuration) or linked only with nofollow links (if "Ignore nofollow" is unchecked)
- External pages: pages that are not in the project perimeter (in the Advanced tab of the project configuration, if "Check external links" is left checked, Hextrakt will get the status code of external pages)
- All uncrawled URLs: total uncrawled URLs (including URLs which are not crawled due to crawl limit or crawl stopped by user).
Errors
- X-Robots VS meta robots: conflicting directives between the HTTP header and the meta tag.
- Noindex & blocked by robots.txt: do not block noindex pages with robots.txt: blocked pages cannot be fetched, so search engines cannot see the noindex directive and cannot know that the URL should not be indexed. To get this data you need to check "Ignore robots.txt file" in the crawl configuration.
- Noindex & other-canonicalized: do not use both noindex and link rel="canonical" to another URL for the same page
- Pages with blocked canonical: link rel="canonical" target page is blocked in robots.txt
- Pages with noindex canonical: link rel="canonical" target has a noindex directive
- Pages with non 200 status canonical: link rel="canonical" target is redirected or in error
- Canonical chain: link rel="canonical" target has a canonical to another URL
- Unlinked canonical: link rel="canonical" target has no inlinks
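The "Canonical chain" check above can be sketched as follows; the URL mapping is a hypothetical crawl extract (page URL to its declared canonical target), not Hextrakt's actual data model:

```python
# Hypothetical crawl result: page URL -> its link rel="canonical" target.
canonicals = {
    "https://example.com/a": "https://example.com/b",  # a -> b
    "https://example.com/b": "https://example.com/c",  # b -> c: a starts a chain
    "https://example.com/c": "https://example.com/c",  # self-canonicalized
}

def has_canonical_chain(url: str) -> bool:
    """True if the page's canonical target itself points to a third URL."""
    target = canonicals.get(url)
    if target is None or target == url:
        return False
    next_target = canonicals.get(target)
    return next_target is not None and next_target != target

print(has_canonical_chain("https://example.com/a"))  # True: a -> b -> c
print(has_canonical_chain("https://example.com/b"))  # False: b -> c, c is self-canonical
```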
Hreflang & Mobile
- Pages without self hreflang link: no hreflang attribute pointing to themselves
- Pages with canonical & hreflang conflict: canonical target is in another language
- All pages w/o hreflang: no hreflang attribute
- List of redirected hreflang: redirected hreflang target pages
- List of hreflang errors (4xx & 5xx): hreflang target pages with 4xx & 5xx status
- All mobile alternate pages: list of mobile alternate pages
- Mobile pages with canonical: to get this data, you have to include the mobile URLs in the crawl perimeter
- Mobile pages w/o canonical: to get this data, you have to include the mobile URLs in the crawl perimeter
- AMP pages: pages with the HTML amp attribute
- Not AMP pages: pages without the HTML amp attribute
- With amphtml link: pages with an AMP version
- W/o amphtml link: pages without an AMP version
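The "Pages without self hreflang link" check can be sketched like this; the data is a hypothetical crawl extract (page URL to its hreflang annotations), and this is a simplified illustration only:

```python
# Hypothetical crawl extract: page URL -> {hreflang code: target URL}.
hreflangs = {
    "https://example.com/en/": {"en": "https://example.com/en/",
                                "fr": "https://example.com/fr/"},
    # This page declares hreflang but never points back to itself:
    "https://example.com/fr/": {"en": "https://example.com/en/"},
}

def missing_self_hreflang(url: str) -> bool:
    """True if the page has hreflang annotations but none pointing to itself."""
    targets = hreflangs.get(url, {})
    return bool(targets) and url not in targets.values()

print(missing_self_hreflang("https://example.com/fr/"))  # True
print(missing_self_hreflang("https://example.com/en/"))  # False
```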
- Temporary redirects (status 302, 303, 307)
- Permanent redirects (status 301, 308)
- Special redirects (status 300, 304)