Directives guide search engine bots by giving them important information: which pages to crawl, which to index, and where to find alternative versions for other languages or mobile devices. Errors in these rules may seriously harm your SEO.
These directives come from several places:
- robots.txt file
- HTTP header
- HTML head (meta, link rel)
- HTTP status codes
We gather them in these reports to help you find issues and optimize.
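The same indexing directive can appear in several of these places. The snippet below is a purely illustrative sketch (all URLs and values are hypothetical) showing what each source typically looks like:

```python
# Illustrative examples of the four places indexing directives come from.
# All URLs and rules are hypothetical.

# 1. robots.txt file: crawl rules per user-agent
robots_txt = """User-agent: *
Disallow: /private/
"""

# 2. HTTP header: indexing rules sent with the response
http_header = "X-Robots-Tag: noindex, nofollow"

# 3. HTML head: meta robots and link rel elements
html_head = """<meta name="robots" content="noindex">
<link rel="canonical" href="https://example.com/page">"""

# 4. HTTP status code: e.g. a 301 tells bots the URL has permanently moved
status_code = 301
```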
Indexable (by search engine bots) pages
- Self-canonicalized: pages that contain a link rel="canonical" pointing to themselves AND are not unindexable
- Not canonicalized: pages that do not contain a link rel="canonical" AND are not unindexable
Unindexable (by search engine bots) pages
- Blocked by robots.txt: URLs match disallow rules for user-agent: * OR user-agent: googlebot
- Noindex pages: X-Robots-Tag contains noindex OR meta robots contains noindex
- Non-200 status code: the page's status code is >= 300 OR the page has a TCP error. Note that pages returning 302 or 307 may be indexed by Google at first, then treated as 301 redirects (the target URL may be indexed) after some time.
- Other-canonicalized pages: pages that contain a link rel="canonical" to another URL (not a strict indicator of unindexability; it depends on other factors and may vary between search engines)
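The "Blocked by robots.txt" check can be sketched with Python's standard urllib.robotparser; the rules and URLs below are hypothetical, and this is a simplified illustration, not Hextrakt's actual implementation:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for the wildcard user-agent.
rp = RobotFileParser()
rp.parse("""User-agent: *
Disallow: /private/
""".splitlines())

# A URL matching a disallow rule is "Blocked by robots.txt".
print(rp.can_fetch("*", "https://example.com/private/page"))       # False
# The wildcard group also applies to googlebot here; /public/ is allowed.
print(rp.can_fetch("googlebot", "https://example.com/public/page"))  # True
```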
Not crawled (by Hextrakt)
- Not crawled internal pages: blocked by robots.txt (if "Ignore robots.txt file" is unchecked in the configuration) or linked only with nofollow links (if "Ignore nofollow" is unchecked)
- External pages: pages that are not in the project perimeter (in the Advanced tab of the project configuration, if "Check external links" is left checked, Hextrakt will get the status code of external pages)
- All uncrawled URLs: total uncrawled URLs (including URLs which are not crawled due to crawl limit or crawl stopped by user).
Errors
- X-Robots VS meta robots: conflicting directives between the HTTP header and the meta tag.
- Noindex & blocked by robots.txt: do not block noindex pages with robots.txt: blocked pages cannot be fetched, so search engines cannot see the noindex directive and cannot know that the URL should not be indexed. To get this data you need to check "Ignore robots.txt file" in the crawl configuration.
- Noindex & other-canonicalized: do not use both noindex and link rel="canonical" to another URL for the same page
- Pages with blocked canonical: link rel="canonical" target page is blocked in robots.txt
- Pages with noindex canonical: link rel="canonical" target has a noindex directive
- Pages with non 200 status canonical: link rel="canonical" target is redirected or in error
- Canonical chain: link rel="canonical" target has a canonical to another URL
- Unlinked canonical: link rel="canonical" target has no inlinks
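The "Canonical chain" check above can be sketched as follows; the URL mapping is a hypothetical crawl extract (page URL to its declared canonical target), not Hextrakt's actual data model:

```python
# Hypothetical crawl result: page URL -> its link rel="canonical" target.
canonicals = {
    "https://example.com/a": "https://example.com/b",  # a -> b
    "https://example.com/b": "https://example.com/c",  # b -> c: a starts a chain
    "https://example.com/c": "https://example.com/c",  # self-canonicalized
}

def has_canonical_chain(url: str) -> bool:
    """True if the page's canonical target itself points to a third URL."""
    target = canonicals.get(url)
    if target is None or target == url:
        return False
    next_target = canonicals.get(target)
    return next_target is not None and next_target != target

print(has_canonical_chain("https://example.com/a"))  # True: a -> b -> c
print(has_canonical_chain("https://example.com/b"))  # False: b -> c, c is self-canonical
```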
Hreflang & Mobile
- Pages without self hreflang link: no hreflang attribute pointing to themselves
- Pages with canonical & hreflang conflict: canonical target is in another language
- All pages w/o hreflang: no hreflang attribute
- List of redirected hreflang: redirected hreflang target pages
- List of hreflang errors (4xx & 5xx): hreflang target pages with 4xx & 5xx status
- All mobile alternate pages: list of mobile alternate pages
- Mobile pages with canonical: to get this data, you have to include the mobile URLs in the crawl perimeter
- Mobile pages w/o canonical: to get this data, you have to include the mobile URLs in the crawl perimeter
- AMP pages: pages with the HTML amp attribute
- Not AMP pages: pages without the HTML amp attribute
- With amphtml link: pages with an AMP version
- W/o amphtml link: pages without an AMP version
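The "Pages without self hreflang link" check can be sketched like this; the data is a hypothetical crawl extract (page URL to its hreflang annotations), and this is a simplified illustration only:

```python
# Hypothetical crawl extract: page URL -> {hreflang code: target URL}.
hreflangs = {
    "https://example.com/en/": {"en": "https://example.com/en/",
                                "fr": "https://example.com/fr/"},
    # This page declares hreflang but never points back to itself:
    "https://example.com/fr/": {"en": "https://example.com/en/"},
}

def missing_self_hreflang(url: str) -> bool:
    """True if the page has hreflang annotations but none pointing to itself."""
    targets = hreflangs.get(url, {})
    return bool(targets) and url not in targets.values()

print(missing_self_hreflang("https://example.com/fr/"))  # True
print(missing_self_hreflang("https://example.com/en/"))  # False
```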
- Temporary redirects (status 302, 303, 307)
- Permanent redirects (status 301, 308)
- Special redirects (status 300, 304)