Indexability is about allowing or blocking the indexing of a page by the search engines, i.e. whether Google can easily explore a page and add it to its results. Several elements can explain why a page isn't displayed in the results:
Exploration with Robots.txt
The robots.txt file is a text file located at the root of the website. It is always read by each search engine robot before it explores a website. This file contains instructions that authorize or block certain robots from exploring a folder or a file.
It's often used to deliberately block exploration by certain robots.
If the file doesn’t contain any instructions, the robot will explore the website.
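As an illustration, a minimal robots.txt might look like this (the folder name and robot name are hypothetical examples, not taken from a real site):

```text
# Block all robots from exploring the /private/ folder
User-agent: *
Disallow: /private/

# Block only Google's image robot from the whole site
User-agent: Googlebot-Image
Disallow: /
```

Each `User-agent` line names the robot the rules apply to, and each `Disallow` line blocks exploration of a path.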
Be careful though: you mustn't use this blocking instruction in the robots.txt if you want to stop a page from being indexed by the search engines. Google may actually index the content anyway if it finds links pointing to this page.
The instruction that bans indexing is the NoIndex instruction, placed on the page itself. This means that if exploration of the page is blocked by the robots.txt, Google will never be able to see that the page contained an instruction banning it from being indexed.
The NoIndex instruction bans a page from being indexed by the search engines. For it to work, the engine must be able to explore the page (i.e. the robots.txt file doesn't block it).
There are different ways to ban indexation with the NoIndex:
If the exploration robots encounter conflicting instructions, they apply the most restrictive one they find. With no instruction at all, the robot can index the page.
Here are the page's instructions:
X-Robots-Tag headers:
Meta robots:
Meta googlebot:
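As a sketch, these three forms of the NoIndex instruction might look like this (the values shown are illustrative):

```text
# As an HTTP response header, sent by the server:
X-Robots-Tag: noindex

# As a meta tag in the page's <head>, addressing all robots:
<meta name="robots" content="noindex">

# As a meta tag addressing only Google's robot:
<meta name="googlebot" content="noindex">
```

The X-Robots-Tag header is useful for non-HTML files (PDFs, images) where a meta tag cannot be placed.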
The page's canonical URL is:
A canonical URL defines the preferred URL of a page. It prevents the risk of duplicate content.
For example, some search filters add settings to the URLs:
https://website.com/boots.html can become https://website.com/boots.html?color=red during a search.
For the search engines, these are 2 different URLs that can be interpreted as duplicate content, which the search engines penalize.
A canonical URL is defined in the HTML tag rel="canonical" as follows:
<link rel="canonical" href="http://website.com/boots.html"/>
We recommend you always define one to avoid the risk of content duplication.
Content redirections are redirections performed after a page has loaded. They should be avoided whenever possible because they can be misinterpreted by search engines and seen as cloaking techniques (a technique aimed at rigging search engine results by serving different content to Google and to users).
We recommend you use 301 or 302 redirects rather than content redirections.
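As a minimal sketch, a 301 (permanent) redirect can be configured on the server side rather than in the page content; here is an Apache example, with hypothetical URLs:

```text
# Apache .htaccess — permanent server-side redirect
Redirect 301 /old-boots.html https://website.com/boots.html
```

Because the redirect is returned in the HTTP response itself, search engines see it immediately and transfer the old URL's signals to the new one, unlike a content redirection that only fires after the page loads.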