Error - Page access was forbidden by the robots.txt file

The robots.txt file is a text file located at the root of your site that provides instructions to crawlers. It lets you define rules authorizing or denying access to the different pages of a site, i.e. which pages crawlers may or may not explore. If the file is absent, or if fetching it returns a 4xx HTTP code, robots consider that they may explore all the pages. Conversely, if robots have trouble retrieving it because of a connection problem such as a timeout, or if fetching it returns a 5xx HTTP code, they consider that they may not explore anything.
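The fetch rules above can be summarized in a small sketch. The function below is illustrative only (the status-to-behavior mapping is taken from the paragraph above, not from Cocolyze's actual code); `None` stands for a timeout or connection failure:

```python
# Sketch of how a crawler interprets the result of fetching robots.txt,
# per the rules above: 4xx = crawl everything, 5xx/timeout = crawl nothing.
def crawl_policy(status):
    """Return the assumed crawl permission for a robots.txt fetch result."""
    if status is None:            # timeout or connection failure
        return "crawl nothing"
    if 400 <= status < 500:       # file absent: everything may be explored
        return "crawl everything"
    if status >= 500:             # server error: nothing may be explored
        return "crawl nothing"
    return "apply the rules in the file"

print(crawl_policy(404))  # → crawl everything
print(crawl_policy(503))  # → crawl nothing
```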

The directives follow a precise syntax:

user-agent: (name of the robot)
user-agent: (another robot name)
instruction: (page path)
instruction: (other page path)

The user-agent line defines the name of the robot to which the following instructions apply. An instruction can be "allow" to authorize access or "disallow" to forbid it. When several rules match the same path, the most specific (longest) rule takes precedence.

Example of a general authorization for all robots, except for the page "/administration" and for the robots "pierrebot" and "paulbot":

user-agent: *
allow: /
disallow: /administration
user-agent: pierrebot
user-agent: paulbot
disallow: /
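As a sketch, the longest-match precedence described above can be checked against the "*" rules of this example in a few lines of Python. This is illustrative only, not Cocolyze's actual parser; on a tie in length, "allow" is assumed to win, as most major crawlers do:

```python
# Minimal longest-match evaluation of allow/disallow rules for one user-agent.
def is_allowed(path, rules):
    """rules: list of ("allow" | "disallow", path_prefix) pairs."""
    match = None  # best match so far: (prefix_length, is_allow)
    for verdict, prefix in rules:
        if path.startswith(prefix):
            key = (len(prefix), verdict == "allow")  # longer wins; allow wins ties
            if match is None or key > match:
                match = key
    return match is None or match[1]  # no matching rule means allowed

# The "*" rules from the example file above
rules = [("allow", "/"), ("disallow", "/administration")]
print(is_allowed("/contact", rules))         # → True
print(is_allowed("/administration", rules))  # → False
```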

To unblock our robots and benefit from Cocolyze analyses, you need to:

  • check that access to robots.txt does not return an error
  • check that no disallow directive for the user-agent "*" or "cocolyzebot" applies to the page to be analyzed
  • add an exception on your firewall/proxy for our bots (if your site has access security enabled). Our robots are identified by their user-agents (they do not have fixed IPs):
    • "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Cocolyzebot/1.0; +" for mobile scanning
    • "Mozilla/5.0 (compatible; Cocolyzebot/1.0;" for desktop analysis
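If your firewall or proxy filters by user-agent, the exception can be as simple as looking for the "Cocolyzebot/1.0" token, which appears in both user-agents above. A sketch (adapt it to your firewall's own rule syntax; the truncated mobile string is abbreviated here too):

```python
# Both Cocolyze user-agents (mobile and desktop) contain "Cocolyzebot/1.0".
def is_cocolyzebot(user_agent):
    return "Cocolyzebot/1.0" in user_agent

mobile_ua = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
             "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
             "Mobile Safari/537.36 (compatible; Cocolyzebot/1.0; +")
desktop_ua = "Mozilla/5.0 (compatible; Cocolyzebot/1.0;"

print(is_cocolyzebot(mobile_ua))                     # → True
print(is_cocolyzebot(desktop_ua))                    # → True
print(is_cocolyzebot("Mozilla/5.0 (X11; Linux)"))    # → False
```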

To make things easier, you can add the following directives at the end of your robots.txt file so that our robots can scan all your pages:

user-agent: cocolyzebot
allow: /
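You can sanity-check the resulting file with Python's standard library. Note that urllib.robotparser is a simplified, first-match parser, so it only approximates real crawler behavior, but it is enough to confirm that cocolyzebot gets its own entry:

```python
# Verify that the added directives let cocolyzebot fetch any page,
# including a page disallowed for other robots.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse("""\
user-agent: *
disallow: /administration
user-agent: cocolyzebot
allow: /
""".splitlines())

# cocolyzebot matches its dedicated entry, which allows everything
print(rp.can_fetch("cocolyzebot", "/administration"))  # → True
```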