Blocked content refers to pages that are blocked from search engines for various reasons. These could be pages that should not be indexed by search engines, such as beta pages or pages with duplicate content.

There are several methods of blocking search engines:

  • Robots.txt,
  • IP blocking,
  • Meta robots.

Robots.txt

Robots.txt (also: robots exclusion protocol) is a text file for robots that is stored in the root directory of a website. Before crawling a site, the robot checks whether a robots.txt file exists and what instructions it contains. Specific pages or entire directories can be excluded with the robots.txt file. Compliant search engine bots will then not crawl them. However, pages are sometimes indexed despite the instructions in the robots.txt file. This mainly happens when the pages are reachable from other pages, that is, when other pages link to them.
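The check a well-behaved robot performs can be sketched with Python's standard urllib.robotparser module. This is a minimal illustration, not a description of how any particular search engine works: the rules, the /beta/ and /print/ directories, the bot name ExampleBot and the example.com URLs are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every robot is asked to stay out of
# the /beta/ and /print/ directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /beta/
Disallow: /print/
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/beta/new-feature.html",
        "https://www.example.com/products/",
    ):
        print(url, is_crawl_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Note that the file only asks robots not to crawl. It does not remove a page from the index if other pages link to it.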


IP blocking

IP blocking can also prevent pages from being indexed by search engines. Certain user agents (e.g. search engine bots, spam bots) are excluded by means of an .htaccess file. However, this method is only useful if you know the name of the bot trying to access the site and its IP address. Because robots sometimes disguise themselves as other robots, exclusion from the index is not guaranteed.
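The decision such a block makes can be sketched in Python. This is only an illustration of the logic, not .htaccess syntax: the network 192.0.2.0/24 (a range reserved for documentation), the token "examplebot" and the function name is_blocked are assumptions made for the example.

```python
import ipaddress

# Hypothetical block lists: one IP range and one user-agent token.
BLOCKED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
BLOCKED_AGENT_TOKENS = ["examplebot"]


def is_blocked(remote_ip, user_agent):
    """Return True if the request matches a blocked IP range or bot name."""
    address = ipaddress.ip_address(remote_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return True
    # User-agent matching is case-insensitive substring matching here.
    agent = user_agent.lower()
    return any(token in agent for token in BLOCKED_AGENT_TOKENS)


if __name__ == "__main__":
    print(is_blocked("192.0.2.17", "Mozilla/5.0"))
    print(is_blocked("198.51.100.5", "Mozilla/5.0 (compatible; ExampleBot/1.0)"))
    print(is_blocked("198.51.100.5", "Mozilla/5.0"))
```

The sketch also shows the weakness of the method: a bot that sends a different user-agent string from an unlisted address passes both checks.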

Google Analytics can be anonymized so that it cannot store the IP address.

Meta robots

The third and probably the best method of excluding web content from search engine indexes is the use of meta robots. Meta robots is an HTML meta tag that gives search engine robots specific instructions on whether the page should be indexed and whether the links on the page should be followed. This meta tag is declared in the head section of a page. To exclude a page from the index, the robots tag would be:

<meta name="robots" content="noindex">
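A tag of the form <meta name="robots" content="noindex, follow"> asks robots not to index the page while still following its links. As a rough way to check which directives a page actually declares, the sketch below reads them with Python's standard html.parser module. The sample page and the names MetaRobotsParser and robots_directives are assumptions made for the example.

```python
from html.parser import HTMLParser

# Hypothetical page whose head section asks robots not to index it
# but still to follow its links.
PAGE = """\
<html>
  <head>
    <title>Beta page</title>
    <meta name="robots" content="noindex, follow">
  </head>
  <body><a href="/products/">Products</a></body>
</html>
"""


class MetaRobotsParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of meta robots directives found in html."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    print(sorted(robots_directives(PAGE)))
```

A robot can only read this tag if it is allowed to crawl the page, so a page that carries noindex should not also be blocked in robots.txt.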

 

Recommendation

When blocking pages, it is essential to exclude the correct content. Make sure that important pages are well linked internally and are not blocked by accident. If valuable pages are blocked, they cannot be indexed and cannot pass on any valuable link juice.
