The crawl budget is defined as the maximum number of pages that Google crawls on a website.
Definition
Google itself determines how many subpages of a domain it crawls. This is not the same for all websites; according to Matt Cutts, it is determined primarily by the PageRank of a page: the higher the PageRank, the larger the crawl budget. The crawl budget also determines how often the most important pages of a website are crawled and how often an in-depth crawl is carried out.
Distinction from the index budget
The index budget is to be distinguished from the crawl budget: it determines how many URLs can be indexed. The difference becomes evident when a website contains many pages that return a 404 error code. Each requested page consumes crawl budget, but if a page cannot be indexed because of an error message, the index budget is not fully used.
Problems
The crawl budget poses a challenge for larger websites with many subpages: not all subpages will be crawled, only a portion of them, and consequently not all subpages can be indexed. This in turn means that site operators can lose traffic because relevant pages were not indexed.
Importance for SEO
An entire area of search engine optimization is dedicated to this issue, with the aim of directing the Googlebot so that the available crawl budget is used wisely and the high-quality pages that matter most to the operator of the website are indexed. As a first step, pages of lesser importance should be identified. These include pages with thin content or little information, as well as faulty pages that return a 404 error code. Such pages should be excluded from crawling so that the crawl budget remains available for the highest-quality pages. Next, the important subpages should be structured so that crawlers reach them with priority. Possible measures as part of crawl optimization include:
- Implementation of a flat site architecture, so that paths to subpages are as short as possible and require only a few clicks.
- Internal links from pages with many backlinks to pages that should be crawled more often.
- Strong internal linking of the most important pages.
- Exclusion of unimportant pages from crawling via the robots.txt file (such as login pages, contact forms, or images); see the robots.txt sketch after this list.
- Exclusion of pages from indexing or link following through robots meta tags (noindex, nofollow); see the example below.
- Provision of an XML sitemap listing the URLs of the most important subpages; a minimal sitemap sketch follows below.
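As a minimal sketch of the robots.txt approach from the list above, the following file tells compliant crawlers to skip a login page, a contact form, and an image directory. The domain and all paths here are hypothetical placeholders and must be adapted to the actual site structure.

```
# robots.txt, served from the root of the domain,
# e.g. https://www.example.com/robots.txt (hypothetical domain)
User-agent: *
Disallow: /login/
Disallow: /contact-form/
Disallow: /images/

# The sitemap location can also be announced here
# (see the sitemap example further below).
Sitemap: https://www.example.com/sitemap.xml
```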
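For the metadata approach, a robots meta tag in the HTML head of an individual page can keep that page out of the index (noindex) and tell crawlers not to follow its links (nofollow). A minimal sketch:

```html
<!-- Placed in the <head> of a page that should not consume index budget. -->
<!-- "noindex" excludes the page from the index; "nofollow" tells the
     crawler not to follow the links found on this page. -->
<meta name="robots" content="noindex, nofollow">
```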
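Finally, an XML sitemap lists the URLs that should be crawled with priority. A minimal sketch following the sitemaps.org protocol; the URLs, date, and priority values are placeholders for illustration only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sitemap example with hypothetical URLs. -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2015-11-20</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://www.example.com/important-category/</loc>
    <priority>0.8</priority>
  </url>
</urlset>
```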
If the portfolio of crawled and indexed pages is improved through crawl optimization, rankings can improve as well. Well-ranked pages are in turn crawled more often, so the effect reinforces itself.
An informative talk on "Crawl Budget Best Practices" was given by Jan Hendrik Jacob Merlin at SEOkomm 2015.