Every site is given an “allowance”: the number of pages that a search engine spider will crawl on that site. This is known as the crawl budget.
Googlebot sets a limit on the number of pages that will be crawled on a site so that it doesn't degrade the experience of users visiting the site. The "crawl rate limit" is the maximum fetching rate for a given site: it determines the number of simultaneous connections that Googlebot will open to a site and the time delay between fetches. The crawl rate is determined by:
- The performance of the site (speed and number of errors) and
- The limit set in Google Search Console
Not only is the crawl rate important; crawl demand also determines the number of pages that will be indexed. Demand is based on visits to the site, and even if the crawl rate limit isn't reached, there will be low activity from Googlebot if there is no demand from indexing. Two main factors influence crawl demand:
- Popularity of URLs - more popular URLs are crawled more often to ensure that results on Google are up to date and
- Changes and submissions - changes to URLs and submissions to Google increase the number of pages crawled
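The two demand factors above can be combined into a toy scheduling model: score each URL by popularity and by how recently it changed, then crawl the highest-scoring URLs first. The weights, fields, and formula below are illustrative assumptions, not a description of Google's actual ranking.

```python
from dataclasses import dataclass

# Toy model of "crawl demand": popular and recently changed URLs are
# crawled first. All weights and fields are illustrative assumptions.

@dataclass
class Page:
    url: str
    popularity: float       # e.g. a normalised visit/link score, 0..1
    days_since_change: int  # time since the page last changed

def demand_score(page, popularity_weight=0.7, freshness_weight=0.3):
    # Freshness decays as the page goes stale without changes.
    freshness = 1.0 / (1 + page.days_since_change)
    return popularity_weight * page.popularity + freshness_weight * freshness

def crawl_queue(pages):
    """Order pages so the highest-demand URLs are fetched first."""
    return sorted(pages, key=demand_score, reverse=True)
```

In this sketch a popular homepage outranks a freshly changed but obscure page, and a stale, unpopular URL drops to the back of the queue — mirroring the "low activity from Googlebot" the article describes when demand is absent.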
Crawl budget, therefore, is determined by a combination of crawl rate and crawl demand.
Factors that lower crawl budget
Googlebot is programmed not to waste resources (time) on low value-add URLs. Factors that are likely to lower the crawl budget are:
- Poor Quality Content or Spam
- Duplicate content (unflagged)
- Poor Performance (Slow) Websites
- Soft Error Pages
- Hacked Pages