Crawl budget refers to the number of pages Googlebot (Google's web crawler) is willing to crawl on your site within a given timeframe. It is determined by two factors: crawl rate limit (how fast Googlebot can crawl without overloading your server) and crawl demand (how much Google actually wants to crawl your pages based on popularity and freshness).

Google's definition: Google defines crawl budget as "the number of URLs Googlebot can and wants to crawl." It is the intersection of crawl capacity (the rate limit) and crawl demand.

Who Should Care About Crawl Budget?

For small websites with a few hundred pages, crawl budget is rarely a concern. Googlebot will typically crawl all your pages without any issues.

However, if your site has tens of thousands of URLs — as is common with e-commerce stores, news sites, or large content databases — crawl budget becomes critically important. Wasted crawl budget on low-value pages means your important pages may not get crawled and indexed promptly.

What Wastes Crawl Budget?

Several common issues can drain your crawl budget on pages that don't deserve it:

  • Duplicate content — Multiple URLs serving the same or very similar content (e.g., pagination, faceted navigation).
  • URL parameters — Sorting, filtering, or tracking parameters that create unique URLs for identical content.
  • Soft 404 pages — Pages that return a 200 status but display "no results" or empty content.
  • Redirect chains — Long chains of redirects that slow Googlebot down.
  • Low-quality or thin pages — Pages with very little unique, valuable content.
  • Blocked resources in robots.txt — If JavaScript or CSS is blocked, Googlebot can't render pages properly.

How to Optimise Your Crawl Budget

1. Use robots.txt Wisely

Block URLs you don't want crawled — admin pages, internal search results, duplicate filtered pages — using robots.txt. This prevents Googlebot from wasting crawl budget on them. Note that robots.txt controls crawling, not indexing: a blocked URL can still appear in the index if other pages link to it, so use noindex for pages that must stay out of search results.
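As a minimal sketch, a robots.txt like the following blocks the kinds of sections mentioned above (the paths and parameter names are illustrative placeholders — substitute your own URL patterns; Googlebot supports the `*` wildcard in Disallow rules):

```
User-agent: *
# Keep Googlebot out of admin and internal search pages
Disallow: /admin/
Disallow: /search
# Block parameterised duplicates created by sorting/filtering
Disallow: /*?sort=
Disallow: /*?filter=
```

Test changes in Search Console's robots.txt report before deploying, since an overly broad pattern can accidentally block important pages.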

2. Fix Redirect Chains

Redirect directly from the old URL to the final destination. Each hop in a redirect chain costs crawl budget and dilutes link equity.
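For example, instead of chaining /old-page → /interim-page → /new-page, issue one permanent redirect straight to the destination. A sketch in nginx (the paths are illustrative, and the equivalent works in Apache or at the CDN level):

```nginx
# One 301 hop, straight to the final URL — no intermediate stops
location = /old-page {
    return 301 /new-page;
}
```

Periodically crawl your own site with a tool such as Screaming Frog to find chains that have accumulated over successive migrations.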

3. Use Canonical Tags

Use canonical tags on duplicate or near-duplicate pages to tell Google which version to index. This consolidates crawl signals to your preferred URL.
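A canonical tag is a single line in the page's <head>. On a duplicate or filtered variant, it points Google at the preferred URL (the domain below is a placeholder):

```html
<!-- On /shoes/?sort=price, point Google at the clean category URL -->
<link rel="canonical" href="https://www.example.com/shoes/">
```

Canonicals are a hint rather than a directive, so keep them consistent: the preferred URL should canonicalise to itself, and internal links should point to it directly.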

4. Improve Site Speed

Faster servers allow Googlebot to crawl more pages per session without overloading your infrastructure. Invest in reliable hosting and caching.

5. Remove or Noindex Low-Value Pages

Thin content, tag archive pages, and old expired URLs should either be removed (returning a 410 Gone) or marked with a noindex meta tag so Googlebot stops spending resources on them.
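The noindex directive is also a single tag in the page's <head>:

```html
<!-- Keep this page out of Google's index; links on it can still be followed -->
<meta name="robots" content="noindex">
```

One caveat: Googlebot must be able to crawl the page to see the noindex tag, so don't also block it in robots.txt — a blocked page's noindex is never read.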

Pro tip: Use Google Search Console's "Pages" report to see which URLs are indexed and which are not. Large discrepancies between your sitemap URL count and indexed URLs often point to crawl budget issues.

Crawl Budget vs. Indexing Budget

These are related but different concepts. Crawl budget determines how many pages Googlebot visits. Indexing budget determines how many of those pages Google actually adds to its index. A page can be crawled but not indexed if Google deems it low quality, duplicate, or not useful.

Key Takeaways

  • Crawl budget = how many pages Googlebot will crawl on your site in a period.
  • Small sites rarely need to worry; large sites (10k+ URLs) should actively manage it.
  • Wasting budget on low-value URLs can delay indexing of important pages.
  • Fix duplicates, redirect chains, and use robots.txt and canonical tags to optimise.