An XML sitemap is a structured file in Extensible Markup Language (XML) format that lists the URLs of a website's pages to help search engines discover, crawl, and index content more efficiently. Following the Sitemaps protocol, which is jointly supported by Google, Yahoo, and Microsoft, an XML sitemap tells search engine crawlers which pages exist on a site and, optionally, when each was last updated, how frequently it changes, and its priority relative to other pages on the same site. XML sitemaps are the machine-readable counterpart to HTML sitemaps, which are designed for human visitors.
What an XML Sitemap Contains
An XML sitemap file contains a list of URL entries, each wrapped in <url> tags inside a root <urlset> element. The only required field is <loc> (the full, absolute URL of the page). Optional fields are <lastmod> (when the page was last modified, in W3C Datetime format, most simply YYYY-MM-DD), <changefreq> (how often the page changes: always, hourly, daily, weekly, monthly, yearly, or never), and <priority> (importance relative to other URLs on the same site, from 0.0 to 1.0, with 0.5 as the default). Search engines treat <changefreq> and <priority> as hints at most, and Google has said it ignores both. A single XML sitemap can contain up to 50,000 URLs and must be no larger than 50 MB uncompressed; larger sites use sitemap index files that point to multiple individual sitemaps. XML sitemaps can be submitted to Google via Google Search Console, referenced in robots.txt with a "Sitemap:" line, or both.
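As an illustration of the structure described above, here is a minimal Python sketch that builds a sitemap and a sitemap index with the standard library. The function names (build_sitemap, build_sitemap_index, chunk), the entry dictionaries, and the example.com URLs are hypothetical choices made for this example. The sketch enforces only the 50,000-URL limit; it does not check the 50 MB size limit or validate URL and date formats.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit of the Sitemaps protocol


def build_sitemap(entries):
    """Serialize entry dicts into a <urlset> document (returned as bytes).

    Each entry must have "loc"; "lastmod", "changefreq" and "priority"
    are optional and written only when present.
    """
    if len(entries) > MAX_URLS:
        raise ValueError("more than 50,000 URLs: split and use a sitemap index")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # required field
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # ElementTree escapes characters such as "&" in URLs automatically.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def build_sitemap_index(sitemap_urls):
    """Serialize a <sitemapindex> that points at individual sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


def chunk(items, size=MAX_URLS):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


if __name__ == "__main__":
    pages = [
        {"loc": "https://example.com/", "lastmod": "2024-01-15"},
        {"loc": "https://example.com/about"},
    ]
    print(build_sitemap(pages).decode("utf-8"))
```

A site with more than 50,000 pages could pass its entries through chunk, write one file per piece with build_sitemap, and list the resulting file URLs with build_sitemap_index.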
Why It Matters for SEO
XML sitemaps are a core technical SEO tool with clear practical benefits:
- Helps search engines discover new or updated pages faster, particularly for large sites
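- Helps surface deep pages that are not well connected by internal links
- Especially useful for new sites with few external links, where crawlers have few other paths to the content (a sitemap aids discovery but does not guarantee indexing)
- Google Search Console reports, for each submitted sitemap, which URLs have been indexed and which were discovered but not indexed or were excluded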
- Specialized sitemaps (image, video, news) help rich media content appear in relevant Google search features
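
To illustrate the specialized sitemaps mentioned in the last point, the sketch below builds an image sitemap using Google's image extension namespace. The function name build_image_sitemap, the shape of its input (a mapping from page URL to image URLs), and the example.com URLs are assumptions made for this example; video and news sitemaps use their own namespaces and tags, which are not shown here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Serialize the extension elements with the conventional "image:" prefix.
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """Build an image sitemap (bytes).

    `pages` maps each page URL to the list of image URLs on that page.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page_url
        for image_url in image_urls:
            # Each image gets its own <image:image><image:loc> entry.
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    example = {
        "https://example.com/gallery": [
            "https://example.com/img/one.jpg",
            "https://example.com/img/two.jpg",
        ]
    }
    print(build_image_sitemap(example).decode("utf-8"))
```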