Sitemaps
What Is a Sitemap?
A sitemap is a file, most commonly in XML format, that lists the URLs on a website that you want search engines to discover. It serves as a navigational tool for search engines, giving them a clear roadmap for finding and indexing that content.
Sitemaps are especially valuable for websites that have a large amount of content or an intricate structure, as well as for sites that frequently update or expand their content.
Sitemap Index
A single sitemap can include a maximum of 50,000 URLs and must not exceed 50 MB uncompressed. When either limit is exceeded, the URLs must be divided across multiple sitemaps. These individual sitemaps are then listed in a single sitemap index, essentially a "sitemap of sitemaps." This approach is particularly useful for large websites with numerous sections and categories, since each section can be given its own smaller, more manageable sitemap.
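As a rough illustration of the splitting step, the sketch below chunks a URL list at the 50,000-URL limit and builds a sitemap index that points at each chunk's file. The function names, the example.com URLs, and the sitemap-N.xml naming scheme are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-sitemap URL limit described above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML (as a string) listing each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = sitemap_url
    return ET.tostring(root, encoding="unicode")


# Hypothetical site with 120,000 URLs, which needs three sitemaps.
urls = [f"https://www.example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)

# Hypothetical naming scheme for the child sitemap files.
child_sitemaps = [
    f"https://www.example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))
]
index_xml = build_sitemap_index(child_sitemaps)
```

Each chunk would still need to be written out as its own sitemap file. This sketch checks only the URL count, not the file size limit.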
What Should Be Included in a Sitemap?
In a sitemap, it's crucial to include only SEO-relevant pages. These are the pages you want search engines to crawl and index, which is often not every single page on your website. For larger websites, including only relevant pages also helps make better use of the crawl budget.
By limiting the sitemap to SEO-focused pages, you enable search engines to crawl more efficiently and intelligently, ensuring better indexation of your key content.
Pages that you should avoid including in a sitemap typically comprise non-canonical pages, duplicated content, paginated pages, parameter-based pages, and site search result pages.
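The exclusion rules above could be applied with a simple filter, sketched below. It assumes that you already know each page's canonical URL, that paginated pages use a /page/N/ path, and that site search lives under /search. All three are assumptions for illustration and will differ from site to site.

```python
import re
from urllib.parse import urlsplit

PAGINATED_PATH = re.compile(r"/page/\d+/?$")  # assumed pagination pattern
SEARCH_PREFIX = "/search"                     # assumed site search path


def belongs_in_sitemap(url, canonical_url):
    """Return True if `url` looks like an SEO-relevant page worth listing."""
    parts = urlsplit(url)
    if url != canonical_url:
        return False  # non-canonical or duplicate page
    if parts.query:
        return False  # parameter-based page, e.g. ?sort=price
    if PAGINATED_PATH.search(parts.path):
        return False  # paginated page
    if parts.path == SEARCH_PREFIX or parts.path.startswith(SEARCH_PREFIX + "/"):
        return False  # site search result page
    return True


# Hypothetical (url, canonical_url) pairs.
pages = [
    ("https://www.example.com/shoes/", "https://www.example.com/shoes/"),
    ("https://www.example.com/shoes/?sort=price", "https://www.example.com/shoes/"),
    ("https://www.example.com/blog/page/2/", "https://www.example.com/blog/page/2/"),
    ("https://www.example.com/search?q=boots", "https://www.example.com/search?q=boots"),
    ("https://example.com/about/", "https://www.example.com/about/"),
]
selected = [url for url, canonical in pages if belongs_in_sitemap(url, canonical)]
```

With these sample pairs, only the first URL passes every check. Treating any query string as a reason to exclude is deliberately strict, so a site whose canonical URLs carry parameters would need a looser rule.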
Key Elements of a Sitemap
A sitemap comprises several elements, some of which are mandatory, while others are optional.
1. loc Tag: The only mandatory tag within each URL entry is the "loc" tag, which should contain the canonical version of the URL. It should accurately reflect the site's protocol, such as https, and whether the address includes "www" or not.
2. lastmod Tag: While optional, the "lastmod" tag is recommended as it informs search engines about the last modification date of a page, written in W3C Datetime format (for example, YYYY-MM-DD). Search engines such as Google may use this information to determine whether a page has changed and needs to be re-crawled, provided the dates prove consistently accurate.
For that reason, it's important to use this tag truthfully: update it only when the page has actually changed, and never as a means to deceive search engines.
3. changefreq Tag: This optional tag was designed to give search engines a hint about how frequently a page is likely to change, using one of the values always, hourly, daily, weekly, monthly, yearly, or never. It is no longer widely used by search engines; Google, for example, ignores it.
4. priority Tag: Another optional tag, the "priority" tag is intended to indicate the relative importance of a page compared to other URLs on the same site.
It operates on a scale from 0.0 to 1.0, with higher numbers indicating higher importance and a default of 0.5. However, it's worth noting that search engines often disregard this tag; Google, for example, ignores it.
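To show how these four tags fit together, here is a minimal sketch that builds a sitemap with Python's standard xml.etree.ElementTree module. The build_sitemap function, the dictionary keys, and the example.com URLs are assumptions chosen for illustration. Only "loc" is required per entry, and the other tags are written only when supplied.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of entry dictionaries."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(root, "url")
        # loc is the only mandatory child of each url entry.
        ET.SubElement(url, "loc").text = entry["loc"]
        if "lastmod" in entry:
            # date.isoformat() yields YYYY-MM-DD, a valid W3C Datetime value.
            ET.SubElement(url, "lastmod").text = entry["lastmod"].isoformat()
        if "changefreq" in entry:
            ET.SubElement(url, "changefreq").text = entry["changefreq"]
        if "priority" in entry:
            ET.SubElement(url, "priority").text = f"{entry['priority']:.1f}"
    return ET.tostring(root, encoding="unicode")


# Hypothetical entries: the second one carries only the mandatory loc tag.
entries = [
    {
        "loc": "https://www.example.com/",
        "lastmod": date(2024, 1, 15),
        "changefreq": "weekly",
        "priority": 0.8,
    },
    {"loc": "https://www.example.com/about/"},
]
sitemap_xml = build_sitemap(entries)
```

Since changefreq and priority are often disregarded, a leaner sitemap containing only loc and an accurate lastmod is a reasonable choice.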