Blocked for indexing but in sitemap.xml

A sitemap.xml file is essentially a map of your website, designed to help search engines discover and index your pages. It is usually placed in your site root (for example, the public_html folder) and lists the URLs you want search engine crawlers to visit. For each URL it can also carry optional hints: when the page was last modified (lastmod), how often it is likely to change (changefreq), and its priority relative to other pages on your site (priority).
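
To make the structure concrete, here is a minimal Python sketch that writes a sitemap.xml following the sitemaps.org protocol, using only the standard library. The function name build_sitemap and the example.com URLs are hypothetical; only loc is required, and the other three tags are the optional hints described above.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Return sitemap.xml content as a string (hypothetical helper).

    entries: list of dicts with a required "loc" key and optional
    "lastmod", "changefreq" and "priority" keys (string values).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as & in the text automatically.
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, tag).text = entry[tag]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

For example, build_sitemap([{"loc": "https://example.com/", "lastmod": "2024-01-15"}]) returns a one-entry urlset ready to be saved as sitemap.xml.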

This can significantly speed up the discovery and indexing of important pages and helps search crawlers spend their crawl time on the pages that matter most to you and your users.

Creating a sitemap.xml is not always necessary, but it is always recommended, especially for large sites with thousands of pages. The bigger the site, the more important it is to make sure search engine crawlers spend their time on high-value pages with deep content and commercial intent, not on side pages that offer thin value.

Typically, when a CMS or other software generates a sitemap.xml file automatically, it includes every available page. Many website owners are not aware of this: they may have set noindex on certain pages, yet their automatically generated sitemap still lists those pages, sending search engines conflicting signals and wasting valuable crawl budget.
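
To check whether a page is actually marked noindex, and therefore should not be in your sitemap, you can look at both the robots meta tag and the X-Robots-Tag HTTP header. The following Python sketch does this with the standard library's html.parser. The names is_noindex and RobotsMetaParser are hypothetical, and the sketch deliberately simplifies: it ignores crawler-specific tags such as <meta name="googlebot"> and user-agent prefixes inside the header value.

```python
from html.parser import HTMLParser

# Directive values that block indexing ("none" means "noindex, nofollow").
BLOCKING = {"noindex", "none"}


class RobotsMetaParser(HTMLParser):
    """Collects the directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            for part in (attributes.get("content") or "").split(","):
                self.directives.add(part.strip().lower())


def is_noindex(html, headers=None):
    """Return True if the page opts out of indexing via the robots
    meta tag or the X-Robots-Tag response header (simplified)."""
    lowered = {k.lower(): v for k, v in (headers or {}).items()}
    header_parts = {p.strip().lower() for p in lowered.get("x-robots-tag", "").split(",")}
    if header_parts & BLOCKING:
        return True
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return bool(parser.directives & BLOCKING)
```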

We highly recommend using plugins, custom software, or sitemap generators that let you choose which URLs appear in your sitemap, which URLs are excluded, and what optional hints (such as relative priority and change frequency) are set for each URL.
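
As an illustration of how such include/exclude settings might work, this Python sketch filters a list of URLs using glob patterns matched against the URL path and drops any URL known to be noindexed. The pattern lists and the select_sitemap_urls function are hypothetical examples, not the settings of any particular plugin or generator.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse

# Hypothetical generator settings: glob patterns matched against the URL path.
INCLUDE_PATTERNS = ["/", "/products/*", "/blog/*"]
EXCLUDE_PATTERNS = ["/blog/tag/*", "/cart*", "/search*"]


def select_sitemap_urls(urls, noindex=frozenset(),
                        include=INCLUDE_PATTERNS, exclude=EXCLUDE_PATTERNS):
    """Keep URLs that match an include pattern, match no exclude pattern
    and are not known to be noindexed. Input order is preserved."""
    selected = []
    for url in urls:
        path = urlparse(url).path or "/"
        if url in noindex:
            continue  # never list pages that are marked noindex
        if any(fnmatchcase(path, pattern) for pattern in exclude):
            continue
        if any(fnmatchcase(path, pattern) for pattern in include):
            selected.append(url)
    return selected
```

Exclusions are checked before inclusions, so a narrow exclude rule such as /blog/tag/* wins over a broad include rule such as /blog/*.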

Sitemap errors found by Labrika

Attention! The sitemap error report is only available if Labrika has been given sufficient permissions to scan the whole website. Otherwise, Labrika can only see the pages listed in your sitemap.xml; it cannot view all the pages on the website and cross-compare them with the pages listed in the sitemap.
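
For comparison, this is roughly what a tool sees when it only has the sitemap to go on: the list of <loc> entries. The Python sketch below (the function name sitemap_urls is hypothetical) extracts them with xml.etree.ElementTree. For sitemaps fetched from untrusted sources, consider a hardened parser such as the defusedxml package.

```python
import xml.etree.ElementTree as ET

LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def sitemap_urls(xml_bytes):
    """Return the <loc> values from a sitemap.xml document, in order.

    For a sitemap index (<sitemapindex>), these are the URLs of the
    child sitemaps, which would have to be fetched and parsed in turn.
    """
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.iter(LOC_TAG) if loc.text]
```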

Labrika sitemap analysis helps find the following types of errors:

  • Pages that exist in the sitemap but are not accessible for indexing.

  • Pages that exist in the sitemap but have a noindex tag.

  • Pages that don’t exist in the sitemap but are indexable.
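
A cross-comparison like this can be sketched with Python sets. This is a simplified model, not Labrika's implementation: it assumes you already have the crawled URLs and the set of noindexed URLs, treats "not accessible for indexing" as "disallowed by robots.txt" (checked with the standard urllib.robotparser module), and ignores HTTP status codes and redirects, which a real audit would also check. The classify function and its result keys are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def classify(listed_urls, crawled_urls, noindex_urls, robots_txt, user_agent="*"):
    """Sort URLs into the three sitemap error types (simplified model)."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    def allowed(url):
        return robots.can_fetch(user_agent, url)

    in_sitemap = set(listed_urls)
    noindex = set(noindex_urls)
    # In the sitemap but not accessible (here: disallowed by robots.txt).
    blocked = {u for u in in_sitemap if not allowed(u)}
    # In the sitemap but carrying a noindex directive.
    noindexed = in_sitemap & noindex
    # Found by crawling, indexable, but missing from the sitemap.
    missing = {u for u in crawled_urls if allowed(u) and u not in noindex} - in_sitemap
    return {
        "blocked_in_sitemap": sorted(blocked),
        "noindex_in_sitemap": sorted(noindexed),
        "indexable_not_in_sitemap": sorted(missing),
    }
```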

Please note: different search engines process sitemaps in different ways. Google relies heavily on discovering pages through automatic crawling, that is, pages that can be reached via internal links within the crawl time and crawl depth allotted to your site. Google treats a sitemap as a hint rather than a directive: listing a URL does not guarantee that it will be crawled or indexed. Google has also stated that it ignores the changefreq and priority values, although it may use lastmod when that value is consistently accurate.
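
The idea of pages being reachable via internal links within a limited crawl depth can be modelled as a breadth-first search over your site's link graph. The Python sketch below assumes a hypothetical links dictionary built from your own crawl; real search engine crawl scheduling is far more complex and not public, so treat this only as a way to spot pages that sit too many clicks from the homepage or are not linked at all.

```python
from collections import deque


def reachable_within_depth(links, start, max_depth):
    """Breadth-first search over internal links.

    links: dict mapping each URL to the URLs it links to (hypothetical
    crawl data). Returns {url: click_depth} for every page reachable
    from `start` in at most `max_depth` clicks.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, ()):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth
```

Sitemap URLs that are missing from the result are either orphan pages or pages buried deeper than the chosen limit.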

Page doesn’t exist in sitemap, but is indexable

These are pages that are indexable but are not specifically listed in the sitemap for crawling and indexing.

This is the most benign of the sitemap errors, but it should still be addressed to make the most of the crawl budget that search engines allocate to your site. Because crawlers can only visit a limited number of pages and spend a limited amount of time on your site, it is essential to make sure they focus on pages that offer SEO value to both your users and your website.

To rectify this issue, add relevant, indexable pages to the sitemap, or noindex thin pages and pages with little SEO value or PageRank. If our tool has found useful, indexable pages that are missing from your sitemap, check your sitemap generation tool's settings to make sure it is configured correctly.

How to fix the issue

As a rule of thumb, any indexable page should be included in the sitemap or should be made non-indexable if it has thin content or has little SEO benefit.

Once you have identified pages in this section of the sitemap validator, you can:

  1. Add the relevant pages to your sitemap, or check that your automated sitemap generation tool is working properly.
  2. Noindex any pages in the list that have thin content or little SEO value.
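
Both fixes can be expressed as a small list operation. In this Python sketch (the function and parameter names are hypothetical), URLs that are not indexable or that you have decided to noindex are dropped, missing indexable URLs are appended, and duplicates are removed while preserving order.

```python
def corrected_sitemap_urls(current, not_indexable, missing_indexable, low_value=()):
    """Return a cleaned list of sitemap URLs.

    current: URLs currently listed in the sitemap (order is kept).
    not_indexable: URLs that are blocked or noindexed, to be removed.
    missing_indexable: indexable URLs found by crawling but not yet listed.
    low_value: thin pages you have chosen to noindex instead of listing.
    """
    drop = set(not_indexable) | set(low_value)
    result, seen = [], set()
    for url in list(current) + list(missing_indexable):
        if url in drop or url in seen:
            continue  # skip removed pages and duplicates
        seen.add(url)
        result.append(url)
    return result
```

The resulting list can then be written out with whatever sitemap generator you use.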

Download Labrika’s error free sitemap.xml file

For each of the sitemap error reports listed above, Labrika lets you download a corrected, error-free version of your sitemap.xml file. This saves you the time of correcting your sitemap.xml manually and, most importantly, helps search engines make better use of your crawl budget.