Sitemap validator

Number of sitemap files found

This shows how many sitemap files were found on the site. Sitemap files contain a list of pages and other site resources to be indexed. This information helps search engines index the site more efficiently.

The standard adopted for the sitemap allows the use of many sitemap files. Please note that sitemap files may not be found if their paths do not comply with the sitemap protocol standard or have spelling errors.

Number of elements in the sitemap files

This report shows the number of HTML pages and other resources found in all the sitemap files at the time of site analysis.

Number of pages in all sitemap files

This report shows how many pages in HTML format were found in all the sitemap files at the time of the site analysis. This doesn't include resources in other formats such as images, etc.

This report is useful for comparing the number of pages listed in the sitemap files with the actual number of pages on the site and the number indexed by search engines. The comparison can help you detect various problems quickly.

For example, some sites do not remove pages from the sitemap after they have been deleted or disabled in the site's administrative panel. This wastes crawl budget and can potentially lower the site's position. If there are significantly fewer pages in the search engine index than in the sitemap, it may also indicate that the site has indexing issues, or that the sitemap is formatted incorrectly and contains extra pages.

Errors found in the sitemap

Errors in the sitemap can lead to incorrect interpretation of data and the inability to use the entire file or individual lines within it. We check the sitemap for compliance with the sitemap, XML, and W3C standards, as well as Google, Yahoo, Bing, and Yandex recommendations.

Warnings found in the sitemap

Warnings indicate that there are problems that will significantly decrease a sitemap's effectiveness.

For example, on a site with tens of thousands of pages, a correctly configured sitemap lets search engines index page changes within several hours to several days. With an incorrectly configured sitemap, for example one without timestamps, indexing the changes may take several weeks. This slows down any promotion or optimization of your site.

Error messages

Invalid URL in the sitemap index file

This means that the URL of the sitemap file is incorrectly formatted or contains invalid characters. There are several common reasons why this error may occur:

Invalid URL

The URL in your sitemap is not written correctly. It may contain spaces or unsupported or invalid characters, or the protocol may be misspelled.

For example, htp:// or http:/ instead of http://

Make sure that the URLs listed in the sitemap are properly escaped.

For example, the "&" character in the URL should be written as "&amp;" and all spaces should be replaced with "%20".

The URLs in the sitemap must comply with the RFC-3986 standard (https://www.ietf.org/rfc/rfc3986.txt), the XML standard (https://www.w3.org/TR/REC-xml/), and the RFC-3987 standard (https://www.ietf.org/rfc/rfc3987.txt).

For more details: https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap
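
As a minimal sketch of the escaping described above, the Python snippet below percent-encodes characters that are not allowed in a URL and then entity-escapes the result for XML, as the sitemaps.org protocol describes. The function name sitemap_loc and the sample URL are hypothetical, and the list of characters treated as safe is an assumption that suits URLs which are otherwise well formed.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def sitemap_loc(url: str) -> str:
    """Return a URL that is percent-encoded and XML-escaped for a <loc> tag."""
    # Percent-encode characters that are not allowed in a URL (such as spaces).
    # Reserved characters and existing %XX escapes are left untouched (assumption).
    encoded = quote(url, safe=":/?#[]@!$&'()*+,;=%")
    # Entity-escape the characters that are special in XML: & < > ' "
    return escape(encoded, {"'": "&apos;", '"': "&quot;"})


print(sitemap_loc("http://www.example.com/my page.html?a=1&b=2"))
```

With this approach an ampersand that separates query parameters is written as &amp; in the file, and a space becomes %20.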

Empty sitemap

The sitemap file does not contain any URLs.

URLs not accessible

This error appears when we cannot retrieve the URLs in the sitemap.

Check the sitemap URL using the URL verification tool: (https://support.google.com/webmasters/answer/9012289) to find out if this address is available to Google.

Compression error

An error occurred when we tried to unpack the file. Use the gzip format to compress the file.
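
One way to reproduce this check locally is to try unpacking the file with Python's standard gzip module. This is only a sketch: the helper name try_unpack and the sample data are hypothetical, and the validator itself may unpack files differently.

```python
import gzip
import zlib


def try_unpack(data: bytes):
    """Return the decompressed bytes, or None if the data is not valid gzip."""
    try:
        return gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # OSError covers "not a gzipped file"; EOFError covers truncated input.
        return None


xml = b'<?xml version="1.0" encoding="UTF-8"?><urlset/>'
packed = gzip.compress(xml)          # what a correctly compressed sitemap looks like
print(try_unpack(packed) == xml)     # round trip succeeds
print(try_unpack(xml))               # plain XML is not gzip data
```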

Too many redirects (>4)

The URL goes through too many redirects for search robots to follow. Replace the redirecting addresses in your sitemap files with the final URLs that need to be scanned. Avoid JavaScript and meta refresh redirects.

No format declaration

This error occurs if the sitemap has an incorrect format declaration (header) or the declaration is missing.

For example, if your sitemap is created in XML format, then it should begin with the declaration:

<?xml version="1.0" encoding="UTF-8"?>

Also, according to Google's rules and accepted standards, all XML attribute values must be enclosed in single (') or double (") quotes. The quotes should be straight, not curly.

Please note that word processing programs, such as Microsoft Word, can replace straight quotes with curly ones and this would then violate the requirements.

For more details visit: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list
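
The two checks above (an XML declaration at the very start of the file, and straight rather than curly quotes) can be sketched in a few lines of Python. The function name header_problems and the message texts are hypothetical. Treating every curly quote in the file as a problem is a simplifying assumption, since such characters could in principle appear in ordinary text content.

```python
CURLY_QUOTES = "\u201c\u201d\u2018\u2019"  # the four common curly quote characters


def header_problems(text: str):
    """Return a list of human-readable problems found in the sitemap text."""
    problems = []
    # The XML declaration must be the very first thing in the file.
    if not text.startswith("<?xml"):
        problems.append("file does not start with an XML declaration")
    # Word processors often replace straight quotes with curly ones.
    if any(q in text for q in CURLY_QUOTES):
        problems.append("curly quotes found; use straight quotes instead")
    return problems


good = '<?xml version="1.0" encoding="UTF-8"?>\n<urlset/>'
bad = ' <?xml version=\u201c1.0\u201d encoding=\u201cUTF-8\u201d?>\n<urlset/>'
print(header_problems(good))
print(header_problems(bad))
```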

Sitemap file size error

The size of your sitemap in its uncompressed form exceeds 50 MB. If your sitemap is larger than the limit, split it into several smaller files.

For more details visit: https://developers.google.com/search/docs/advanced/sitemaps/large-sitemaps

https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

Too many URLs in the sitemap

Your sitemap contains more than the maximum of 50,000 URLs. Divide the sitemap into several files and ensure that each contains no more than 50,000 URLs. You can then list these files in a sitemap index file.

For more details visit: https://developers.google.com/search/docs/advanced/sitemaps/large-sitemaps

https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list
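
The splitting step can be sketched as follows, assuming the URLs are already available as a Python list. The helper names split_urls and build_index and the sitemap-N.xml file names are hypothetical; only the 50,000 limit and the sitemapindex structure come from the sitemap protocol.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000  # per-file limit from the sitemap protocol


def split_urls(urls, limit=MAX_URLS):
    """Split a list of URLs into chunks of at most `limit` entries."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


def build_index(sitemap_urls):
    """Build a sitemap index document that lists the given sitemap files."""
    entries = "".join(
        f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}"
        "</sitemapindex>\n"
    )


urls = [f"https://www.example.com/page-{n}.html" for n in range(120_000)]
chunks = split_urls(urls)
names = [f"https://www.example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))]
index_xml = build_index(names)
print(len(chunks), [len(c) for c in chunks])
```

Here 120,000 URLs end up in three files, and the index lists those three files rather than the individual pages.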

Too many sitemaps in the sitemap index file

The sitemap index file contains more than 50,000 sitemaps.

Divide the sitemap index file into several files and make sure that no more than 50,000 sitemaps are specified in each of them.

For more details visit: https://developers.google.com/search/docs/advanced/sitemaps/large-sitemaps

https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

https://www.sitemaps.org/protocol.html#index

Invalid date

Your sitemap contains an invalid date format.

Dates must use the W3C Datetime encoding (https://www.w3.org/TR/NOTE-datetime).

Examples of acceptable formats: 2021-11-23 or 2021-12-23T18:00:15+00:00.

The time is optional (the default time is 00:00:00Z).

However, if you specify the time, you must specify the time zone.

For more details visit: https://www.sitemaps.org/protocol.html#xmlTagDefinitions

https://www.w3.org/TR/NOTE-datetime

https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list
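
A rough way to test a date value against these rules is a regular expression that follows the shapes listed in the W3C Datetime note. The sketch below is an illustration under stated assumptions: the name is_w3c_datetime is hypothetical, and the pattern checks only the shape of the value, not whether the month, day, or hour is in range.

```python
import re

W3C_DATETIME = re.compile(
    r"^\d{4}"                           # YYYY
    r"(-\d{2}"                          # -MM
    r"(-\d{2}"                          # -DD
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?"   # Thh:mm, optionally :ss and a fraction
    r"(Z|[+-]\d{2}:\d{2})"              # time zone, required once a time is given
    r")?)?)?$"
)


def is_w3c_datetime(value: str) -> bool:
    """Return True if the value has one of the W3C Datetime shapes."""
    return W3C_DATETIME.match(value) is not None


for value in ["2021-12-17", "2021-12-19T16:00:17+04:00",
              "2021-12-19T16:00:17", "17.12.2021"]:
    print(value, is_w3c_datetime(value))
```

The third value is rejected because it gives a time without a time zone, and the fourth because it does not use the year-month-day order.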

Invalid XML: too many tags 

The sitemap contains duplicate tags.

Error example:

<url>

<loc>http://www.example.com/</loc>

<lastmod>2021-12-17</lastmod>

<lastmod>2021-12-19T16:00:17+04:00</lastmod>

<priority>0.8</priority>

</url>

In the example, the <lastmod> tag is specified twice for one element - this is an error.

The line number will be specified in the error message. To resolve, simply delete the duplicate tag.

Documentation: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list
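
To locate duplicate tags yourself, you can count the child tags of every url element. The following sketch uses Python's standard ElementTree parser; the function name duplicate_tags is hypothetical, and the snippet reports the page URL rather than a line number because ElementTree does not expose line numbers.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def duplicate_tags(sitemap_xml: str):
    """Return (loc, tag) pairs for tags that occur more than once in a <url>."""
    root = ET.fromstring(sitemap_xml)
    problems = []
    for url in root.iter(f"{NS}url"):
        # Count how many times each child tag appears inside this <url>.
        counts = Counter(child.tag.replace(NS, "") for child in url)
        loc = url.findtext(f"{NS}loc", default="(no loc)")
        problems += [(loc, tag) for tag, n in counts.items() if n > 1]
    return problems


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2021-12-17</lastmod>
    <lastmod>2021-12-19T16:00:17+04:00</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>"""
print(duplicate_tags(sample))
```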

URL to another domain

The sitemap cannot contain URLs that belong to another domain or another subdomain.

For example, the sitemap https://example.com/sitemap.xml cannot refer to https://en.example.com/index.htm, because en.example.com is a different subdomain.

For more details visit https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

https://www.sitemaps.org/protocol.html#location

Invalid attribute value

The attribute contains an invalid value for the XML tag. Check your sitemaps and make sure that they contain only allowed attributes and that they are written according to sitemap specifications. Also, check the attributes and values for typos.

For more details visit: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

Unsupported format

Google supports the following sitemap file formats: XML, RSS, mRSS, Atom 1.0, and plain text.

This error also occurs if the sitemap has an incorrect format declaration, or if the declaration is missing for the format being used.

For example, if your sitemap is in XML format, then it should begin with the declaration:

<?xml version="1.0" encoding="UTF-8"?>

As mentioned previously, according to Google's rules, all XML attribute values should be enclosed in single (') or double (") quotes. The quotes should be straight, not curly.

Please note that word processing programs, such as Microsoft Word, can replace straight quotes with curly ones, which violates the standard requirements.

Source: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

Path mismatch - Missing www

The path to the sitemap does not contain the www prefix (for example, https://example.com/sitemap.xml), but the URLs listed in it contain www (for example, https://www.example.com/index.html).

All site pages must be redirected to the desired version - with www or without www. Make sure that the sitemap contains the same prefix variant.

Source: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

Path mismatch: Includes www

The path to the sitemap contains the www prefix (for example, https://www.example.com/sitemap.xml), but the URLs listed in it do not contain www (for example, https://example.com/index.html).

All site pages must be redirected to the desired version - with www or without www. Make sure that the sitemap contains the same prefix variant.

Source: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

Incorrect namespace

The namespace in the sitemap is missing, incorrect, or declared incorrectly. It may contain a typo or an incorrect URL.

Make sure you are using the correct namespace for your file type. For example:

A sitemap that lists HTML pages and images will contain

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"

xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">

The error may occur if the version is written incorrectly, for example: /.9 instead of /0.9.

Or if the path is specified incorrectly, for example:

<urlset xmlns="/schemas/sitemap/0.9">

instead of

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

or if the URL contains a typo, for example:

<urlset xmlns="http://www.sitemaps.org/schmas/sitemap/0.9">

Here the letter "e" is missing from "schemas".

Sources: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

https://www.w3.org/XML/Schema#dev

Invalid tag value

Your sitemap contains a tag with an invalid value. Check the specifications for your type of sitemap.

For more details visit: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

https://www.sitemaps.org/protocol.html#xmlTagDefinitions

https://developers.google.com/search/docs/advanced/sitemaps/video-sitemaps

https://developers.google.com/search/docs/advanced/sitemaps/image-sitemaps

https://developers.google.com/search/docs/advanced/sitemaps/news-sitemap

Invalid URL in sitemap index file: incomplete URL

Google describes this error as follows: "The sitemap index file contains an incomplete URL".

When search engines see the sitemap index, they search in the same directory for the files it links to.

The location of the Sitemap file defines a set of URLs that can be included in that particular Sitemap file.

For example, a Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/

But cannot include URLs starting with http://example.com/images/.

If our tool can't find the URL there, you'll see this error. Update the sitemap index file to include the full path to each sitemap file in the list, and then resubmit.

There have been many court cases against search engines for erroneously indexing sections of sites that are meant to be closed off.

Although there is little information about this error, Google's sitemap documentation devotes a whole section to it, while many other errors are not covered at all. In our opinion, this rule therefore exists to avoid erroneous indexing of data that is meant to be blocked from indexing.

Please note that not all developers follow these sitemap standards when building their CMS plugins, which is why this error can arise.

Sources: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

https://www.sitemaps.org/protocol.html#location

Missing XML required attribute

A required attribute is missing from a tag in the sitemap.

Error example:

<?xml version="1.0" encoding="UTF-8"?>

<urlset>

<url>

<loc>http://www.example.com/</loc>

<lastmod>2023-11-09</lastmod>

</url>

</urlset>

Here in the tag

<urlset>

This attribute is omitted:

xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"

The tag must include:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

Source: https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap
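
A quick local check for the missing or mistyped xmlns attribute is to parse the file and look at the namespace of the root element. The sketch below uses Python's standard ElementTree module; the helper names root_namespace and has_sitemap_namespace are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def root_namespace(sitemap_xml: str):
    """Return the namespace URI of the root element, or None if it has none."""
    root = ET.fromstring(sitemap_xml)
    # ElementTree renders a namespaced tag as "{namespace}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None


def has_sitemap_namespace(sitemap_xml: str) -> bool:
    """Return True if the root element uses the standard sitemap namespace."""
    return root_namespace(sitemap_xml) == SITEMAP_NS


missing = "<urlset><url><loc>http://www.example.com/</loc></url></urlset>"
fixed = missing.replace("<urlset>", f'<urlset xmlns="{SITEMAP_NS}">')
print(has_sitemap_namespace(missing), has_sitemap_namespace(fixed))
```

The same comparison also catches a namespace with a typo, because the URI has to match character for character.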

Missing required XML tag

The required tag is missing. The line number will be specified in the error message.

Error example:

<url>

<lastmod>2021-12-19T16:00:17+04:00</lastmod>

<priority>0.8</priority>

</url>

The <loc> tag is missing in the example, so it is unclear which URL the element belongs to.

For more details visit: https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap

https://www.sitemaps.org/protocol.html
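
As an illustration, entries without a <loc> tag can be found by walking over the url elements with Python's standard ElementTree module. The function name urls_missing_loc is hypothetical, and the sketch reports the position of the entry in the file rather than a line number, which ElementTree does not provide.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_missing_loc(sitemap_xml: str):
    """Return 1-based positions of <url> entries without a non-empty <loc>."""
    root = ET.fromstring(sitemap_xml)
    missing = []
    for position, url in enumerate(root.findall("sm:url", NS), start=1):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        if not loc.strip():
            missing.append(position)
    return missing


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2021-12-17</lastmod>
  </url>
  <url>
    <lastmod>2021-12-19T16:00:17+04:00</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>"""
print(urls_missing_loc(sample))
```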

Missing thumbnail URL

The thumbnail image URL is missing.

Make sure that the URL of every thumbnail is specified using the <video:thumbnail_loc> tag.

Source: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

https://developers.google.com/search/docs/advanced/sitemaps/video-sitemaps

Missing video title

The title for the video is missing.

Make sure that each video has a title specified in the <video:title> tag in your sitemap.

Source: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

https://developers.google.com/search/docs/advanced/sitemaps/video-sitemaps

Incorrect sitemap index format: Nested sitemap indexes

One or more entries in your sitemap index file point to the index file's own URL or to the URL of another sitemap index file.

A sitemap index file can list only sitemap files, not other sitemap index files.

Delete all entries pointing to sitemap index files.

Source: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

Parsing error

We could not parse the XML of the sitemap. Most likely, the file violates the XML format. Download the file and check it with an XML validator.

Often this problem is caused by an unescaped character in a URL or by tag nesting violations. As with all XML files, any data values (including URLs) must use entity escape codes for certain characters: & ' " < >.

Make sure your URLs are properly escaped. For example, the "&" character in the URL should be written as "&amp;", and all spaces should be replaced with "%20".

The URLs in the sitemap must comply with the RFC-3986 standard (https://www.ietf.org/rfc/rfc3986.txt), the XML standard (https://www.w3.org/TR/REC-xml/), and the RFC-3987 standard (https://www.ietf.org/rfc/rfc3987.txt).

Source: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

Thumbnail too large

The thumbnail image of the video specified in the sitemap is too large. Reduce the size of the video thumbnail to 160 x 120 pixels.

Source: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

For more details visit: https://developers.google.com/search/docs/advanced/sitemaps/video-sitemaps

Thumbnail too small

The thumbnail image of the video specified in the sitemap is too small. Increase the size of the video thumbnail to 160 x 120 pixels.

Source: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

For more details visit: https://developers.google.com/search/docs/advanced/sitemaps/video-sitemaps

Video location and play page location are the same

In the video sitemap, the URL of the video content and the URL of the player cannot match. If you specify both <video:player_loc> and <video:content_loc>, the URLs must be different.

For more details visit: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

https://developers.google.com/search/docs/advanced/sitemaps/video-sitemaps

Video location URL appears to be a play page URL

The video content URL <video:content_loc> points to the same page where the player is located.

For more details visit: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

https://developers.google.com/search/docs/advanced/sitemaps/video-sitemaps

Tag <changefreq> = never

The <changefreq> tag in the sitemap gives the search engine a hint about how often to rescan the page. Crawlers may still periodically visit pages marked "never" to track unexpected changes. However, the value "never" is not recommended: if you later change the page's content and enter a fresh date in the <lastmod> tag, it is unclear how the search engine will behave, whether the changes will be indexed, and how long that could take.

Sitemap is blocked by robots.txt

Search engines will not be able to access your sitemap because the robots.txt file blocks it.

Change the robots.txt file to allow robots to scan the sitemap.
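
You can check robots.txt rules against a sitemap URL with Python's standard urllib.robotparser module. This is a sketch under stated assumptions: the helper name sitemap_allowed and the sample rules are hypothetical, and the standard-library parser may interpret some rules differently from a particular search engine.

```python
from urllib.robotparser import RobotFileParser


def sitemap_allowed(robots_txt: str, sitemap_url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch the sitemap."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, sitemap_url)


blocking = "User-agent: *\nDisallow: /sitemap.xml\n"
open_rules = "User-agent: *\nDisallow: /admin/\n"
url = "https://www.example.com/sitemap.xml"
print(sitemap_allowed(blocking, url), sitemap_allowed(open_rules, url))
```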

Relative link

The sitemap contains a relative URL. The sitemaps.org standard records the following requirement for the <loc> tag:

URL of the page. This URL must begin with the protocol (such as HTTP) and end with a trailing slash if your web server requires it. This value must be less than 2,048 characters.

Source: https://www.sitemaps.org/protocol.html#xmlTagDefinitions

URL too long

The URL must be shorter than 2,048 characters.

Source: https://www.sitemaps.org/protocol.html#xmlTagDefinitions
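
The two sitemaps.org requirements for the <loc> value (an absolute URL that begins with a protocol, and a length under 2,048 characters) can be checked with a short Python sketch. The function name loc_problems and the message texts are hypothetical.

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 2048  # the value must be less than 2,048 characters


def loc_problems(url: str):
    """Return a list of problems found in a single <loc> value."""
    problems = []
    parts = urlsplit(url)
    # A relative link has no protocol (scheme) or no host part.
    if not parts.scheme or not parts.netloc:
        problems.append("relative link")
    if len(url) >= MAX_URL_LENGTH:
        problems.append("URL too long")
    return problems


print(loc_problems("http://www.example.com/"))
print(loc_problems("/catalog/item.html"))
print(loc_problems("http://www.example.com/" + "a" * 2048))
```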

Types of warnings:

All pages inside the same sitemap have the same document modification time

For search engines to work effectively, the sitemap must contain the actual date of the page change. Without this, it dramatically loses its effectiveness.

To understand this problem, you need to know how search engines work. The search engine spends resources reading and analyzing each page, increasing the load on the server where the site is hosted. Therefore, the search engine calculates a so-called crawling budget for each site. In its simplified form, this means how many pages should be indexed on a particular site per day.

Example: A site has 10,000 pages and a crawl budget of 300 pages per day. But for a short time, the search engine can increase this budget, for example, up to 900 pages per day, if it notices that all the site pages have changed, or you have sent the sitemap for reindexing.

If there is no site map, then in normal mode, the budget will be allocated for indexing three categories of pages:

So, for a site with 10,000 pages (without a sitemap) and a rate of 300 pages indexed per day, it could take more than 34 days for the site to be indexed in its entirety, because part of the crawl budget is spent on reindexing pages that have already been indexed. This wastes both crawl budget and time.
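
The arithmetic behind this estimate can be sketched in a couple of lines of Python. The function name days_to_crawl is hypothetical, and the result is a lower bound: it assumes every crawled page is a new one, whereas in practice part of the budget goes to pages that are already indexed.

```python
import math


def days_to_crawl(pages: int, pages_per_day: int) -> int:
    """Minimum number of whole days needed to crawl every page once."""
    return math.ceil(pages / pages_per_day)


print(days_to_crawl(10_000, 300))  # 34
print(days_to_crawl(10_000, 900))  # 12
```

With the temporarily raised budget of 900 pages per day from the example, the same site would need at least 12 days.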

The search engine has only two ways to find out whether a page has changed: reading the information about the page in the sitemap (without spending crawl budget) or indexing the page (and spending crawl budget on it). Therefore, if the sitemap has no tag with the correct page update date, the search engine will simply reindex all the pages.

When all pages in a sitemap have the same document modification time, search engines do not know which pages to prioritize. They may fall back on other elements, such as the priority tag, but this is a much less accurate tool for speeding up indexing. For example, you can have 1,000 pages with a priority of 0.8 and 9,000 pages with a priority of 0.3. In this case, changes on pages from the first group may take more than five days to be indexed, and changes on pages from the second group more than a month. If you use the priority tag correctly alongside the document modification date, changes can be indexed within a day.

The page change time must be specified in the W3C Datetime format (http://www.w3.org/TR/NOTE-datetime).

The standard allows several formats, for example:

<lastmod>2021-12-23T18:00:15+00:00</lastmod>

or

<lastmod>2021-11-23</lastmod>.
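
To see whether a sitemap triggers this warning, you can compare the <lastmod> values of all entries. The sketch below uses Python's standard ElementTree module; the function name all_lastmod_identical and the sample data are hypothetical, and comparing the raw text of the values is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def all_lastmod_identical(sitemap_xml: str) -> bool:
    """Return True if there are several <lastmod> values and all are the same."""
    root = ET.fromstring(sitemap_xml)
    values = [el.text.strip()
              for el in root.findall("sm:url/sm:lastmod", NS) if el.text]
    return len(values) > 1 and len(set(values)) == 1


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><lastmod>2021-12-23</lastmod></url>
  <url><loc>http://www.example.com/about</loc><lastmod>2021-12-23</lastmod></url>
</urlset>"""
# The same sitemap with a different date on the second page.
mixed = sample.replace("about</loc><lastmod>2021-12-23", "about</loc><lastmod>2021-11-23")
print(all_lastmod_identical(sample), all_lastmod_identical(mixed))
```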

The site map does not contain a tag with the last document modification date

For search engines to work effectively, the site map must contain the actual date of the page change. Without this, it dramatically loses its effectiveness and becomes useless.

To understand this problem, you need to know how search engines work. The search engine spends resources reading and analyzing each page, increasing the load on the server where the site is hosted. Therefore, the search engine calculates a so-called crawling budget for each site. In its simplified form, this means how many pages should be indexed on a particular site per day.

Example: A site has 10,000 pages and a crawl budget of 300 pages per day. But for a short time, the search engine can increase this budget, for example, up to 900 pages per day, if it notices that all the site pages have changed, or you have sent the sitemap for reindexing.

If there is no site map, then in normal mode, the budget will be allocated for indexing three categories of pages:

So, for a site with 10,000 pages (without a sitemap) and a rate of 300 pages indexed per day, it could take more than 34 days for the site to be indexed in its entirety, because part of the crawl budget is spent on reindexing pages that have already been indexed. This wastes both crawl budget and time.

The search engine has only two ways to find out whether a page has changed: reading the information about the page in the sitemap (without spending crawl budget) or indexing the page (and spending crawl budget on it). Therefore, if the sitemap has no tag with the correct page update date, the search engine will simply reindex all the pages.

The page change time must be specified in the W3C Datetime format (http://www.w3.org/TR/NOTE-datetime).

The standard allows several formats, for example:

<lastmod>2021-12-23T18:00:15+00:00</lastmod>

or

<lastmod>2021-11-23</lastmod>.

Note that not all sitemap file formats support the last modification date or priority tags. If your platform can only produce such a format, this may harm the indexing speed of your site, and it may be worth considering a change of platform.

There is no page refresh date or page refresh priority in the entire sitemap

The page refresh date and page indexing priority are the two most important attributes of the sitemap. The tag containing the last modification time allows search engines to determine which pages have been changed today and should be indexed as quickly as possible if there is enough crawling budget. This is the best attribute to speed up indexing.

The priority attribute allows the more important pages to be indexed first, even among these updated pages. However, if the sitemap specifies neither the priority nor the last document modification time, the site will run into big problems when it comes to indexing.

Search engines spend resources reading and analyzing each page, which increases the load on the server where the site is hosted. For this reason, the search engine calculates a so-called crawling budget for each site. This is essentially how many pages should be indexed on a particular site per day.

Example: A site has 10,000 pages and a crawl budget of 300 pages per day. But for a short time, the search engine can increase this budget, for example, up to 900 pages per day, if it notices that all the site pages have changed, or you have sent the sitemap for reindexing.

If there is no site map, then in normal mode, the budget will be allocated for indexing three categories of pages:

So, for a site with 10,000 pages (without a sitemap) and a rate of 300 pages indexed per day, it could take more than 34 days for the site to be indexed in its entirety, because part of the crawl budget is spent on reindexing pages that have already been indexed. This wastes both crawl budget and time. However, if you use the <lastmod> and <priority> tags correctly, pages can be indexed within a day.

The page change time must be specified in the W3C Datetime format (http://www.w3.org/TR/NOTE-datetime). The standard allows several formats, for example:

<lastmod>2021-12-23T18:00:15+00:00</lastmod>

or

<lastmod>2021-11-23</lastmod>.

Note that not all sitemap file formats support the last modification date or priority tags. If your platform can only produce such a format, this may harm the indexing speed of your site, and it may be worth considering a change of platform.

There is no refresh rate or indexing priority specified for the page

To be indexed as quickly as possible, it's best to:

Without this data, a search engine may not index changes on a page of a large site for over a month. Pages that are updated frequently but have no priority information will be indexed according to the basic indexing rules, which greatly increases the time it takes to index potentially important pages.

Leading whitespace

Your sitemap starts with whitespace instead of the XML declaration. XML files must begin with an XML declaration that specifies the version of the format being used.

This will not prevent search engines from processing your sitemap, but Google recommends removing spaces so that the file conforms to the XML standard.

Source: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list

Sitemap URL redirects to another URL

The sitemap URL redirects to a different address. For example, the specified URL is http://example.com/sitemap.xml, but it forwards to https://example.com/sitemap1.xml.

This will not prevent search engines from processing your sitemap but can lead to issues if the redirection doesn't work correctly.

Replace the redirection URLs in your sitemap files with those that need to be scanned.

Invalid URL priority format

The indexing priority can have a value from 0.0 to 1.0.

A dot separates the digits in the value, and values less than one should start with the character "0". E.g., you should write: "0.3", not ".3".
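
These formatting rules can be expressed as a small Python check. It is a deliberately strict sketch: the name is_valid_priority is hypothetical, and requiring a leading digit followed by a dot and at least one more digit is an assumption based on the rule above, so a bare "1" is rejected here.

```python
import re


def is_valid_priority(value: str) -> bool:
    """Return True for values such as "0.3" or "1.0" in the range 0.0 to 1.0."""
    # One leading digit, a dot, then at least one digit (assumption, see above).
    if not re.fullmatch(r"\d\.\d+", value):
        return False
    return 0.0 <= float(value) <= 1.0


for value in ["0.3", ".3", "0,3", "1.0", "1.5"]:
    print(value, is_valid_priority(value))
```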

URL not allowed

Your sitemap includes URLs on a different subdomain or a domain other than the sitemap domain.

For example, if your site map is located at http://www.example.com/sitemap.xml, then the following URLs will be invalid for this sitemap:

http://example.com/ - "www" is missing.

www.example.com/ - "http" is missing.

https://www.example.com/ - uses "https", not "http".

There is a special section dedicated to this error in the sitemap standard:

https://www.sitemaps.org/protocol.html#location

"URLs that are not considered valid are excluded from further consideration."

Therefore, even in the description of the standard, it is strongly recommended to place the sitemap file in the site's root directory.

For more details visit: https://support.google.com/webmasters/answer/7451001?hl=en#zippy=%2Ccomplete-error-list
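
The location rule from the sitemaps.org standard can be approximated in Python by comparing the protocol, the host, and the directory of the sitemap with each listed URL. The function name allowed_in_sitemap is hypothetical, and the sketch ignores special cases such as cross-site submission through robots.txt.

```python
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url: str, page_url: str) -> bool:
    """Return True if page_url may be listed in the sitemap at sitemap_url."""
    sitemap, page = urlsplit(sitemap_url), urlsplit(page_url)
    # Directory that contains the sitemap, for example "/catalog/" or "/".
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    return (
        page.scheme == sitemap.scheme          # same protocol
        and page.netloc == sitemap.netloc      # same host, including "www"
        and page.path.startswith(base_dir)     # same directory or below
    )


root_sitemap = "http://www.example.com/sitemap.xml"
for url in ["http://www.example.com/index.html", "http://example.com/",
            "www.example.com/", "https://www.example.com/"]:
    print(url, allowed_in_sitemap(root_sitemap, url))
```

Only the first URL passes. The others fail for the reasons listed above: a missing "www", a missing protocol, and a different protocol.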

All pages inside the same sitemap have the same priority

Pages that change more often and are of more interest to users should be prioritized.

To indicate page priority, you can use the following tag (ranging from 0 to 1):

<priority>0.8</priority>.

Please note that not all sitemap file formats support this tag.

First and foremost, the search engine will focus on the document update time tag since this is a more accurate value. When you specify the exact date of page changes in your sitemap, the priority attribute holds less weight. Therefore, if the tag contains up-to-date information, having the same page priority is not a big problem.

However, assigning a high priority to all URLs on the site does not make sense either, because this simply gives all pages equal priority again.

For example, if a search engine decides to allocate a crawling budget of 30 pages per day to a site with 1000 pages, then increasing the priority for all the pages will not increase the speed at which they are indexed. 30 pages would continue to be indexed per day.

The best practice is to prioritize those pages that change most often and are more important for gaining traffic from the search engine. The rest of the pages should then be given a lower priority. This will then balance out the indexing, ensuring higher priority pages are indexed faster.

The tag is particularly significant when a whole site has been updated, because the last modification date is then the same across all pages and the priority tag becomes the fallback. A good priority setup ensures that the most important pages are reindexed first.