August 5, 2021

What are orphan pages and how to fix them

Orphan pages are bad for SEO and for your website's rankings. Unfortunately, they are easy to create unintentionally and can be difficult to identify. Fixing orphan pages can improve your site's SEO and avoid Google penalties.

What are orphan pages?

Orphan pages are web pages that exist on your website but have no internal links to them from any other pages on your site. However, there may be links to them from external sources.

Occasionally, orphan pages are created intentionally. In the vast majority of cases, however, they are unintended mistakes that webmasters may be unaware of. They are bad for SEO, and too many may cause Google to lower your site's ranking.
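To make the definition concrete, here is a minimal Python sketch. It assumes you already have a map of each page to the internal links it contains; the `site` dictionary and the `find_orphans` name are made up for illustration and do not come from any particular tool.

```python
def find_orphans(links: dict[str, list[str]]) -> set[str]:
    """Return the pages that no other page on the site links to."""
    linked_to = set()
    for source, targets in links.items():
        for target in targets:
            if target != source:  # a page linking to itself does not count
                linked_to.add(target)
    return set(links) - linked_to


# Hypothetical site: each key is a page, each value lists its internal links.
site = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/", "/blog/post-1"],
    "/blog/post-1": ["/blog"],
    "/old-landing-page": ["/"],  # links out, but nothing links in
}

print(find_orphans(site))  # {'/old-landing-page'}
```

Note that this only works when every existing page is a key in the map. A normal crawl never reaches orphan pages, which is why finding them needs a second source of URLs.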

Why are orphan pages bad for SEO?

Here are some of the more important negative impacts of orphan pages:

  • Orphan pages were previously used as a black-hat SEO technique: some pages were hidden from users while search engines could still find them. As a result, search engines may presume the webmaster is attempting to trick them.
  • Search engines view pages with no internal links as unimportant.
  • Google may lower the ranking of the entire site when it finds too many orphan pages.
  • Orphan pages waste crawl budget, and the crawl rate slows when pages have few internal links.
  • Search engines cannot understand how orphan pages fit within the overall site structure. This means they will struggle to calculate their relevance and pass no authority to them.
  • Orphan page content can disrupt contextual keyword targeting and impact SERP rankings.

10 common causes of orphan pages

  1. A website migration that wasn't successfully managed.
  2. Pages that were created early in the site build process and submitted to Google, but then abandoned, meaning they were excluded from the site's navigation architecture.
  3. Pages that were added to the XML sitemap when they were created, but no longer form part of the site's flow.
  4. Pages used in A/B testing that were never deleted.
  5. Landing pages that are no longer used. Typically landing pages have no internal links leading to them.
  6. Pages that a blogger wanted to remove from public view without deleting them, such as those in an old blog category. The category is deleted, but all of its pages are now orphaned.
  7. Pages that have been forgotten over time. The site was restructured, or its navigation changed, leaving those pages behind.
  8. Product pages that still exist for items that are out of stock or discontinued, or expired classified ads.
  9. Old videos, articles, or other content that is no longer relevant and has had its internal links removed.
  10. Poor use of a CMS when creating pages, so that orphan pages go undetected.

How to fix orphan pages

  • Step 1: Identify orphan pages through URL mapping

    Obviously, you won't find orphan pages by crawling your website alone, because a crawler only discovers pages by following links. You must look at search engines, especially Google and Bing, to extract all known URLs for the website.

    In Google Analytics, you can extract a list of all URLs that have received visits and sort them by “least visited”. Do this by navigating to Behavior > Site Content > All Pages. In Bing, the corresponding tool is Indexed Pages Checker. Then export the URLs into a spreadsheet.

    Then you need to crawl your website to build a corresponding list of “official” valid URLs. You can easily find suitable tools by searching for “website crawler tool”.

    By comparing both lists, you highlight orphan pages.

    Note: the process may be a little more detailed than this summary describes. But this is the basic essence of how to find a list of orphan pages.

    You can also use Labrika’s own sitemap validator tool. It gives you access to any pages that may be on your site but aren’t indexable, making it a quick and easy way to get a list of orphan pages.
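The list comparison in Step 1 can be sketched in a few lines of Python. This is only an illustration, under the assumption that both lists have already been exported as plain URLs. The function names and the `example.com` URLs are hypothetical, and the normalization rules (ignore the scheme, a leading `www.`, a trailing slash, and the query string) are assumptions you may need to adjust for your own site.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    return host + path


def orphan_candidates(known_urls, crawled_urls):
    """URLs known to search engines or analytics that the crawl never reached."""
    crawled = {normalize(u) for u in crawled_urls}
    known = {normalize(u) for u in known_urls}
    return sorted(known - crawled)


# Hypothetical exports: one from analytics or a search engine, one from a crawler.
known = [
    "https://www.example.com/",
    "https://example.com/pricing/",
    "https://example.com/spring-sale-2019",
]
crawled = [
    "https://example.com/",
    "https://example.com/pricing",
]

print(orphan_candidates(known, crawled))  # ['example.com/spring-sale-2019']
```

Treat the result as a list of candidates rather than confirmed orphan pages. A URL can also be missing from the crawl because it redirects, is blocked, or no longer exists.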

  • Step 2: Assess the orphan pages and decide on an action for each one

    Start by asking yourself the following questions. This will then affect the action you take.

    • Q1. How important is the page? If it is important, integrate it back into the site; otherwise, delete it.
    • Q2. Does the page rank for your keywords? If so, integrate it back into the site; otherwise, delete it.
    • Q3. Is the page a duplicate or almost a duplicate? Perhaps it can be merged with another non-orphan page.
    • Q4. Are there backlinks to the page from other websites?

    For pages that you reintegrate into the website, take the opportunity to assess the page’s quality:

    • Does it need to be optimized?
    • Where should it be linked from internally?
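If you have many orphan pages to review, the questions in Step 2 can be written down as a simple triage function. The Python sketch below is one possible encoding, not a fixed rule set. The `OrphanPage` fields and the order of the checks are assumptions, and because the article gives no action for Q4, pages with backlinks are only flagged for manual review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrphanPage:
    url: str
    is_important: bool                       # Q1
    ranks_for_keywords: bool                 # Q2
    near_duplicate_of: Optional[str] = None  # Q3: URL of a similar non-orphan page
    has_backlinks: bool = False              # Q4


def decide(page: OrphanPage) -> str:
    """Suggest an action for one orphan page (assumed ordering of Q1 to Q4)."""
    if page.near_duplicate_of:
        return f"merge into {page.near_duplicate_of}"
    if page.is_important or page.ranks_for_keywords:
        return "reintegrate"
    if page.has_backlinks:
        # Assumption: external links are a reason to look again before deleting.
        return "review manually"
    return "delete"


pages = [
    OrphanPage("/guide-to-widgets", is_important=True, ranks_for_keywords=True),
    OrphanPage("/widgets-guide-copy", False, False, near_duplicate_of="/guide-to-widgets"),
    OrphanPage("/press-2015", False, False, has_backlinks=True),
    OrphanPage("/test-page", False, False),
]

for page in pages:
    print(page.url, "->", decide(page))
```

The output is only a suggestion for each URL. The quality checks for reintegrated pages still need a human decision.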

How to manage expired pages and old listings to avoid creating orphan pages

Think of eBay for a moment. Every day, millions of auctions end, and their listings expire. eBay does not delete those expired listings. Many will have been picked up by search engines and, in some cases, will appear on SERPs for years to come. The last thing eBay wants is for a prospective customer to be directed to a “404 Page not found” error on the eBay site.

Instead, eBay treats expired listings as valuable lead generators. Visitors who click on an expired listing in the SERP will be shown alternative product suggestions as well as the original expired listing.

This strategy applies just as well to e-commerce sites where products are permanently out of stock or discontinued. Those product pages are still indexed in search engines and can be treated as potential landing pages.

However, you may have valid reasons for not retaining expired pages on your website. In that case, it is best to ensure they return a 404 (Not Found) or 410 (Gone, for permanently removed content) status code that you control. To do this, you can use a custom 404 page.
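As a rough illustration of controlling these status codes, the Python sketch below uses only the standard library's `http.server` module. It is an assumption-laden example rather than a recommended production setup: the `LIVE_PAGES` and `RETIRED_PAGES` sets, the paths in them, and the `status_for` and `serve` names are all hypothetical, and on a real site this logic would normally live in your web server or CMS configuration.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page inventory for illustration only.
LIVE_PAGES = {"/", "/products/blue-widget"}
RETIRED_PAGES = {"/products/red-widget"}  # deliberately removed for good


def status_for(path: str) -> HTTPStatus:
    """Pick the status code for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in RETIRED_PAGES:
        return HTTPStatus.GONE  # 410: the page existed and was removed on purpose
    return HTTPStatus.NOT_FOUND  # 404: the page is unknown


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = status_for(self.path)
        # A custom error page would replace this placeholder body.
        body = f"<h1>{status.value} {status.phrase}</h1>".encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start a local test server (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling `serve()` starts a local server on port 8000. Requesting `/products/red-widget` should then return 410, and any unknown path should return 404. In this sketch, `self.path` includes any query string, so a real handler would strip it before the lookup.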

In summary - website management best practices prevent orphan pages

Any SEO professional or website builder is well aware of the dangers to SEO if orphan pages are found. Normally, they build checks and detection mechanisms into their processes to stop this.

A thorough site audit using the above steps should uncover any orphan pages. If you have a larger site, you may want to bring in professional SEO services to avoid wasting time and money.

Don’t forget that Labrika offers a sitemap validator tool, which can give you a list of orphan pages quickly and easily.
