When we talk about duplicate content, we are referring to situations where a single, unique piece of content on your site is accessible at multiple URLs, all of which lead to that same content.
This can occur for a multitude of reasons which we will be going through. We will also discuss the best ways to fix this issue.
To note: this is different from the issue of other sites duplicating your content on their own sites, which we would call external duplicate content. External duplicate content is harder to control; internal duplicate content, however, is something we can help with.
Google prioritizes providing a great user experience for its users. When it encounters content that is significantly similar, it must decide which source or URL should rank highest.
If it thinks a website is attempting to manipulate rankings to get more traffic, the site or URL may be downgraded. In extreme circumstances it may be removed from Google's index altogether. For this reason, it is an important issue to take care of.
There are a variety of online tools that can check for duplicate content.
Here at Labrika, we offer a non-original content checker. It finds and shows any URLs on the internet with similar (or identical) content, even within your own site, making it a quick and easy way to find duplicate content on your own website.
For external duplicate content, a site such as Copyscape is excellent. Alternatively, Siteliner (another tool from the makers of Copyscape) is useful for finding internal duplicate content. Both offer a limited free service or a paid premium service.
Note: services like these may report more duplicate content than Google does, as they tend to include every element on the page, such as the sidebars. Since Google does not include these in its analysis, such tools may give an inflated duplicate content count.
If you already have a Labrika account you can use our non-original content checker, or if not you can sign up here.
Another method, if you have more time, is to use Google itself. There are many Google search operators, but you should start with the site: and intitle: operators.
For example, say you have an article or page called: "How to fly a kite really high".
To find all the URLs that point to this, enter into Google search:
site:mysite.com intitle:"How to fly a kite really high"
Google will then return all indexed instances of this page title within your site. Ideally it should return only one result; if it returns more, you know you have duplicate content.
Of course, this is a more long-winded process, but can be useful if you only have a very small site.
Does the content have links that use more than one form of the same address, for example:
http://mysite.com/article1 and also a variant of the same URL (such as one with a trailing slash)?
Does your system refer to your site in more than one form, for example both http://mysite.com and http://www.mysite.com?
And are there links to the same content using both versions? If so, you are creating duplicate content.
Systems like WordPress offer the option to paginate comments. This avoids displaying very large pages with possibly hundreds of comments at the bottom of each article. Each comment page gets its own URL, such as:
/article1/comment-page-1/
/article1/comment-page-2/
Each of these is a different URL for the same piece of content, creating a duplicate content scenario.
Session IDs are very useful for allowing a website to remember a visitor and the dynamic actions they take on your site. For example, a session ID can refer to a shopping cart containing all the products the user wants to purchase. As the user navigates around the site, that unique session ID is appended to the URL of each page visited, so a brand-new URL is created for every page, once again creating duplicate content.
In this case, cookies provide a better approach, as search engines never see them. But we will get into the fixes later.
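As a rough illustration (a sketch using Python's standard library, not code from any particular platform), storing the session ID in a cookie keeps it in an HTTP header, so the page URL stays identical for every visitor:

```python
from http.cookies import SimpleCookie

def session_cookie_header(session_id: str) -> str:
    """Build the value of a Set-Cookie header carrying the session ID.

    With the ID in a cookie, the page URL (e.g. /cart) stays the same
    for every visitor, so no duplicate URLs are created.
    """
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["path"] = "/"       # valid for the whole site
    cookie["sessionid"]["httponly"] = True  # hidden from page scripts
    return cookie["sessionid"].OutputString()

print(session_cookie_header("abc123"))
```

The names `sessionid` and `abc123` are placeholders; the point is simply that the identifier travels in the header, not in the URL.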
Some systems offer printer-friendly pages as an option. Any link on the website to a printer-friendly version is picked up by the search engines, which causes them to detect duplicate content.
A developer views a piece of content as a record in a database with a unique reference number, but this is not how a search engine views it. The website software may spawn multiple URLs that link to the same piece of content in different ways. Search engines detect that multiple unique URLs retrieve the same content, indicating possible duplicate content.
In this case, you would need to ask your developers to ensure that each piece of content is served at exactly one unique URL, with no exceptions.
When a system uses parameters in the URL to identify a piece of content in the database, those parameters can often be constructed in different ways, for the same content.
/?id=1&cat=2 might refer to a unique article, but so does:
/?cat=2&id=1
(where cat is the category and id is the unique database reference).
A search engine sees two different links to the same piece of content. Google used to offer a URL Parameters tool in Search Console for indicating how parameters like these should be handled, but it has since been retired; today the recommended approach is consistent internal linking and a canonical tag (covered below).
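If you control the code that generates your internal links, one defensive measure (a generic sketch, not a feature of any particular tool) is to normalize query-string parameter order so only one URL form is ever produced:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_url(url: str) -> str:
    """Return the URL with its query parameters sorted alphabetically,
    so /?id=1&cat=2 and /?cat=2&id=1 map to the same canonical form."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query))
    return urlunsplit(parts._replace(query=urlencode(params)))

print(normalize_url("https://mysite.com/?id=1&cat=2"))  # https://mysite.com/?cat=2&id=1
print(normalize_url("https://mysite.com/?cat=2&id=1"))  # https://mysite.com/?cat=2&id=1
```

Applied consistently when generating links, this ensures the two parameter orderings never both appear on the site.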
A 301 redirect can be served up, by your webserver, to a user's browser, or a search engine crawler, when a specific URL is sought. It tells the user or search engine that the link address is outdated and indicates the new address. It’s the coding equivalent of redirecting mail when we move house!
A 301 redirect is most commonly used when you move from one domain to another (e.g. after a name change). But it can also be used to redirect multiple URLs to one 'master' URL. This helps search engines keep their indexes up to date and helps you avoid duplicate content issues.
Some web systems let you set up redirects in the admin settings. On Apache web servers you can also insert them manually in the .htaccess file. This is a more hands-on, technical approach, but it's not too difficult to do.
A typical redirect entry might look something like this:
Redirect 301 /old-page.html /new-page.html
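The same mechanism scales to whole families of URLs. For instance, if your site answers on both the www and non-www hostnames, a sketch of an Apache mod_rewrite rule in .htaccess (with mysite.com as a placeholder for your own domain) might look like:

```apache
RewriteEngine On
# Permanently redirect www.mysite.com/... to mysite.com/...
RewriteCond %{HTTP_HOST} ^www\.mysite\.com$ [NC]
RewriteRule ^(.*)$ https://mysite.com/$1 [R=301,L]
```

This collapses every www URL onto its non-www equivalent in one rule, rather than one Redirect line per page.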
The word canonical means 'the authoritative URL' in this context. You nominate one URL as being the 'canonical' version for the search engines.
It’s a simple technical solution in theory, though implementing it can be a little complex. It solves the problem of multiple URLs pointing to the same content and improves your site’s SEO, with the same effect as a 301 redirect but without actually redirecting anything. Think of it as a ‘soft 301 redirect’.
Example of a canonical tag:
<link rel="canonical" href="https://mysite.com/my-article/" />
The rel attribute in HTML specifies the relationship to the linked document and must be accompanied by the href attribute.
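For non-HTML resources such as PDFs, which cannot carry a link tag, the same canonical hint can be sent as an HTTP response header instead; the header form looks like this:

```http
Link: <https://mysite.com/my-article/>; rel="canonical"
```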
Most sites have a footer that is repeated at the bottom of each page. It’s not a good idea to place a lot of content here. Instead, link to a page that summarizes all the things you want users to know. This avoids text being repeated across multiple pages, needlessly.
Sometimes you may have very similar content across several pages, for example several similar products in a range. Where possible, it is best to consolidate as much as you can into a single page. Alternatively, rewrite the copy for each product so that it is sufficiently different from the rest while still conveying the same meaning.
This may be a lot of effort, but is worth it in the end to avoid duplicate content issues.