
How much value does your website offer users? Do you deliver information or an experience that can’t be found anywhere else?

In order to attract users themselves, Google and other search engines know that they must serve up the highest quality and most relevant results to any given query. Their bots scour the web, creating a complete index of all websites and webpages. Then they analyse all this information in a way that allows them to present the best results as soon as a search is conducted.

But what if two sites or pages have the same or very similar content on them? How does a search engine know which to rank above the other?

It’s time to talk about the problem of Google and duplicate content: what it is, the issues it causes, how it happens and how to fix it.

What is duplicate content?

Duplicate website content is exactly what it says on the packet. It is content that is not unique, but is instead copied from (or overtly inspired by) another website or webpage.

You might wonder to yourself how hard it is to write an original sentence. The answer: not very. At the time of writing, “this exact sentence has never been written on the internet” (until now.)

Duplicate content can be found on different pages within the same website, or across different websites. Sometimes it is a result of laziness – a website owner copying and pasting the same text across multiple pages. Sometimes it’s due to a technical glitch, like a website creating unnecessary copies of particular pages. It can also have more nefarious origins – someone scraping content from your website and presenting it as their own.

Duplicate content on a website doesn’t have to be an exact match either. Google is smart enough to realise that “A fast brown fox jumps over a sleepy dog” looks suspiciously like a rewrite of “A quick brown fox jumps over a lazy dog.” In Google’s words, these two sentences are ‘appreciably similar’.
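
To make 'appreciably similar' a little more concrete, here is a minimal Python sketch that scores the two fox sentences by word overlap (Jaccard similarity). It is only an illustration: Google does not publish how it measures similarity, and the function name `word_overlap` is an assumption made for this example.

```python
import re


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' lowercase word sets (0.0 to 1.0)."""
    words_a = set(re.findall(r"[a-z0-9']+", a.lower()))
    words_b = set(re.findall(r"[a-z0-9']+", b.lower()))
    if not words_a and not words_b:
        return 1.0  # two empty texts are trivially identical
    return len(words_a & words_b) / len(words_a | words_b)


rewrite = "A fast brown fox jumps over a sleepy dog"
original = "A quick brown fox jumps over a lazy dog"
score = word_overlap(rewrite, original)
print(f"{score:.2f}")  # prints 0.60: six shared words out of ten distinct ones
```

A score this high between two short sentences suggests a light rewrite rather than original writing, though any cut-off you choose is a judgment call.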

No search engine will reveal exactly what their definition of duplicate content is. This is because they don’t want people to use that information to get around their content controls. If you know the rules, you can find ways to break them.

Duplicate content isn’t penalised by Google directly, despite what many ‘experts’ continue to believe. It does, however, make things more difficult for the search engine. This in turn makes things more difficult for the websites that feature the duplicate material.

Why is duplicate content an issue for search engines?

If there’s no specific penalty for duplicate website content, what exactly is the issue? Well, duplicate content simply makes Google’s quest to serve up the most relevant results far more difficult.

If Google’s bots find two pieces of identical content, it can be difficult for them to know which is the original and which is the copy. This means that they don’t know which to include and exclude from their search results, and which should rank higher than the other. Search engines also don’t want to serve up two pieces of identical content to their users, for obvious reasons. It’s far from the relevant and valuable search experience they want to offer.

In essence, this means that Google can’t trust web pages with duplicate content. There may be no formal penalty, but the practical effect is similar: both pages tend to be pushed far down the results page.

Why is duplicate content an issue for website owners?

The issues that duplicate content presents for search engines flow onto the affected websites. If your site features duplicate content, you’re far less likely to get onto page one of Google for relevant search terms. It doesn’t matter how amazing your other search engine optimisation (SEO) efforts have been.

How does duplicate content affect SEO? Google doesn’t know whether to direct all those valuable link metrics, like authority, trust and link equity, to what it deems to be the original page, or whether to spread these metrics across multiple pages (diluting their value in the process.)

Link equity is a key consideration when discussing duplicate content. Let’s say that you’ve written a super valuable and super popular guide. This guide is then scraped from your site and uploaded elsewhere. If people begin linking to the copy, rather than your original, you lose out. You are denied all the super-valuable equity that could have taken you to the top spot on the search engine results page (SERP).

Duplicate content mightn’t result in a direct penalty for a website owner, but it can greatly dilute your online presence.

Why do duplicate content issues happen?

Duplicate content is super common. According to a 2015 audit of almost a million websites, 29% of pages featured duplicate content.

How exactly do all these duplicate content issues arise? A few of the most common reasons include:

Copying or scraping

Creating content is hard. Writing a whole website from scratch demands a real investment of time, effort, money and talent. This can result in copying from other websites, as writers try to work more efficiently.

Sometimes this is somewhat unintentional: someone looks at another website for inspiration, but takes a little too much, and ends up creating what Google deems to be duplicate content. Other times this is very intentional, with someone scraping content from a website before uploading it as their own.

Automated URL creation

Google defines duplicate content as identical or appreciably similar content found at more than one URL. Sometimes this isn’t intentional, however – it might be a total accident.

Let’s say you sell handmade pottery on your eCommerce website, Meg’s Mugs, through a platform like Square or Shopify. You offer your most popular mug in six different glazes. Without you realising, the platform has created a different URL for every finish. This means that the content on each of these pages will be flagged as duplicate.
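
As a rough illustration of how those variant URLs collapse onto one product page, here is a short Python sketch. The URLs and the `glaze` query parameter are made up for the Meg’s Mugs example; your platform may build its variant URLs quite differently.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters that change the URL but not the page copy.
VARIANT_PARAMS = {"glaze"}


def strip_variant_params(url: str) -> str:
    """Return the URL with variant-only query parameters removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in VARIANT_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def group_variants(urls):
    """Map each base URL to the list of variant URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_variant_params(url)].append(url)
    return dict(groups)


# Made-up URLs for the Meg's Mugs example.
urls = [
    "https://megsmugs.example/products/classic-mug?glaze=blue",
    "https://megsmugs.example/products/classic-mug?glaze=white",
    "https://megsmugs.example/products/teapot",
]
groups = group_variants(urls)
```

Any base URL with more than one entry in `groups` is a candidate for the fixes covered later in this article.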

URL variations

A similar problem can occur for variations of your URL.

Let’s say you’ve been wise enough to secure each of the following URLs:


This won’t create a duplicate content issue, provided you pick a primary URL and redirect all the others to it. But sometimes the same site is replicated for each URL, resulting in some serious duplicate content issues.
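
The 'pick a primary URL' rule can be sketched in a few lines of Python. The hostnames below are hypothetical placeholders, and in practice the redirect itself is usually configured in your hosting platform or web server rather than written by hand.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical primary address; every other variant should 301 to it.
PRIMARY_SCHEME = "https"
PRIMARY_HOST = "www.megsmugs.example"
# Hypothetical hostnames that all serve the same site.
ALIAS_HOSTS = {"megsmugs.example", "www.megsmugs.example"}


def redirect_target(url: str):
    """Return the primary URL to 301-redirect to, or None if no redirect applies."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ALIAS_HOSTS:
        return None  # not one of our domains; leave it alone
    if parts.scheme == PRIMARY_SCHEME and host == PRIMARY_HOST:
        return None  # already the primary URL
    return urlunsplit(
        (PRIMARY_SCHEME, PRIMARY_HOST, parts.path, parts.query, parts.fragment)
    )
```

For example, `redirect_target("http://megsmugs.example/mugs?page=2")` gives `https://www.megsmugs.example/mugs?page=2`, keeping the path and query so visitors land on the page they asked for.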

How to identify duplicate content

At this point, you might be wondering whether your website features duplicate content. Happily, there are a number of tools that can help you to find out exactly that.

  • Semrush: This tool allows you to check for duplicate content within your own site. Enter your URL and receive a comprehensive report that identifies current and potential content issues within your site.
  • Copyscape: Have you had content scraped or stolen by someone else? Have you unknowingly copied or created appreciably similar content to another site? Copyscape can scour the web for duplicate content that lies beyond your own URL.
  • Grammarly: When you’re writing content, it’s wise to lean on a plug-in like Grammarly, which can check for plagiarism while you write.
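
If you would like a quick first pass before reaching for those tools, the Python sketch below flags pages on your own site whose text is identical once case and spacing are ignored. It is a simple illustration with made-up page text, not how any of the tools above work, and it will not catch rewrites or near-duplicates.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalised = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages: dict) -> list:
    """Return lists of URLs whose text is identical after normalisation."""
    by_hash = defaultdict(list)
    for url, text in pages.items():
        by_hash[fingerprint(text)].append(url)
    return [urls for urls in by_hash.values() if len(urls) > 1]


# Made-up page text keyed by path.
pages = {
    "/mugs/blue": "Handmade stoneware mug.  Dishwasher safe.",
    "/mugs/white": "handmade stoneware mug. dishwasher safe.",
    "/about": "Meg throws every mug by hand.",
}
duplicates = find_exact_duplicates(pages)
```

Here `duplicates` contains one group, the blue and white mug pages, because their copy matches once capitals and extra spaces are ignored.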

How to avoid and fix duplicate content

The first step in avoiding the creation of duplicate content is to understand the problem. Having read this far, hopefully you now do!

Avoiding duplication is about investing in web design and content creation expertise. When creating new content or briefing a writer, never rely on a single example or a single source of inspiration. It’s best to come up with the content from scratch. But when inspiration is needed, get it from a wide variety of sources.

What if you have a duplicate content issue? How do you fix it? The specific solution will depend on the problem, but most revolve around the same concept: pointing Google to the primary piece of content.

To this end, a few of the most common fixes include:

  • Setting your preferred domain: If you have multiple domains, you can use the ‘preferred domain’ function to tell Google which to index.
  • Redirecting to the primary page: Going back to our example of the different coloured mugs, this problem can be solved by using a ‘301 redirect’ to point Google to a single page.
  • Rel=canonical attribute: An attribute to apply to all the non-primary pages, this tells Google that a page is a purposeful copy of another, and that all of the metrics should be passed onto the nominated primary page.
  • Noindex labelling: By including the robots meta tag <meta name="robots" content="noindex"> in the <head> section of a page, you can exclude purposefully duplicated pages from the Google index, thereby removing their potential to cause harm to your SEO efforts.
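
To check whether a page already carries one of these signals, a small Python sketch like the one below can read the page's HTML and report its rel=canonical link and whether it has a noindex robots meta tag. The sample pages and URLs are made up, and the class and function names are assumptions for illustration only.

```python
from html.parser import HTMLParser


class DuplicateSignals(HTMLParser):
    """Collect the canonical URL and noindex flag from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href")
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True


def read_signals(html: str) -> DuplicateSignals:
    """Parse the HTML and return the signals that were found."""
    parser = DuplicateSignals()
    parser.feed(html)
    return parser


# Made-up variant page that points to the primary product page.
variant_page = """
<html><head>
  <link rel="canonical" href="https://megsmugs.example/products/classic-mug">
</head><body>Classic mug, blue glaze</body></html>
"""

# Made-up page that has been deliberately kept out of the index.
hidden_page = """
<html><head>
  <meta name="robots" content="noindex">
</head><body>Printer-friendly copy</body></html>
"""

variant = read_signals(variant_page)
hidden = read_signals(hidden_page)
```

In this sketch `variant.canonical` holds the primary product URL and `hidden.noindex` is `True`. A page with neither signal, and no redirect, is one to look at more closely.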

Content is critical to developing your online presence and working your way up the SERP. It also takes time and skill to create original and truly effective content. For these reasons and more, many Kiwi businesses hand the responsibility for content creation and SEO over to experts.
