Duplicate content has been discussed and fought over on almost every panel I’ve attended.
SEOs seem to like the subject very much; I have no clue why.

We’ve discussed duplicate content issues here before.

Today Google, Yahoo and MSN have come up with a new “idea” to help webmasters fight duplicate content.


First off: how does duplicate content occur on a website?

- When more than one page has the same content.

- When more than one page is similar in content.

- When a page is repeated on the website due to technical glitches.

- When dynamically generated pages repeat the same content under different URLs or parameters.

In such cases the search engines, on seeing the same content on different pages, “suspend” the value of those pages and take time before showing either of them on the results page for a live search.

Example: let’s say we have two pages on a website.

URL 1 – http://yoursite.com/yourpage

URL 2 – http://yoursite.com/yourpage?bgcolor=blue

Let’s assume that the second page is the same as the first page except for the background color, which is controlled dynamically through CSS styles.

Now, when one or more other websites link to both URLs with the same or similar anchor text, Google will find it difficult to decide which page to show.

In such situations, Google may take its time deciding which page to show in the listings for a related search. It’s more like a confused state. (Bots aren’t always smart, you see.)

That is why a website should contain as few duplicate entries and as little duplicate content as possible.

It might not be possible to avoid duplicate content on a website completely, but the idea is to keep it to a minimum so that it causes the least confusion.
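To make the idea of “same content on different pages” concrete, here is a minimal sketch of how one might spot exact duplicates on one’s own site. It is only an illustration, not how any search engine works: the function names and the sample page texts are made up, and it only catches pages that are identical after whitespace and case are normalized. Pages that are merely similar would need a fuzzier comparison.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """Group URLs whose text has the same fingerprint.

    `pages` maps a URL to the visible text of that page (hypothetical input).
    Returns a list of URL groups that contain more than one URL.
    """
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Made-up sample data based on the two URLs in the example above.
pages = {
    "http://yoursite.com/yourpage": "Welcome to  my page.",
    "http://yoursite.com/yourpage?bgcolor=blue": "welcome to my page.\n",
    "http://yoursite.com/about": "About us.",
}
duplicates = find_duplicates(pages)
```

Each group in `duplicates` is a set of URLs that serve the same text, which makes it a candidate for cleanup.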


How to curb duplicate content?

In the above example, there is more than one way for Google to work out that one page is better than the other.

1 – Google runs its own calculations to analyze the content and decide which page makes more sense.

2 – Google can also check external factors such as incoming links, anchor text and the content surrounding those links, and decide which of the two pages is more “popular” or “preferred”.


How do search engines deal with duplicate content?

Search engines take their time until they have evidence of why one page is better than the other before they display either in the live search results. In the meantime they simply carry on with the other results in the queue and hold the “possible duplicate content” back from the live results.

So what is a canonical tag? How does it help in dealing with duplicate content?

A canonical tag is a simple piece of HTML (a <link> element) that you insert into the <head> section of a duplicate page. It lets the search engines know that they are on a duplicate page and points them to the URL where the original content lives.

So let’s pick an example.

Page 1 - http://www.google.com/duplicate-content.html   (Original source content)

Page 2 - http://www.google.com/duplicate-content-800x600.html   (Duplicate content)

Now, you add the canonical link tag to the duplicate page, Page 2.

<link rel="canonical" href="http://www.google.com/duplicate-content.html" />

So what happens now? When Googlebot lands on the duplicate page (the page with the canonical tag), it does not give weight to the content on that page; instead it follows the original URL given in the canonical tag. Note that search engines treat the tag as a strong hint rather than a strict directive.
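To check that a duplicate page really carries the tag, one can read it back out of the HTML. The following is a rough sketch using only Python’s standard library. The class and function names are my own, and it says nothing about how a search engine actually parses pages. It simply returns the href of the first <link rel="canonical"> it finds.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated values; a missing value is None.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"].strip()


def find_canonical(html):
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Made-up markup for Page 2 from the example above.
page2 = (
    "<html><head><title>Duplicate</title>"
    '<link rel="canonical" href="http://www.google.com/duplicate-content.html" />'
    "</head><body>...</body></html>"
)
original = find_canonical(page2)
```

Stripping whitespace from the href also guards against a stray trailing space inside the attribute.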

Which pages should you add a canonical tag to?

Technically, any page that you think will repeat the content of a different page.
For example, http://www.yoursite.com/page1.php?sessionid=12+author=ben should carry a canonical tag pointing to http://www.yoursite.com/page1.php.
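If the duplicate URLs differ from the original only by query parameters, the canonical tag can be derived from the URL itself. Here is a small sketch under a few assumptions. The list of ignored parameters is hypothetical and has to be decided per site. The sample URL separates its parameters with &, which is my reading of the example above. The function names are my own.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: on this hypothetical site these parameters never change the content.
IGNORED_PARAMS = {"sessionid", "author", "bgcolor"}


def canonical_url(url, ignored=IGNORED_PARAMS):
    """Drop the ignored query parameters (and any fragment) from `url`."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url, ignored=IGNORED_PARAMS):
    """Build the <link> element to place in the <head> of the duplicate page."""
    href = escape(canonical_url(url, ignored), quote=True)
    return '<link rel="canonical" href="%s" />' % href


tag = canonical_link_tag("http://www.yoursite.com/page1.php?sessionid=12&author=ben")
```

Parameters that do change the content (a product id, say) should stay out of the ignored set, otherwise distinct pages would end up pointing at the same canonical URL.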

How does the canonical tag help WordPress blogs?

In my opinion, canonical tags should not be automated on WordPress blogs. Although WordPress creates several kinds of possible duplicate content, canonical tags may not work efficiently there because they require a certain amount of manual checking.

For example, tag and archive pages on WordPress blogs create possible duplicate content, but neither can be controlled effectively with canonical tags. In such situations, meta noindex tags are far more effective.

But in cases like series posts (101-tips-part1.html and 101-tips-part2.html), if the content is strikingly similar, one can insert canonical tags manually and put them to good use.

Otherwise, I’d stay away from automation at least for now.

  1. Fantastic Mani, thanks. On our site we often have to duplicate content to fit in with how users would logically search through menus; it’s great to see that we can still keep user-friendliness without harming our rankings!

  2. Duplicate content within a site is discounted nowadays.

    For WordPress, the HeadSpace2 plugin has no match in terms of the SEO features it provides. You can configure tags, archives, categories, subpages, login pages, author pages or anything you want.

  3. Cool… I have been hearing a lot about these canonical tags and this is an eye-opener.

    Thanks for the info

  4. I think the canonical tag will prevent search engines from getting confused. But hopefully they have put safeguards in place to prevent spamming. I’m afraid that spammers could use such a new convention as this to their advantage.

  5. It will certainly have a pretty huge impact; script-based sites and people running affiliate programs with IDs at the end of the URL will profit most from it… looks a bit like Noindex 2.0.

  6. Mani,

    The All in One SEO plugin can add a noindex to the category and tag pages, so what is the problem with automating the canonical tag? Would a combination of noindex via the All in One SEO pack and an automatic canonical tag solve the issue of duplicate content to a good extent?

    Recently, I found my search results page coming up in Google results, will the canonical tag solve such issues too?

    see this link:

  7. Mani,

    Users can use the plugin available at http://yoast.com/canonical-url-links/ to curb duplicates with the canonical tag automatically.

    It supports WordPress and is also available for other CMSs as well.

  8. Hi Mani.

    Nice post, well explained.
    I read this news on Search Engine Land but could not understand what it was or what the blog writer wanted to say; you cleared it all up.

    Nice post. You are really posting nice information.

  9. [...] then came the canonical tags for duplicate content from Google (and others of course) [...]

  10. What is the difference between using “canonical” and using a “noindex meta tag” on duplicate pages?
