A lot has been made of the recent post at Google’s Webmaster Central Blog regarding duplicate content. The post, entitled Demystifying the “Duplicate Content Penalty”, basically calls into question the idea that a site can be “penalized” for having content significantly similar to another site’s, and places the blame on webmasters and SEOs for perpetuating the “myth of duplicate content”. Whether you refer to it as a “penalty” or say the results are simply being “filtered”, the outcome is ultimately the same – one URL will be considered the “preferred” URL by Google and THAT will be the URL included within SERPs. The real question then becomes how to help Google identify your site as “preferred”.
Calling it a “penalty” vs. a “filter” is nothing more than semantics. Stating that there is no such thing as a “duplicate content penalty” is Google’s feeble attempt to make it seem as though they aren’t unfairly penalizing individual URLs in their self-appointed role of policing the Internet. The fact of the matter is that it doesn’t matter whether a page suffers from an actual “penalty” or not. If it is “filtered out” of the results for having content that is too similar to another page already listed in the index, does it really matter if it’s technically referred to as a “penalty” or a “filter”? The ultimate result is the same – one page will be easier for people to find than the other. Period.
I completely understand Google’s goal in eliminating duplicate content from their search results. Imagine what SERPs would look like, taking into consideration the number of cookie-cutter, affiliate websites in existence today, if duplicate content weren’t factored into a site’s ranking. Part of any search engine’s goal, Google included, is to provide its users with relevant and unique information. If searching for the term “weight loss supplements” resulted in 100 identical Herbalife sites, each with a different URL and the exact same information, how reliable would you believe those results to be? There is a reason duplicate content is frowned upon…and it should be.
It appears that Google, at least in this latest official post, is referring specifically to inadvertent and non-malicious duplicate content within a given site. I have suspected for some time that this sort of duplicate content has a low level of impact on a site’s ranking. Think, for example, about the number of WordPress sites that rank highly for competitive search terms. Many of those sites contain sidebars that are identical on every page. If that sort of duplicate content were penalized…or, uh…excuse me, if that sort of duplicate content were to trigger Google’s “filters”, that would likely have a negative impact on the ranking of each page. That hasn’t been my experience.
What about duplicate content from other sites (i.e. scraped content)? How does Google determine whether or not a site that scrapes content is malicious? In my opinion, stealing my intellectual property and regurgitating it on your own site as if you created it IS malicious under any circumstances and should be penalized accordingly. Google apparently doesn’t agree, as illustrated by a number of documented experiences with scraper sites outranking the original source due in large part to having a higher PR than the site where the information was originally published.
Google doesn’t do as good a job at this as they’d like to believe. Many a webmaster, SEO and search marketing guru has weighed in on the issue, and the consensus, at least from what I’ve read, is pretty clear: duplicate content, regardless of its source, can have a negative impact on ranking – so DON’T LET GOOGLE FIGURE STUFF OUT ON THEIR OWN! These posts all seem to reiterate that it is important to put forth at least a modicum of effort to address what is within your control when it comes to duplicate content within your own site.
I mean, seriously – are all of these experts full of crap? Matt Cutts included? I’d venture to say that they’re not…and preventing duplicate content issues from occurring in the first place is likely the best option for anyone concerned about driving traffic to a site via Google’s organic search results. As for duplicate content from another site…I think we all know there is little, if anything, that can truly be done about that, and we have no choice but to leave it to Google to sort those issues out on their own.
Google likes to toot their own horn and proclaim that they’ve “got it covered”, but the fact of the matter is that the more you leave to Google’s bots to “figure out”, the more time must be spent doing so…meaning greater resources and server load. Why? Why not just make things as clean and simple as possible? Does this recent clarification by Google mean that we should no longer concern ourselves with resolving a site’s canonical homepage issues? Does it mean that it’s no longer necessary to restrict bot access to printer-friendly versions of pages? Does it mean that multiple URLs resolving with the same content will simply be ignored, with no negative impact on the ranking of the preferred URL? No…I don’t think their claim to have a handle on duplicate content issues means any of that.
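For anyone who would rather not leave it to the bots, those first two items are easy enough to handle at the application level. Here is a minimal, hypothetical sketch – the hostname, the “/print/” path prefix and the little WSGI app are my own illustration, not anything prescribed by Google’s post – that 301-redirects requests on a non-preferred hostname to the canonical one and flags printer-friendly URLs as “noindex” so crawlers leave them out of the index:

```python
# Minimal sketch of two duplicate-content safeguards discussed above:
# (1) 301-redirect a non-canonical hostname to the preferred one, and
# (2) mark printer-friendly URLs "noindex" so bots skip them.
# CANONICAL_HOST and PRINT_PREFIX are assumed/hypothetical values.
from wsgiref.simple_server import make_server

CANONICAL_HOST = "www.example.com"   # hypothetical preferred hostname
PRINT_PREFIX = "/print/"             # hypothetical printer-friendly path

def app(environ, start_response):
    host = environ.get("HTTP_HOST", "")
    path = environ.get("PATH_INFO", "/")

    # Canonical homepage/hostname: permanently redirect any request on a
    # non-preferred host to the same path on the preferred host.
    if host and host != CANONICAL_HOST:
        start_response("301 Moved Permanently",
                       [("Location", f"https://{CANONICAL_HOST}{path}")])
        return [b""]

    headers = [("Content-Type", "text/html; charset=utf-8")]

    # Printer-friendly duplicates: still serve them to visitors, but tell
    # crawlers not to index them (similar in effect to blocking via robots.txt).
    if path.startswith(PRINT_PREFIX):
        headers.append(("X-Robots-Tag", "noindex, follow"))

    start_response("200 OK", headers)
    return [b"<html><body>Hello</body></html>"]

if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```

A one-time fix like this costs almost nothing and means Google never has to guess which of your own URLs is the “preferred” one.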
What it does mean is that Google is, as always, making every effort to improve the experience for its users – with or without consideration for the impact on webmasters. They will likely always err on the side of caution when it comes to preventing SPAM and other malicious activity within search results. To a certain extent, that is for the good of all…provided that Google recognizes the disconnect between what they claim to be true and the real-world experiences of many SEOs who have dealt with duplicate content issues up to this point.
If duplicate content weren’t factored into ranking, the SERPs for some queries would be dominated by a large number of identical affiliate sites and little else. “Penalty” or not, there is a definitive reason that a search for “Herbalife” doesn’t turn up thousands of identical, cookie-cutter Herbalife affiliate websites…and I believe at least part of that reason to be duplicate content. Don’t piss on my leg and then tell me it’s raining, Google! Give me a break!