WordPress Duplicate Content: You Control the Horizontal, You Control the Vertical
Posted on 08.29.07 by Widge @ 9:09 am

Okay, so all you ever hear is that WordPress makes it easy to end up with duplicate content. And it does. In fact, that may have bitten me on the arse on Needcoffee a few months back.

An individual post can appear:

  • As itself
  • On a category page (once for each category assigned to it)
  • On a tag page (once for each tag assigned to it)

So if I've got a review of a TV DVD set, it could appear in three categories: TV, DVD and Reviews. Plus, if I've got five tags on it, well… you see the concern. Without even meaning to, I've got the same article showing up in nine places.
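The arithmetic is simple enough to sketch: one URL for the post itself, plus one listing page per category and one per tag. The function name and tag names below are made up for illustration, and other listings such as date archives and the front page aren't counted, so treat the result as a floor.

```python
def appearances(categories: list[str], tags: list[str]) -> int:
    """Pages that list the post: itself, one per category, one per tag."""
    # Date archives, the front page and feeds are deliberately not counted.
    return 1 + len(categories) + len(tags)


# The TV DVD review: three categories and five (hypothetical) tags.
review_categories = ["TV", "DVD", "Reviews"]
review_tags = ["tag-1", "tag-2", "tag-3", "tag-4", "tag-5"]
print(appearances(review_categories, review_tags))  # prints 9
```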

What to do? Well, initially I went into my robots.txt file and just told the search engines to stay the hell away from everything but my individual posts:

User-agent: *
Disallow: /tag/
Disallow: /category/

Then the pendulum swings back the other way: are all of my pages close enough to my front page to get nabbed by Google and considered important? Basically, if nobody outside is linking to a post and it's too many links away from the front page, it can be abandoned and left for dead. Or something equally dramatic.
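"Too many hops from the front page" is what's usually called click depth: the fewest links a crawler has to follow from the home page to reach a page. A breadth-first search over the site's links computes it. The link graph and the cutoff below are hypothetical (Google doesn't publish a magic number). It also shows why the pendulum swings: pages blocked in robots.txt don't get crawled, so the links on them can't act as shortcuts, which leaves paged archives like these as the only route to older posts.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], start: str) -> dict[str, int]:
    """Breadth-first search: fewest link hops from `start` to each reachable page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit in BFS is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical site: the front page links to the newest post and to page 2
# of the archive; older posts are only reachable further down the chain.
site = {
    "/": ["/newest-post/", "/page/2/"],
    "/page/2/": ["/older-post/", "/page/3/"],
    "/page/3/": ["/oldest-post/"],
}

depths = click_depths(site, "/")
MAX_DEPTH = 2  # arbitrary cutoff for this sketch
for page, d in sorted(depths.items(), key=lambda kv: kv[1]):
    flag = "  <- deep" if d > MAX_DEPTH else ""
    print(f"{d}  {page}{flag}")
```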

So the happy medium is to split each post with the MORE tag (so archive pages only show the teaser), allow both categories and individual posts in the robots.txt file, and give each post only one category.

Ah, so: problem for me. Take that TV DVD set review again. I can see chucking the DVD category, because it's primarily a TV item and the medium just happens to be DVD. Fine. But, well, it is a review. And what if somebody wants to browse all our reviews? Do I really want them to have to use a tag for that?

Then it struck me like a slice of provolone from the blue: "What a dumbass. Just exclude the individual categories that you know are always going to be tied to something else." For example, a review will always be filed under Reviews AND TV, or Reviews AND Movies. So just exclude Reviews. I re-allowed /category/ and instead did this:

User-agent: *
Disallow: /category/reviews/
Disallow: /category/press/
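To see what the switch actually changes, Python's standard urllib.robotparser can evaluate both versions of the rules against a few sample URLs. This is a sketch under assumptions: the category and tag URLs follow WordPress's default /category/ and /tag/ bases, the post URL is made up, and the /tag/ rule is assumed to have stayed in place after /category/ was re-allowed. Note that robotparser applies the first matching prefix rule, which isn't exactly how Google resolves conflicting rules, so treat it as a rough check.

```python
from urllib.robotparser import RobotFileParser

# First attempt: keep crawlers off every category and tag archive.
BLANKET_RULES = """\
User-agent: *
Disallow: /tag/
Disallow: /category/
"""

# Second attempt: re-allow /category/, block only the catch-all categories.
# Assumption: the /tag/ rule stays in place.
TARGETED_RULES = """\
User-agent: *
Disallow: /tag/
Disallow: /category/reviews/
Disallow: /category/press/
"""

# Hypothetical URLs: a made-up post plus category/tag archive paths.
URLS = [
    "/2007/08/29/some-tv-dvd-review/",
    "/category/tv/",
    "/category/reviews/",
    "/category/press/",
    "/tag/some-tag/",
]


def crawlable(rules: str, urls: list[str], agent: str = "Googlebot") -> dict[str, bool]:
    """Map each URL to whether `agent` may fetch it under `rules`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}


before = crawlable(BLANKET_RULES, URLS)
after = crawlable(TARGETED_RULES, URLS)
for url in URLS:
    print(url, "before:", before[url], "after:", after[url])
```

With these rules, /category/tv/ flips from blocked to crawlable while the Reviews and Press archives stay off-limits and the post itself is crawlable either way.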

Done deal. Now, granted, I have some posts with multiple categories that need to be cleaned up, but that can be done easily enough.

I wish there were a lot more SEO tools built into WordPress, honestly. Maybe this exists among the fifty gajillion plugins and I haven't seen it, but a way to look at all my posts and check/uncheck categories en masse would be nice. Or even a plugin that went out, looked at how you've got your posts, robots.txt and such set up, and graded you for duplicate content. You know: "You have a 56% chance of being SOL because you've got too little content in too many places." Something like that.
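The core of that grading idea is small enough to sketch. The hypothetical grader below builds, for each post, the category and tag archive URLs it would appear on, asks urllib.robotparser which ones a crawler may fetch, and flags any post listed on more than one crawlable archive page. The posts, slugs, rules (including the assumed /tag/ block) and the "percent flagged" score are all made up; it's a crude heuristic, not a real probability of anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: the targeted category blocks plus an assumed /tag/ block.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/reviews/
Disallow: /category/press/
"""

# Hypothetical posts as (slug, category slugs, tag slugs).
POSTS = [
    ("some-tv-dvd-review", ["tv", "reviews"], ["drama"]),
    ("some-movie-review", ["movies", "reviews"], []),
    ("press-roundup", ["press", "tv", "movies"], []),
]


def crawlable_listings(parser, post, agent="Googlebot"):
    """Category and tag archive URLs listing `post` that `agent` may crawl."""
    _slug, categories, tags = post
    urls = [f"/category/{c}/" for c in categories] + [f"/tag/{t}/" for t in tags]
    return [url for url in urls if parser.can_fetch(agent, url)]


def grade(posts, robots_txt):
    """Crude score: percent of posts on more than one crawlable archive page."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    flagged = {}
    for post in posts:
        listings = crawlable_listings(parser, post)
        if len(listings) > 1:
            flagged[post[0]] = listings
    return 100 * len(flagged) / len(posts), flagged


score, flagged = grade(POSTS, ROBOTS_TXT)
print(f"{score:.0f}% of posts appear on more than one crawlable archive page")
for slug, listings in flagged.items():
    print(" ", slug, listings)
```

Here only the post filed under Press, TV and Movies gets flagged: Press is blocked, but TV and Movies both still list it. That's exactly the kind of multi-category post that needs cleaning up.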

Anyway, What Have We Learned?

1. There are no absolute, hard and fast rules to SEO. And even that, being a hard and fast rule, is subject to scrutiny. Sure, you need to do stuff like use decent titles and decent URLs, and have your server, you know, actually online. There are some no-brainers, but just because you find a post that says Your Robots.txt Must Look Like This or You Are Doomed, well, have your grain of salt handy. Make sure what you're doing works for your individual site, because, as I've found anyway, most SEO posts are written for sites that aren't, shall we say, dealing in the trade of pop culture.

2. You can control a shitload of stuff about your site. I'm amazed at how many people actually don't have a robots.txt file. Or an .htaccess file (that they know of). I keep forgetting just how much power I have to shape what goes on on the site. It's a good idea to take five minutes a week and step back from the grind of posting and just go, "Right. Do I have my hatches battened down?"

3. Google Webmaster Tools are your friend. The robots.txt analyzer they provide has already saved me from fifteen really stupid things I could have done to cut my site off from the outside world. I highly recommend you don't make any changes to your robots.txt without running it through the analyzer first. And don't just check Googlebot. Check the image bot (Googlebot-Image) and, if you're running AdSense, the media bot (Mediapartners-Google).
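Google's analyzer is the one to trust, but a quick local pre-check can catch the obvious stuff before you upload anything. A sketch using Python's urllib.robotparser: it tries each crawler name against a list of URLs that must stay reachable and reports anything blocked. The draft file, the URLs and the deliberate /wp-content/ mistake are hypothetical; Googlebot, Googlebot-Image and Mediapartners-Google are the user-agent tokens for Google's web, image and AdSense crawlers. Python's parser uses simple first-match prefix rules without Google's wildcard and longest-match handling, so on trickier files it can disagree with Google.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft. The /wp-content/ line is a deliberate mistake: it also
# hides uploaded images. The empty Disallow gives the AdSense crawler its own
# allow-everything group.
DRAFT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/reviews/
Disallow: /wp-content/

User-agent: Mediapartners-Google
Disallow:
"""

AGENTS = ["Googlebot", "Googlebot-Image", "Mediapartners-Google"]

# Hypothetical URLs that should stay reachable for every crawler.
MUST_STAY_OPEN = [
    "/",
    "/2007/08/29/some-tv-dvd-review/",
    "/wp-content/uploads/some-cover.jpg",
]


def blocked(robots_txt, agents, paths):
    """Return the (agent, path) pairs the draft would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path)
        for agent in agents
        for path in paths
        if not parser.can_fetch(agent, path)
    ]


problems = blocked(DRAFT, AGENTS, MUST_STAY_OPEN)
for agent, path in problems:
    print(f"{agent} would be blocked from {path}")
```

Run as-is, it reports the uploaded image as blocked for Googlebot and Googlebot-Image, while the AdSense crawler gets through because of its own empty Disallow group.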


Filed under: Fun With SEO
Comments:


  1. Google must realise that millions of people use WordPress and they don't create duplicate content for black hat SEO purposes. It's just the way blogs work.

    Anyways, I have recently re-designed my blog to get rid of content duplication once and for all.

    Nice tips BTW.

    Comment by Mohsin — September 2, 2007 @ 6:16 am




John Robinson is a writer of prose, poetry and comics who also writes under the pseudonym of Widgett Walls.

Widgett Walls is the director of Needcoffee.com who also writes under the pseudonym of John Robinson.

Don't ask.





    Credits and Copyright
    Proudly powered by WordPress. All content © 1997-present by John Robinson.
    Theme by Theron Parlin, but we've mangled it beyond all reason. So don't blame him.