WordPress Duplicate Content: You Control the Horizontal, You Control the Vertical
Okay, so all you ever hear is that WordPress makes it easy to end up with duplicate content. And that's quite true. In fact, it may have bitten me on the arse on Needcoffee a few months back.
An individual post can appear:
- As itself
- On a category page (once for each category assigned to it)
- On a tag page (once for each tag assigned to it)
So if I've got a review of a TV DVD set, it could appear in three categories: TV, DVD and Reviews. Plus, if I've got five tags on it, well… you see the concern. Without even meaning to, I've got the same article showing up in nine places.
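To make that arithmetic concrete, here's a rough Python sketch that lists every URL a single post can show up at. It assumes WordPress's default /category/ and /tag/ archive bases; the example.com domain, the post slug and the tag names are made up, and paginated or date-based archives are ignored.

```python
def post_urls(slug, categories, tags, base="https://example.com"):
    """Every URL a post can show up at: its own permalink, one archive
    per category, and one archive per tag (hypothetical URL layout)."""
    urls = [f"{base}/{slug}/"]
    urls += [f"{base}/category/{c}/" for c in categories]
    urls += [f"{base}/tag/{t}/" for t in tags]
    return urls

review = post_urls(
    "some-tv-dvd-review",
    categories=["tv", "dvd", "reviews"],
    tags=["tag-1", "tag-2", "tag-3", "tag-4", "tag-5"],
)
print(len(review))  # 9: 1 permalink + 3 category archives + 5 tag archives
```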
What to do? Well, initially I went into my robots.txt file and told the search engines to stay the hell away from my tag and category archives and stick to my individual posts:
User-agent: *
Disallow: /tag/
Disallow: /category/
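If you want to sanity-check rules like these before pushing them live, Python's standard-library robots.txt parser will do in a pinch. This is a minimal sketch; the example.com URLs are hypothetical. Note the User-agent line: Python's parser silently ignores Disallow lines that don't follow one. It also isn't Google's parser, so wildcards and rule precedence may not be handled the same way.

```python
from urllib.robotparser import RobotFileParser

# The rules above. Python's parser only applies Disallow lines that
# belong to a group started by a User-agent line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/
"""

def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the subset of urls that the given agent may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]

# Hypothetical URLs for the TV DVD review and two of its archives.
urls = [
    "https://example.com/some-tv-dvd-review/",
    "https://example.com/category/tv/",
    "https://example.com/tag/some-tag/",
]
print(blocked_urls(ROBOTS_TXT, urls))
# ['https://example.com/category/tv/', 'https://example.com/tag/some-tag/']
```

Drop the User-agent line and the same call returns an empty list, which is exactly the kind of silent failure worth catching before Google does.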
Then the pendulum swings back the other way: are all of my pages close enough to my front page that Google finds them and considers them important? Basically, if nobody outside the site is linking to a post, and it's too many hops (links) away from the front page, it can be abandoned and left for dead. Or something equally dramatic.
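Here's a toy way to see the pendulum problem: a breadth-first search from the front page that counts clicks to each page and refuses to follow anything under a blocked prefix, roughly the way a crawler honoring robots.txt would. The link graph is entirely hypothetical, and real crawlers weigh a lot more than click depth.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to. Older posts are
# reachable only through paginated archives or a category archive.
LINKS = {
    "/": ["/page/2/", "/category/tv/", "/new-post/"],
    "/page/2/": ["/page/3/", "/older-post/"],
    "/page/3/": ["/oldest-post/"],
    "/category/tv/": ["/older-post/", "/oldest-post/"],
}

def click_depth(links, start="/", blocked_prefixes=()):
    """Breadth-first search from the front page. Pages under a blocked
    prefix are never visited, so their outgoing links are never followed."""
    blocked = tuple(blocked_prefixes)
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target in depth or target.startswith(blocked):
                continue
            depth[target] = depth[page] + 1
            queue.append(target)
    return depth

print(click_depth(LINKS)["/oldest-post/"])                                   # 2
print(click_depth(LINKS, blocked_prefixes=["/category/"])["/oldest-post/"])  # 3
```

In this made-up graph, blocking /category/ pushes the oldest post from two clicks to three, because the only remaining route is through the paginated archive.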
So the happy medium is to split each post with the MORE tag (so archive pages show only a teaser rather than the full text), allow both categories and individual posts in the robots.txt file, and then give each post only one category.
Ah, so: problem for me. As stated above, I can have a post about a TV DVD set. Now, I can see chucking DVD, because it's primarily a TV item; the medium just happens to be DVD. Fine. But, well, it is a review. And what if somebody wants to browse all our reviews? Do I really want them to have to use the tag?
Then it struck me like a slice of provolone from the blue: "What a dumbass. Just exclude the individual categories you know are always going to be tied to something else." For example, a review will always be Reviews AND TV, or Reviews AND Movies. So just exclude Reviews. I re-allowed /category/ and did this instead:
User-agent: *
Disallow: /category/reviews/
Disallow: /category/press/
Done deal. Now, granted, I have some posts with multiple categories that need to be cleaned up, but that can be done easily enough.
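For that cleanup, a quick script can flag the posts still filed under more than one crawlable category. This is a sketch that assumes you've exported each post's category slugs somehow; the POSTS data below is hypothetical, and the blocked set simply mirrors the two Disallow lines above.

```python
# Category slugs whose archives are disallowed in robots.txt.
BLOCKED = {"reviews", "press"}

# Hypothetical export: post slug -> category slugs.
POSTS = {
    "some-tv-dvd-review": ["tv", "dvd", "reviews"],
    "some-movie-review": ["movies", "reviews"],
    "some-press-release": ["press", "movies"],
}

def needs_cleanup(posts, blocked):
    """Return {slug: crawlable categories} for posts that still appear
    in more than one crawlable category archive."""
    flagged = {}
    for slug, cats in posts.items():
        crawlable = [c for c in cats if c not in blocked]
        if len(crawlable) > 1:
            flagged[slug] = crawlable
    return flagged

print(needs_cleanup(POSTS, BLOCKED))
# {'some-tv-dvd-review': ['tv', 'dvd']}
```

Here only the TV DVD review gets flagged, which matches the "chuck DVD, keep TV" decision above.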
Honestly, I wish WordPress had a lot more SEO tools built in. Maybe this exists among the fifty gajillion plugins and I just haven't seen it, but a way to look at all my posts and check/uncheck categories en masse would be nice. Or even a plugin that went out, looked at how you've got your posts, robots.txt and such set up, and graded you for duplicate content: "You are at a 56% chance of being SOL because you've got too little content in too many places." Something like that.
Anyway, What Have We Learned?
1. There are no absolute, hard and fast rules to SEO. And even that, being a hard and fast rule, is subject to scrutiny. Sure, you need to do stuff like use decent titles and decent URLs, and have your server, you know, actually online. There are some no-brainers, but just because you find a post that says Your Robots.TXT Must Look Like This or You Are Doomed, well, have your grain of salt handy. Make sure what you're doing works for your individual site, because, as I've found anyway, most SEO posts are written for sites that aren't, shall we say, dealing in the trade of pop culture.
2. You can control a shitload of stuff about your site. I'm amazed at how many people don't actually have a robots.txt file. Or an .htaccess file (that they know of). I keep forgetting just how much power I have to shape what happens on the site. It's a good idea to take five minutes a week, step back from the grind of posting, and just go, "Right. Do I have my hatches battened down?"
3. Google Webmaster Tools is your friend. The robots.txt analyzer it provides has already saved me from fifteen really stupid things I could have done to cut my site off from the outside world. I highly recommend you don't make any changes to your robots.txt without running them through it first. And don't just check Googlebot. Check the image bot (Googlebot-Image) and, if you're running AdSense, the AdSense bot (Mediapartners-Google).
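A rough local version of that per-bot check can be put together with Python's standard-library parser. It's a sketch, not a substitute for Google's own tool: the robots.txt below is hypothetical (including a deliberately dumb Disallow: /wp-content/ and a wide-open group for Mediapartners-Google), and Python's matching rules aren't guaranteed to match Google's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The /wp-content/ line is the kind of mistake
# that quietly hides uploaded images from Googlebot-Image. The empty
# Disallow in the second group means "allow everything" for that bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/reviews/
Disallow: /wp-content/

User-agent: Mediapartners-Google
Disallow:
"""

# Google crawler user-agent tokens to test against.
AGENTS = ["Googlebot", "Googlebot-Image", "Mediapartners-Google"]

# Hypothetical URLs on the site.
URLS = [
    "https://example.com/tag/some-tag/",
    "https://example.com/wp-content/uploads/cover.jpg",
]

def access_matrix(robots_txt, agents, urls):
    """Map each (agent, url) pair to True if that agent may fetch the url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {(a, u): parser.can_fetch(a, u) for a in agents for u in urls}

if __name__ == "__main__":
    for (agent, url), ok in access_matrix(ROBOTS_TXT, AGENTS, URLS).items():
        print(f"{agent:<22} {'allowed' if ok else 'BLOCKED':<8} {url}")
```

Googlebot and Googlebot-Image both fall through to the catch-all group and get blocked from the uploads folder, while Mediapartners-Google matches its own group and can fetch everything.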
