WordPress Duplicate Content: You Control the Horizontal, You Control the Vertical
Posted on 08.29.07 by Widge @ 9:09 am

Okay, so, all you hear is WordPress = easy to have duplicate content. And it is. It's quite true. In fact, that may have bitten me on the arse on Needcoffee a few months back.

An individual post can appear:

  • As itself
  • In a category (as many as it has categories assigned to it)
  • In a tag page (as many as it has tags assigned to it)

So if I've got a review of a TV DVD set, it could appear in the categories: TV, DVD and Reviews. Plus if I've got five tags on it, well…you see the concern. Without even meaning to, I've got the same article showing up nine places.
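To put numbers on that, here is a rough Python sketch that lists every URL one post would show up at. The /tag/ and /category/ prefixes are the ones used in the robots.txt rules in this post; the post slug, the permalink shape, and the tag names are made up for illustration.

```python
def listing_urls(slug, categories, tags):
    """Return every URL path where one post's content appears."""
    urls = ["/" + slug + "/"]  # the post itself (hypothetical permalink shape)
    urls += ["/category/" + c + "/" for c in categories]
    urls += ["/tag/" + t + "/" for t in tags]
    return urls


urls = listing_urls(
    "some-tv-dvd-review",  # made-up slug
    categories=["tv", "dvd", "reviews"],
    tags=["tag-one", "tag-two", "tag-three", "tag-four", "tag-five"],
)
print(len(urls))  # 1 post + 3 categories + 5 tags
```

Date archives and paginated listings would add more on top of this; the sketch only counts the three cases listed above.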

What to do? Well, initially I went into my robots.txt file and just told the search engines to stay the hell away from everything but my individual posts:

User-agent: *
Disallow: /tag/
Disallow: /category/

Then the pendulum swings back the other way–are all of my pages close enough to my front page that they get nabbed by Google and are considered important? Basically, if nobody externally is linking to a post, and it's too many hops/links away from the front page, it can be abandoned and left for dead. Or something equally dramatic.
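One way to get a feel for the "too many hops" problem is to measure it. The sketch below is a plain breadth-first search over a link map and reports the fewest clicks from the front page to each page. The link map is entirely hypothetical; on a real site you would have to build it from a crawl, and how many hops counts as "too many" is anybody's guess.

```python
from collections import deque


def click_depth(links, start):
    """Breadth-first search: fewest link hops from start to each reachable page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit is the shortest route
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical site map: page -> pages it links to
links = {
    "/": ["/category/tv/", "/new-post/"],
    "/category/tv/": ["/category/tv/page/2/"],
    "/category/tv/page/2/": ["/old-post/"],
    "/orphan/": [],  # nothing links here, so it never shows up in the result
}

depth = click_depth(links, "/")
for page, hops in sorted(depth.items(), key=lambda item: item[1]):
    print(hops, page)
```

Pages missing from the result are not reachable from the front page at all. Blocking /category/ in robots.txt is roughly like deleting those entries from the link map as far as a crawler is concerned, which is why the old posts get stranded.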

So the happy medium is to have a post, split it up with the MORE tag, and then allow categories and individual posts in the robots.txt file, but then only have one category per post.

Ah, so: problem for me. I can have a post, as stated above, that's for a TV DVD set. Now I can see saying that DVD can be chucked because primarily it's a TV item, the media just happens to be DVD. Fine. But, well, it is a review. And what if somebody wants to browse all our reviews? Do I really want them to have to use the tag?

Then it struck me like a slice of provolone from the blue: "What a dumbass, just exclude the individual categories that you know are always going to be tied to something else." For example, we will always have Reviews AND TV. Or Reviews AND Movies. Just exclude that category. So I re-allowed /category/ and instead just did this:

User-agent: *
Disallow: /category/reviews/
Disallow: /category/press/
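If you want to sanity-check rules like these before uploading them, Python's standard library ships a robots.txt parser. This is a minimal sketch, not a stand-in for what any search engine actually does. The "User-agent: *" line is my assumption (the standard-library parser ignores Disallow lines that don't sit under a User-agent line), and the example paths are made up.

```python
from urllib.robotparser import RobotFileParser

# The two Disallow lines from above, under an assumed catch-all user agent.
rules = """\
User-agent: *
Disallow: /category/reviews/
Disallow: /category/press/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def allowed(path, agent="*"):
    """True if the given user agent may fetch the path under these rules."""
    return parser.can_fetch(agent, path)


# Hypothetical paths, just to exercise the rules
for path in ["/category/reviews/", "/category/press/", "/category/tv/", "/some-post/"]:
    print(path, allowed(path))
```

The two excluded categories come back blocked, while /category/tv/ and individual posts stay open, which is the whole point of the exercise.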

Done deal. Now, granted, I have some posts with multiple categories that need to be cleaned up, but that can be done easily enough.

I wish there were a lot more SEO tools built in with Wordpress, honestly. And maybe this exists among the fifty gajillion plugins and I haven't seen it, but a way of looking at all my posts and being able to check/uncheck categories en masse would be nice. Or even a plugin that went out, looked at how you've got your posts, robots.txt and such set up, and graded you for duplicate content. You know, you are at a 56% chance of being SOL because you've got too little content in too many places. Something like that.

Anyway, What Have We Learned?

1. There are no absolute, hard and fast rules to SEO. And even that, being a hard and fast rule, is subject to scrutiny. Sure, you need to do stuff like use decent titles, decent URLs, and have your server, you know, actually online. There are some no-brainers, but just because you find a post that says Your Robots.TXT Must Look Like This or You Are Doomed, well, have your grain of salt handy. Make sure what you're doing works for your individual site, because, as I find anyway, most SEO posts are for sites that aren't, shall we say, dealing in the trade of pop culture.

2. You can control a shitload of stuff about your site. I'm amazed at how many people actually don't have a robots.txt file. Or an .htaccess file (that they know of). I keep forgetting just how much power I have to shape what goes on on the site. It's a good idea to take five minutes a week and step back from the grind of posting and just go, "Right. Do I have my hatches battened down?"

3. Google Webmaster Tools are your friend. The robots.txt analyzer they provide has already saved me from fifteen really stupid things I could have done to cut my site off from the outside world. I highly recommend you do not make any changes to your robots.txt without running it through their analyzer first. And don't just check the Googlebot. Check the image bot (Googlebot-Image) and the media bot (Mediapartners-Google, if you're running AdSense).
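For a quick offline version of that last point, the sketch below runs one draft robots.txt past several crawler names at once using Python's standard parser. The draft file and its mistake are invented for the example, and Python's matching rules are not guaranteed to agree with Google's, so treat this as a first pass before the real analyzer, not a replacement for it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical draft with a slip: the image crawler is locked out entirely.
draft = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /tag/
"""


def check(robots_txt, agents, paths):
    """Map each (agent, path) pair to whether that agent may fetch that path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {(a, p): parser.can_fetch(a, p) for a in agents for p in paths}


agents = ["Googlebot", "Googlebot-Image", "Mediapartners-Google"]
paths = ["/", "/tag/androids/"]  # made-up paths to probe

report = check(draft, agents, paths)
for (agent, path), ok in report.items():
    print(agent, path, "allowed" if ok else "BLOCKED")
```

Checking only Googlebot here would look fine, since it can still reach the front page. It takes the Googlebot-Image row to show that the draft shuts the image crawler out of everything.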

Filed under: Fun With SEO
Comments: 1 Comment


Gilbert Arenas Explains Why I Don't Go In The Water
Posted on 08.28.07 by Widge @ 11:15 pm

Right here. Scroll down to the bit about sharks.

A shark attack is if you're chilling at home, sitting on your couch, and a shark comes in and bites you; now that's a shark attack. Now, if you're chilling in the water, that is called invasion of space. So I have never heard of a shark attack.

Well put. Found via Cephalopodcast.

Filed under: General BS
Comments: None


Attack the Gas Station! is the Best Movie Title Ever
Posted on 08.28.07 by Widge @ 2:19 pm

And what's the only thing better than listening to tracks from the soundtrack?

Listening to the techno versions of same.

No shit.

Filed under: General BS
Comments: None


Exabot-Thumbnails?
Posted on 08.15.07 by Widge @ 11:41 pm

Received a 500 error earlier this morning trying to post on Needcoffee. Checking the access logs it appears I have a new friend: Exabot-Thumbnails. Here's a sample line:

193.47.80.77 - - [14/Aug/2007:01:25:22 -0700] "GET /updates/tag/androids HTTP/1.0" 301 242 "-" "Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Exabot-Thumbnails)"

Apparently there have been some other Exabots in the past, but nothing that's specifically "Exabot-Thumbnails." There's no info on a site to check for who the hell owns this bot, nor, from what I can tell, is it nabbing my robots.txt file. So…evil bot = .htaccess smackdown by IP address.
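Checking whether a bot ever asked for robots.txt is easy to script. The sketch below assumes the access log uses the common "combined" format, which is what the sample line above looks like; the regular expression and the function names are my own, and it assumes robots.txt is requested as /robots.txt at the site root.

```python
import re

# Combined log format: ip, identity, user, [time], "request", status, size, "referer", "agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Split one log line into named fields, or return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def fetched_robots(lines, agent_fragment):
    """True if any request from a matching user agent asked for /robots.txt."""
    for line in lines:
        hit = parse_line(line)
        if hit and agent_fragment in hit["agent"] and hit["path"] == "/robots.txt":
            return True
    return False


# The sample line quoted above
sample = '193.47.80.77 - - [14/Aug/2007:01:25:22 -0700] "GET /updates/tag/androids HTTP/1.0" 301 242 "-" "Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Exabot-Thumbnails)"'

hit = parse_line(sample)
print(hit["ip"], hit["status"], hit["path"])
print(fetched_robots([sample], "Exabot-Thumbnails"))
```

Run over the whole log instead of one line, this would also hand you the list of IP addresses to feed to .htaccess.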

If anybody knows anything more about this, let me know.

Filed under: Fun With SEO
Comments: 1 Comment


A Caffiend's Prayer
Posted on 08.10.07 by Widge @ 7:56 am

O Caffeine, take me into your loving arms for one more day
Let thy quad venti delivery mechanism
Enter unto me like a divine wellspring
Of eyelids not closing at stoplights.

Let the extra 400mg of you I ingest
Bring reinforcements to your troops
Doing battle against the hordes of fatigue
Seeking to usurp myself and my productivity.

If I should succumb to sleep
If I should descend and then not awaken once more
Let me find myself across the espresso river
Let me have paid the boatman in beans
Let me find unrest at last
In a paradise filled with Red Bull trees
And fountains of Surge and Josta
And all those other drinks that have gone before us.

Let us make little cream and foam hearts
On the surface of eternity.

And then
Let us drink.

Filed under: Insomnia and Writing Fodder
Comments: 5 Comments


Gmail
Posted on 08.08.07 by Widge @ 6:53 pm

I have to say, I really appreciate Gmail. I have one account that I use for filtering spam and another account I use for reading material on my phone. Anytime I find a slightly long article that I'd like to read, I just send it to myself on my mobile Gmail account.

I've even stopped using the mobile version of the Gmail browser client. It doesn't support pictures in messages and besides, my connection is fast enough that the regular version works just fine.

Hell, I'd use Gmail for all my mail stuff if it didn't have that obnoxious tendency to bind emails together in a conversation that cannot be undone.

Maybe one of these days.

Sent from phone.

Filed under: General BS
Comments: None


Television And Why I Don't Watch It
Posted on 08.04.07 by Widge @ 5:23 pm

Here you go.

I can't tell you the last time I watched anything on television live. If anything, I watch a bit during dinner, but that's usually British television and it can take me three dinners to get through an episode of The F Word. I'll finish eating, watch to the next commercial break, stop and come back to it.

I used to watch Law & Order, but eventually ran out of time for even that. So.

Filed under: General BS
Comments: None


I'm in Supplemental Hell, I Just Don't Know It Now: All Better
Posted on 08.01.07 by Widge @ 2:12 am

Well, this is frustrating.

For those who don't know (and probably don't care, if you're not a webmaster), there is a secondary set of search results you can get from Google. It's called Supplemental Results. It might as well be called "The Results That Aren't As Good As The Real Results." Nobody but nobody wants to be in them.

A couple of months ago, I noticed that a goodly number of Needcoffee's entries had wound up in the Supplemental Results. At first, it appeared that this was because we had a lot of duplicate content: tag pages, category pages, date pages–all with the same posts. All right, fair enough–I set up a robots.txt that kept the Googlebot from indexing pages that I didn't want, and kept single entries as indexable.

However, stuff continues to slide into Supplemental Results. I've been toying with internal links to try and get things under control, but Google has now effectively blinded me to how well I'm doing. The article should more properly have been called "Supplemental Goes Stealth."

This doesn't fix anything. In fact, it makes my job as a webmaster even more difficult.

It would be one thing if there were a webmaster tool that said, "Hey, Widge, here's what's wrong with your page and why it slid into Supplemental Hell." Then I would go and fix it. However, now I not only don't know why this is happening, I can't even see it happening any longer. So the problem has just gotten a lot worse. Google's solution to the problem is simply to make it impossible to see the problem. But the problem hasn't gone away.

This, frankly, sucks. And this is me, Google enthusiast and defender, talking here. Why is Google doing this? I run AdSense on Needcoffee. Why would they make it harder for people to find pages on my site and thus harder to get at the ad revenue that I could potentially bring in? And this is not just my site–AdSense is all over the place, and this affects everybody's sites. It would be in Google's best interests, I would think, to provide us with the tools so we can make our sites work better with their search engine, so everybody wins. Again, I'm not one of those whiny assholes who thinks Google owes me this–they owe me jack crap. It's just hard to understand why they would respond to a problem not by throwing their vaunted resources at it, but by making it look like it's gone away and hoping nobody bitches.

Somebody help me understand how this is a good idea.

Filed under: Fun With SEO
Comments: None


John Robinson is a writer of prose, poetry and comics who also writes under the pseudonym of Widgett Walls.

Widgett Walls is the director of Needcoffee.com who also writes under the pseudonym of John Robinson.

Don't ask.


This is my latest book. Short stories written especially for you, or at least someone who reminded me a lot of you at the time.

Read it for free here. Or if you like paper, buy it here.

Then tell all your friends about it. Or all your enemies. I'm not particular either way.


    Credits and Copyright
    Proudly powered by WordPress. All content © 1997-present by John Robinson.
    Theme by Theron Parlin, but we've mangled it beyond all reason. So don't blame him.