Dec 14, 2007

Redirecting .html to .htm via .htaccess

Okay, so here's some fun. Make yourself comfortable.

I went out to find an image for Needcoffee from the archives we've already got on the site so I wouldn't have to grab it from elsewhere (remember, kids: recycle!) and realized that some of our archival pages weren't loading properly on the new server.

Brief explanation: on Version 3 of the site (pre-Wordpress) we used frames. We're not proud of it, but it made sense for 1998. The content portion of the site was an .htm file. The frame that went around it and contained the masthead and the menu/sidebar was an .html file. So you could have, for example, a DVD review of Drop Dead Fred and the review itself was ddfred.htm while the frame that held it was ddfred.html.

Again, it was the best we could muster at the time.

Anyway, while checking out the new server for this old image I wanted, I discovered that the .html versions of pages were coughing up a lung. "An error occurred while processing this directive." Lovely. But the .htm versions were fine. What to do?

Well, I tore open an .html file and saw that AXS, the web tracking bit we used to use on the old site (and still use in the archives), wasn't working because it was pointing to an old CGI directory, and the CGI directory wasn't kosher any longer.

So the first thing I tried to do was get the old CGI directory to work, but I couldn't seem to figure out how to do that. And I wanted to get the damn thing working. So I decided, hey! Screw it, I'll just do a rewrite via the .htaccess, and when people go to the .html version, I'll just send them to the direct review instead. That's actually better from an SEO perspective, because back when we set up the whole framed system, SEO wasn't even a twinkle in anybody's pants.

But somehow I botched it and the .htaccess blew up the site for a few minutes. I had a Redirect that never redirected anywhere, and it didn't matter that it was in my /html/dvd/ directory: the server read down through .htaccess, couldn't go any further and barfed.

Why did this happen, children? Because Uncle Widge fucked himself over. He got in a hurry and forgot the cardinal rule of screwing around with your site:

The Cardinal Rule of Screwing Around With Your Site:

Don't be a dick. Keep a backup and a fire extinguisher handy at all times.

That's right. When you forget the Cardinal Rule, Yahweh himself will laugh at you and your site will explode.

So. I finally got that corrected after a few minutes' outage, and I was able to find some info on how to do this in .htaccess.

Not that I think anybody out there has the same setup as the old Needcoffee.com site, but still. You can futz with this to match your own site.

This is a variation on what I found at SEOBook, which was the best and closest to what I was trying to do.

Disclaimer:

When it comes to .htaccess rewriting, I barely understand what I'm doing. I admit that up front, so if you want tips on this, I may or may not have any clue as to what you're asking.

So.

I basically have an .htaccess in the subdirectory where I want to make this change.

I added this to the top:

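# mod_rewrite in an .htaccess file needs symlink following switched on
Options +FollowSymLinks
RewriteEngine On
# 301 = permanent redirect, NC = ignore case, L = stop processing further rules
RewriteRule ^(.+)\.html$ http://www.needcoffee.com/html/$1.htm [R=301,NC,L]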

Here's what I can tell you. The Options and RewriteEngine statements make the thing work to begin with. The rule itself says that for every .html hit, the visitor gets sent to the matching .htm instead. Also, it gives a 301 redirect, which is good because search engines need to be able to find the content that's been "moved," or in this case, simply bypassed.

The one thing I don't understand is, the way I read this code, it looks like it should only work for the /html/ directory, but it instead works in all subdirectories. So that I'm clueless about. Just for the moment it works.
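For what it's worth, one plausible reading of the puzzle: in an .htaccess file, mod_rewrite matches the pattern against the path relative to the directory holding the file, and (.+) is happy to swallow slashes. So if the file lives in /html/ (an assumption here), a request for dvd/ddfred.html captures dvd/ddfred and gets /html/ glued back on the front. Here's a rough Python sketch of that logic. It is not Apache, the redirect_for name is made up for illustration, and the $ anchor is an addition so that only paths actually ending in .html match.

```python
import re

# Assumption: the .htaccess sits in /html/, so the rule only ever sees the
# part of the path that comes after /html/.
PATTERN = re.compile(r"^(.+)\.html$", re.IGNORECASE)  # IGNORECASE stands in for NC
TARGET = "http://www.needcoffee.com/html/{}.htm"


def redirect_for(relative_path):
    """Return the redirect URL for a path relative to /html/, or None."""
    match = PATTERN.match(relative_path)
    if match is None:
        return None  # not an .html request, so the rule leaves it alone
    # group(1) keeps any subdirectory, because (.+) also matches slashes
    return TARGET.format(match.group(1))


print(redirect_for("dvd/ddfred.html"))
print(redirect_for("dvd/ddfred.htm"))
```

The first call should come back with the /html/dvd/ddfred.htm address, subdirectory intact, and the second should come back empty-handed, since the .htm pages are left where they are.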

There? Aren't you sorry you asked now?

Update: Shade at That's My Stapler tries to help my inept ass out. I think I almost understand what he's saying…I can do coding, me.

Written by Widge in: Fun With SEO

Nov 08, 2007

Frame Breaker Breaks Google Images?

This is interesting. So Headspace2, the badass plugin that I'm using over on Needcoffee currently, has a Frame Breaker built into it.

WTF is a Frame Breaker?

Well, you know how when you search in Google Images, you get presented the site below a Google frame up top? The Frame Breaker breaks you out of that frame, natch, so you get served the site without the Google flavoring up top.

I always did wonder about that, so I flicked it on. That was around the 22nd or 23rd of October.

I decided to come back and check to see how it had affected my traffic on Google Images.

Now, let me state this up front. My understanding is that Google Images doesn't update very often. And I also understand that for the majority of my images, I haven't SEO'd them up worth a good goddamn because at the time I was putting them up, I had no idea why I should. (Of course, people still seem to find me and force me to do shit like this.)

So I don't have a great deal of traffic coming in anyway. But let's look.

Google Images spiked the day I turned it on, then went to a reasonable trickle.

As for Google Images.ca? I went from a trickle…to nothing.

Google Images.uk? From an erratic mess to…nothing.

And pretty much on down the line.

Did it do anything to my regular Google results? No. No discernible change.

In fact, if I just do "images" and pull that chart up, I spike, and then crater.

Fascinating. Now. One of two things is happening.

1. Either Google Images doesn't like the frame break and has something built in which makes me show up lower in the results because of it. Or…

2. Google Images needs the frame to show up as Google Images in my Analytics.

I don't know which. If I had a huge amount of Google Images traffic, I would be able to see if my Google hits went up an equivalent amount, so maybe Images traffic was being counted as regular traffic. Just a theory.

Or, if I had Analytics installed on my Version 3 archival part of the site, which has no Frame Breaker, I could see if it suffered, yes or no.

I've heard rumors that breaking the frame caused Google Images to not speak to you anymore, but never really saw that substantiated. And trying to Google terms like "frame breaker google images penalty" and the like didn't really get me anywhere.

Either way, I'm taking off the Frame Breaker. I'll see if it changes anything and if so, how quickly. And hopefully, I'll get Google Images out here to reindex my site with the SEO'd bits I do have.

If anybody has their own experience, I'd love to hear it.

Written by Widge in: Fun With SEO

Oct 17, 2007

Never Underestimate the Power of Stumbleupon

Have seen a few posts here and there about getting traffic via Stumbleupon and thought, "Oh, yeah, Stumbleupon. I remember that." I joined about two years ago and have given the thumbs up to over 400 bits.

So I went and added Daily Kicksplode to my thumbs up bits, stating up front that it was one of mine.

Boom. 1200 visitors arrived. Nice.


Written by Widge in: Fun With SEO

Sep 30, 2007

Robots.txt is Pretty Damn Important, Yes

Just found this.

Technically, it's correct: you don't need a robots.txt for good SEO.

However, it's not that simple. Part of the problem I was having on my sites was that the search engine bots weren't just crawling my site, they were freaking pounding it into a fine powder. Oh sure, if you've got a big enough server, you can afford to let them run all over you–but I'm doing this crap on a budget.

If you're on a budget hosting service, or to put it another way, if you're using the cheapest hosting you feel you can get away with–you have to make sure you're not throwing away bandwidth or CPU cycles.

Look at your access logs. Are you getting hammered every couple of seconds by Googlebot? Or the Yahoo bot? Or any bot for that matter?

If you have a robots.txt, are the bots reading it and heeding it?
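If eyeballing the raw log gets old, something like this rough Python sketch can do the counting. It assumes your server writes the usual Apache "combined" log format, the summarize name is made up, and the sample lines (bot names included) are invented purely for illustration. Note that it only tells you which bots asked for robots.txt; whether they heed it is a separate question you'd answer by checking if they still hit the paths you disallowed.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status size "referer" "agent"
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Return (hits per user agent, set of agents that requested /robots.txt)."""
    hits = Counter()
    read_robots = set()
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # skip anything that is not in the expected format
        agent = match.group("agent")
        hits[agent] += 1
        if match.group("path").split("?")[0] == "/robots.txt":
            read_robots.add(agent)
    return hits, read_robots


# Invented sample lines, not real traffic.
sample = [
    '192.0.2.10 - - [01/Jan/2007:00:00:01 -0700] "GET /robots.txt HTTP/1.0" 200 120 "-" "PoliteBot/1.0"',
    '192.0.2.10 - - [01/Jan/2007:00:00:05 -0700] "GET /some-post/ HTTP/1.0" 200 5120 "-" "PoliteBot/1.0"',
    '192.0.2.20 - - [01/Jan/2007:00:00:06 -0700] "GET /tag/whatever/ HTTP/1.0" 200 4096 "-" "RudeBot/2.0"',
]
hits, read_robots = summarize(sample)
for agent, count in hits.most_common():
    print(count, agent, "(read robots.txt)" if agent in read_robots else "(never asked)")
```

Feed it the lines of your real access log instead of the sample and the heavy hitters float to the top.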

It's one thing if you've got flat HTML pages for your site, but even with wp-cache running, WordPress can bog down if a bot is allowed to run rampant. And if your site is slow or can't be crawled properly because the bots have bogged it down, then yeah, that can affect your SEO.

Now you know. And knowing is half the battle.

Written by Widge in: Fun With SEO

Aug 29, 2007

WordPress Duplicate Content: You Control the Horizontal, You Control the Vertical

Okay, so, all you hear is WordPress = easy to have duplicate content. And it is. It's quite true. In fact, that may have bitten me on the arse on Needcoffee a few months back.

An individual post can appear:

  • As itself
  • In a category (as many as it has categories assigned to it)
  • In a tag page (as many as it has tags assigned to it)

So if I've got a review of a TV DVD set, it could appear in the categories: TV, DVD and Reviews. Plus if I've got five tags on it, well…you see the concern. Without even meaning to, I've got the same article showing up nine places.

What to do? Well, initially I went into my robots.txt file and just told the search engines to stay the hell away from everything but my individual posts:

Disallow: /tag/
Disallow: /category/
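To see what those two lines do to the nine-places problem, here's a quick Python sketch using the standard library's robots.txt parser. The URLs are hypothetical stand-ins for one review with three categories and five tags, and the "User-agent: *" line is an assumption, since the rules have to sit under some user-agent group to count.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the two Disallow lines live under a catch-all user-agent group.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /tag/",
    "Disallow: /category/",
])

# Hypothetical URLs for one review filed under three categories and five tags.
post = "/2007/08/some-tv-dvd-review/"
categories = ["/category/tv/", "/category/dvd/", "/category/reviews/"]
tags = ["/tag/tag-%d/" % n for n in range(1, 6)]
places = [post] + categories + tags

# Keep only the places a well-behaved crawler is still allowed to fetch.
crawlable = [url for url in places if rules.can_fetch("Googlebot", url)]
print(len(places), "places,", len(crawlable), "crawlable:", crawlable)
```

Nine places go in, and only the individual post should come out the other side.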

Then the pendulum swings back the other way–are all of my pages close enough to my front page that they get nabbed by Google and are considered important? Basically, if nobody externally is linking to a post, and it's too many hops/links away from the front page, it can be abandoned and left for dead. Or something equally dramatic.

So the happy medium is to have a post, split it up with the MORE tag, and then allow categories and individual posts in the robots.txt file, but then only have one category per post.

Ah, so: problem for me. I can have a post, as stated above, that's for a TV DVD set. Now I can see saying that DVD can be chucked because primarily it's a TV item, the media just happens to be DVD. Fine. But, well, it is a review. And what if somebody wants to browse all our reviews? Do I really want them to have to use the tag?

Then it struck me like a slice of provolone from the blue: "What a dumbass, just exclude the individual categories that you know are always going to be tied to something else." For example, we will always have Reviews AND TV. Or Reviews AND Movies. Just exclude that category. So I re-allowed /category/ and instead just did this:

Disallow: /category/reviews/
Disallow: /category/press/

Done deal. Now, granted, I have some posts with multiple categories that need to be cleaned up, but that can be done easily enough.
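A small Python sketch of how the narrower rules behave, again with the standard library's parser. The paths are hypothetical, the crawlable helper is a made-up name, and the "User-agent: *" line is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the rules sit under a catch-all user-agent group.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /category/reviews/",
    "Disallow: /category/press/",
])


def crawlable(path):
    """True if a well-behaved crawler may fetch the given path."""
    return rules.can_fetch("Googlebot", path)


# Hypothetical category pages for the TV DVD review example.
for path in ["/category/tv/", "/category/dvd/", "/category/reviews/", "/category/press/"]:
    print(path, "allowed" if crawlable(path) else "blocked")
```

The point of the design: the always-doubled-up categories get blocked, so each post has fewer crawlable category copies, while people can still browse the Reviews category on the site itself.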

I wish there were a lot more SEO tools built in with WordPress, honestly. And maybe this exists among the fifty gajillion plugins and I haven't seen it, but a way of looking at all my posts and being able to check/uncheck categories en masse would be nice. Or even a plugin that went out, looked at how you've got your posts, robots.txt and such set up, and graded you for duplicate content. You know, you are at a 56% chance of being SOL because you've got too little content in too many places. Something like that.

Anyway, What Have We Learned?

1. There are no absolute, hard and fast rules to SEO. And even that, being a hard and fast rule, is subject to scrutiny. Sure, you need to do stuff like use decent titles, decent URLs, and have your server, you know, actually online. There are some no-brainers, but just because you find a post that says Your Robots.TXT Must Look Like This or You Are Doomed, well, have your grain of salt handy. Make sure what you're doing works for your individual site, because as I find, anyway, most SEO posts are for sites that aren't, shall we say, dealing in the trade of pop culture.

2. You can control a shitload of stuff about your site. I'm amazed at how many people actually don't have a robots.txt file. Or an .htaccess file (that they know of). I keep forgetting just how much power I have to shape what goes on on the site. It's a good idea to take five minutes a week and step back from the grind of posting and just go, "Right. Do I have my hatches battened down?"

3. Google Webmaster Tools are your friend. The robots.txt analyzer they provide has already saved me from fifteen really stupid things I could have done to cut my site off from the outside world. I highly recommend you do not make any changes to your robots.txt without running it through their analyzer first. And don't just check the Googlebot. Check the image-bot and check the media-bot (if you're running AdSense).

Written by Widge in: Fun With SEO

Aug 15, 2007

Exabot-Thumbnails?

Received a 500 error earlier this morning trying to post on Needcoffee. Checking the access logs it appears I have a new friend: Exabot-Thumbnails. Here's a sample line:

193.47.80.77 - - [14/Aug/2007:01:25:22 -0700] "GET /updates/tag/androids HTTP/1.0" 301 242 "-" "Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Exabot-Thumbnails)"

Apparently there's been some other Exabots in the past, but nothing that's specifically "Exabot-Thumbnails." There's no info on a site to check for who the hell owns this bot, nor, from what I can tell, is it nabbing my robots.txt file. So…evil bot = .htaccess smackdown by IP address.
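Before the smackdown, it helps to know every address the thing is coming from. Here's a rough Python sketch that pulls the client IPs for any log line whose user agent mentions a given bot. The ips_for_agent name is made up, and it assumes the IP is the first field on the line and the user agent is the last quoted field, as in the sample line above.

```python
import re

# First field is the client IP; the last quoted field is the user agent.
ENTRY = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"$')


def ips_for_agent(lines, token):
    """Return the client IPs (sorted as text) whose user agent contains token."""
    found = set()
    for line in lines:
        match = ENTRY.match(line.strip())
        if match and token.lower() in match.group("agent").lower():
            found.add(match.group("ip"))
    return sorted(found)


# The sample line from the access log quoted in this post.
sample = (
    '193.47.80.77 - - [14/Aug/2007:01:25:22 -0700] '
    '"GET /updates/tag/androids HTTP/1.0" 301 242 "-" '
    '"Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 '
    '(like Gecko) (Exabot-Thumbnails)"'
)
print(ips_for_agent([sample], "Exabot-Thumbnails"))
```

Run it over the whole log rather than one line and you get the full list of addresses to block in .htaccess, instead of playing whack-a-mole one IP at a time.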

If anybody knows anything more about this, let me know.

Written by Widge in: Fun With SEO

Aug 01, 2007

I'm in Supplemental Hell, I Just Don't Know It Now: All Better

Well, this is frustrating.

For those who don't know (and probably don't care, if you're not a webmaster), there is a secondary set of search results you can get from Google. It's called Supplemental Results. It might as well be called "The Results That Aren't As Good As The Real Results." Nobody but nobody wants to be in them.

A couple of months ago, I noticed that a goodly number of Needcoffee's entries had wound up in the Supplemental Results. At first, it appeared that this was because we had a lot of duplicate content: tag pages, category pages, date pages–all with the same posts. All right, fair enough–I set up a robots.txt that kept the Googlebot from indexing pages that I didn't want, and kept single entries as indexable.

However, stuff continues to slide into Supplemental Results. I was toying with internal links to try to get things under control, but Google has effectively blinded me to how well I'm doing. The article should more properly have been called "Supplemental Goes Stealth."

This doesn't fix anything. In fact, it makes my job as a webmaster even more difficult.

It would be one thing if there was a webmaster tool that said, "Hey, Widge, here's what's wrong with your page and why it slid into Supplemental Hell." Then I would go and fix it. However, now I not only don't know why this is happening, I can't even see it happening any longer. So the problem has just gotten a lot worse. Google's solution to the problem is simply to make it impossible to see the problem. But the problem hasn't gone away.

This, frankly, sucks. And this is me, Google enthusiast and defender, talking here. Why is Google doing this? I run AdSense on Needcoffee. Why would they make it harder for people to find pages on my site and thus harder to get at the ad revenue that I could potentially bring in? And this is not just my site–AdSense is all over the place, and this affects everybody's sites. It would be in Google's best interests, I would think, to provide us with the tools so we can make our sites work better with their search engine, so everybody wins. Again, I'm not one of those whiny assholes who thinks Google owes me this–they owe me jack crap. It's just hard to understand why they would respond to a problem not by throwing their vaunted resources at it, but by making it look like it's gone away and hoping nobody bitches.

Somebody help me understand how this is a good idea.

Written by Widge in: Fun With SEO