Redirecting .html to .htm via .htaccess
Posted on 12.14.07 by Widge @ 1:12 am

Okay, so here's some fun. Make yourself comfortable.

I went out to find an image for Needcoffee from the archives we've already got on the site so I wouldn't have to grab it from elsewhere (remember, kids: recycle!) and realized that some of our archival pages weren't loading properly on the new server.

Brief explanation: on Version 3 of the site (pre-Wordpress) we used frames. We're not proud of it, but it made sense for 1998. The content portion of the site was an .htm file. The frame that went around it and contained the masthead and the menu/sidebar was an .html file. So you could have, for example, a DVD review of Drop Dead Fred and the review itself was ddfred.htm while the frame that held it was ddfred.html.

Again, it was the best we could muster at the time.

Anyway, while checking out the new server for this old image I wanted, I discovered that the .html versions of pages were coughing up a lung. "An error occurred while processing this directive." Lovely. But the .htm versions were fine. What to do?

Well, I tore open a .html file and saw that AXS, the web tracking bit we used to use on the old site (and still use in the archives), wasn't working because it was pointing to an old CGI directory, and that CGI directory wasn't kosher any longer.

So the first thing I tried was getting the old CGI directory to work, but I couldn't figure out how to do that. And I wanted to get the damn thing working. So I decided, hey! Screw it, I'll just do a rewrite via the .htaccess, and when people go to the .html version, I'll send them to the review itself instead. That's actually better from an SEO perspective, because back when we set up the whole framed system, SEO wasn't even a twinkle in anybody's pants.

But somehow I botched it, and the .htaccess blew up the site for a few minutes. I had a Redirect that never redirected anywhere, and it didn't matter that it was in my /html/dvd/ directory: the server read down through the .htaccess, couldn't go any further, and barfed.

Why did this happen, children? Because Uncle Widge fucked himself over. He got in a hurry and forgot the cardinal rule of screwing around with your site:

The Cardinal Rule of Screwing Around With Your Site:

Don't be a dick. Keep a backup and a fire extinguisher handy at all times.

That's right. When you forget the Cardinal Rule, Yahweh himself will laugh at you and your site will explode.

So. I finally got that corrected after a few minutes' outage, and I was able to find some info on how to do this in .htaccess.

Not that I think anybody out there has the same setup as the old Needcoffee.com site, but still. You can futz with this to match your own site.

This is a variation on what I found at SEOBook, which was the best and closest to what I was trying to do.

Disclaimer:

When it comes to .htaccess rewriting, I barely understand what I'm doing. I admit that up front, so if you want tips on this, I may or may not have any clue as to what you're asking.

So.

I basically have an .htaccess in the subdirectory where I want to make this change.

I added this to the top:

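# Needed for mod_rewrite to run from an .htaccess file
Options +FollowSymlinks
RewriteEngine on
# Permanently (301) redirect any request ending in .html to the matching .htm
RewriteRule ^(.+)\.html$ http://www.needcoffee.com/html/$1.htm [R=301,NC,L]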

Here's what I can tell you. The Options and RewriteEngine statements make the thing work to begin with. The rule itself says: for every request ending in .html, send the visitor to the matching .htm file instead. It also issues a 301 (Moved Permanently) redirect, which is good because search engines need to be able to find the content that's been "moved," or in this case, simply bypassed.

The one thing I don't understand is that, the way I read this code, it looks like it should only work for the /html/ directory, but instead it works in all subdirectories. On that, I'm clueless. For the moment, it just works.
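For what it's worth, the Apache mod_rewrite docs do cover the subdirectory thing: in .htaccess (per-directory) context, mod_rewrite strips the directory's own path prefix before testing the pattern, and an .htaccess also applies to the subdirectories below it. So the /html/ in the target is just fixed text, and whatever subdirectory path matched gets carried along in $1. Here's a small Python sketch that mimics that for the pattern ^(.+)\.html$ (the rule above, anchored with $ so only names that end in .html match). It assumes the .htaccess lives in /html/, which the post doesn't actually say; the rewrite() helper is hypothetical, and Python's re is only standing in for Apache's regex engine.

```python
import re

# The RewriteRule pattern; [NC] (no case) corresponds to re.IGNORECASE.
PATTERN = re.compile(r"^(.+)\.html$", re.IGNORECASE)
TARGET = "http://www.needcoffee.com/html/{}.htm"


def rewrite(request_path, htaccess_dir):
    """Simulate a per-directory RewriteRule (hypothetical helper).

    In .htaccess context Apache strips the directory prefix (ending in a
    slash) before matching, so only the remainder is tested against PATTERN.
    Returns the redirect target, or None if the rule doesn't apply.
    """
    prefix = htaccess_dir.rstrip("/") + "/"
    if not request_path.startswith(prefix):
        return None  # request is outside this .htaccess's directory tree
    relative = request_path[len(prefix):]
    match = PATTERN.match(relative)
    if match is None:
        return None
    return TARGET.format(match.group(1))


if __name__ == "__main__":
    # .htaccess in /html/: the subdirectory ends up inside $1
    print(rewrite("/html/dvd/ddfred.html", "/html"))
    # -> http://www.needcoffee.com/html/dvd/ddfred.htm
    # .htaccess in /html/dvd/: $1 is just the file name, so /dvd/ is lost
    print(rewrite("/html/dvd/ddfred.html", "/html/dvd"))
    # -> http://www.needcoffee.com/html/ddfred.htm
    # .htm requests don't match, so there's no redirect loop
    print(rewrite("/html/dvd/ddfred.htm", "/html"))
    # -> None
```

The second case is worth keeping in mind: if the same rule were dropped into /html/dvd/.htaccess instead, the target would point at /html/ddfred.htm.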

There? Aren't you sorry you asked now?

Update: Shade at That's My Stapler tries to help my inept ass out. I think I almost understand what he's saying…I can do coding, me.

Filed under: Fun With SEO
Comments: None


Frame Breaker Breaks Google Images?
Posted on 11.08.07 by Widge @ 10:10 pm

This is interesting. So Headspace2, the badass plugin that I'm using over on Needcoffee currently, has a Frame Breaker built into it.

WTF is a Frame Breaker?

Well, you know how when you search in Google Images, you're presented with the site below a Google frame up top? The Frame Breaker breaks you out of that frame, natch, so you get served the site without the Google flavoring up top.

I always did wonder about that, so I flicked it on. That was around the 22nd or 23rd of October.

I decided to come back and check to see how it had affected my traffic on Google Images.

Now, let me state this up front. My understanding is that Google Images doesn't update very often. And I also understand that for the majority of my images, I haven't SEO'd them up worth a good goddamn because at the time I was putting them up, I had no idea why I should. (Of course, people still seem to find me and force me to do shit like this.)

So I don't have a great deal of traffic coming in anyway. But let's look.

Google Images spiked the day I turned it on, then went to a reasonable trickle.

As for Google Images.ca? I went from a trickle…to nothing.

Google Images.uk? From an erratic mess to…nothing.

And pretty much on down the line.

Did it do anything to my regular Google results? No. No discernible change.

In fact, if I just do "images" and pull that chart up, I spike, and then crater.

Fascinating. Now. One of two things is happening.

1. Either Google Images doesn't like the frame break and has something built in which makes me show up lower in the results because of it. Or…

2. Google Images needs the frame to show up as Google Images in my Analytics.

I don't know which. If I had a huge amount of Google Images traffic, I would be able to see whether my regular Google hits went up by an equivalent amount, which would suggest Images traffic was being counted as regular traffic. Just a theory.

Or, if I had Analytics installed on my Version 3 archival part of the site, which has no Frame Breaker, I could see if it suffered, yes or no.

I've heard rumors that breaking the frame caused Google Images to stop speaking to you, but I never saw that substantiated. And Googling terms like "frame breaker google images penalty" and the like didn't really get me anywhere.

Either way, I'm taking off the Frame Breaker. I'll see if it changes anything and if so, how quickly. And hopefully, I'll get Google Images out here to reindex my site with the SEO'd bits I do have.

If anybody has their own experience, I'd love to hear it.

Filed under: Fun With SEO
Comments: 1 Comment


Robots.txt is Pretty Damn Important, Yes
Posted on 09.30.07 by Widge @ 5:52 am

Just found this.

Technically, it's correct: you don't need a robots.txt for good SEO.

However, it's not that simple. Part of the problem I was having on my sites was that the search engine bots weren't just crawling them, they were freaking pounding them into a fine powder. Oh sure, if you've got a big enough server, you can afford to let them run all over you–but I'm doing this crap on a budget.

If you're on a budget hosting service, or to put it another way, if you're using the cheapest hosting you feel you can get away with–you have to make sure you're not throwing away bandwidth or CPU cycles.

Look at your access logs. Are you getting hammered every couple of seconds by Googlebot? Or the Yahoo bot? Or any bot for that matter?

If you have a robots.txt, are the bots reading it and heeding it?
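If you'd rather answer those questions from the raw log than by eyeballing it, here's a rough Python sketch. It assumes Apache's combined log format (the format of the Dreamhost sample line quoted further down this page), and it counts a request as a bot hit if the user agent contains one of a few substrings; that list (Googlebot, Slurp for Yahoo, msnbot for MSN) is an assumption, so adjust it to whatever actually shows up in your logs. For each bot it reports the number of hits, the median gap between hits in seconds, and whether it ever requested /robots.txt. The bot_report() name is hypothetical.

```python
import re
from datetime import datetime
from statistics import median

# Apache combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
)

# Assumed user-agent substrings; edit to match the bots in your own log.
BOTS = ("Googlebot", "Slurp", "msnbot")


def bot_report(lines):
    """Per bot: hit count, median seconds between hits, fetched robots.txt?"""
    times = {}
    fetched_robots = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # not a combined-format line
        agent = m.group(7)
        bot = next((b for b in BOTS if b in agent), None)
        if bot is None:
            continue
        # e.g. 26/Apr/2007:00:36:50 -0700 (%b expects English month names)
        when = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
        times.setdefault(bot, []).append(when)
        request = m.group(3).split()  # e.g. ['GET', '/robots.txt', 'HTTP/1.1']
        if len(request) >= 2 and request[1] == "/robots.txt":
            fetched_robots.add(bot)
    report = {}
    for bot, stamps in times.items():
        stamps.sort()
        gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        report[bot] = {
            "hits": len(stamps),
            "median_gap_s": median(gaps) if gaps else None,
            "read_robots_txt": bot in fetched_robots,
        }
    return report
```

Point it at an open log file, e.g. bot_report(open("access.log")). A bot with a median gap of a couple of seconds that never shows read_robots_txt as True is the kind of thing to look for.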

It's one thing if you've got flat HTML pages for your site, but even with wp-cache running, Wordpress can bog down if a bot is allowed to run rampant. And if your site is slow or can't be crawled properly because the bots have bogged it down, then yeah, that can affect your SEO.

Now you know. And knowing is half the battle.

Filed under: Fun With SEO
Comments: None


I'm in Supplemental Hell, I Just Don't Know It Now: All Better
Posted on 08.01.07 by Widge @ 2:12 am

Well, this is frustrating.

For those who don't know (and probably don't care, if you're not a webmaster), there is a secondary set of search results you can get from Google. It's called Supplemental Results. It might as well be called "The Results That Aren't As Good As The Real Results." Nobody but nobody wants to be in them.

A couple of months ago, I noticed that a goodly number of Needcoffee's entries had wound up in the Supplemental Results. At first, it appeared that this was because we had a lot of duplicate content: tag pages, category pages, date pages–all with the same posts. All right, fair enough–I set up a robots.txt that kept the Googlebot from indexing pages I didn't want indexed and left single entries indexable.
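The actual robots.txt isn't shown here, but a minimal sketch of the idea, assuming WordPress-style /tag/ and /category/ archive URLs, looks like the rules below, and Python's standard urllib.robotparser can confirm that single entries stay crawlable. Date archives are trickier: the permalinks here are date-based (like /2006/03/08/power-rangers-dino-thunder-vol-3-dvd-review/), so a plain prefix rule such as Disallow: /2006/03/ would block every post under it too. That's why they're left out of this sketch. The rules and the make_parser() helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep crawlers out of tag and category archives,
# which repeat the same posts, and leave everything else crawlable.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /category/
"""


def make_parser(text):
    """Parse robots.txt text the way a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    for path in (
        "/2006/03/08/power-rangers-dino-thunder-vol-3-dvd-review/",
        "/tag/dvd/",
        "/category/reviews/",
    ):
        # True means a compliant crawler may fetch the path
        print(path, rp.can_fetch("Googlebot", path))
```

This only tells you what a well-behaved bot should do with the file; whether a given bot actually obeys it is something only your access logs can show.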

However, stuff continues to slide into Supplemental Results. I've been toying with internal links to try and get things under control, but now Google has basically blinded me to how well I'm doing. The article should more properly have been called "Supplemental Goes Stealth."

This doesn't fix anything. In fact, it makes my job as a webmaster even more difficult.

It would be one thing if there were a webmaster tool that said, "Hey, Widge, here's what's wrong with your page and why it slid into Supplemental Hell." Then I would go and fix it. But now, not only do I not know why this is happening, I can't even see it happening any longer. So the problem has just gotten a lot worse. Google's solution to the problem is simply to make it impossible to see. But the problem hasn't gone away.

This, frankly, sucks. And this is me, Google enthusiast and defender, talking here. Why is Google doing this? I run AdSense on Needcoffee. Why would they make it harder for people to find pages on my site and thus harder to get at the ad revenue that I could potentially bring in? And this is not just my site–AdSense is all over the place, and this affects everybody's sites. It would be in Google's best interests, I would think, to provide us with the tools to make our sites work better with their search engine, so everybody wins. Again, I'm not one of those whiny assholes who thinks Google owes me this–they owe me jack crap. It's just hard to understand why, instead of throwing their vaunted resources at a problem, they would respond by making it look like it's gone away and hoping nobody bitches.

Somebody help me understand how this is a good idea.

Filed under: Fun With SEO
Comments: None


Nofollow: Now They Freaking Tell Me
Posted on 05.08.07 by Widge @ 2:56 pm

So there I am, trolling through my blogs, behind as always, when I come across this line from SEO Black Hat:

If you have verified the link and you are putting it on your site, do not put a nofollow on it. It's bad form.

Well, this threw me for a loop. And that's not a coding joke.

Back before I even started seriously screwing around with SEO, I ran across a few blog posts (and I won't even begin to fathom where they were) by folks who seemed to know what they were talking about, explaining the problem with bleeding PageRank. Basically, if you link out to a page, you give it a skosh of your PageRank. At the time, PageRank seemed like something you wanted to hang onto. And in my case, since I was trying to build a site properly, hang onto for dear life.

The way to get out of this was to stick a rel="nofollow" on your outbound links. This would tell the search engines that mojo leakage was not in order, thank you very much.

Okay, fine. Makes sense.

But here's where we learn that SEO is a lot like medical science. I shall explain.

One of my problems with doctors is that you get a lot of facts with your diagnosis but also a ton of opinion and interpretation. And they can't ever seem to agree on a lot of things. That's why red meat and caffeine are like Schrödinger's Diet–they exist in a "good for you/bad for you" state simultaneously, because nobody can make up their mind whether one or the other or both will kill you.

Same thing with SEO. Because it's a bunch of folks peering in through frosted glass at the inner workings of the search engines, they're guessing. And at the time, it seemed like a lot of people were making honest guesses about nofollow and what it meant and what you should do about it.

But anyway, so this post at Black Hat really took the wind out of me, honestly. I've always considered myself a "black" hat SEO guy just because I only wear black, so I'm kind of one of them by default. But still–"bad form"? Fuck, the last thing I want to be accused of is bad form. I mean, if I'm going to be accused of being an asshole, I'd prefer it to be for something I meant to do.

So I started combing around. Here's this from Scobleizer. Which led me to this from Search Engine Journal:

Linking to someone with a NoFollow attribute is a sign of not trusting them. It's like reaching to shake someone's hand, but stopping to put on a pair of latex gloves.

Now, excluding for a moment that I might wear latex gloves when shaking somebody's hand because in my old age I'm finding I'm about three steps away from becoming Monk, still…FUCK. Now they tell me.

And upon reflection, since I am, after all, a Machiavellian bastard who wants to come out on top of everything (but at least I tell you this up front), I decided this makes sense. First of all, I don't want to look like an asshole unless I am an asshole. And I have plenty of other opportunities to be assholish that actually make sense. So. Second, I can't honestly tell you what PageRank does for me when it comes to Search Engine standings. I don't have the best PageRank in the world, and yet I seem to make out just fine. So. Fuck it.

I just went and did an uber-find-and-replace, and all the nofollow shit should be gone on Needcoffee, effective immediately. If you find any that's still there, let me know. I'll fix it on here soon enough…I've got other stuff broken on here since the move.
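The post doesn't say what tool did the find-and-replace. Purely as an illustration, here's a Python sketch that removes nofollow from rel attributes in an HTML string while keeping any other rel values (so rel="external nofollow" becomes rel="external", and a rel that held only nofollow is dropped). It uses a regular expression, which is fine for a quick cleanup of predictable markup like your own templates and posts but is not a real HTML parser; for example, it won't catch rel = "nofollow" written with spaces around the equals sign. The strip_nofollow() name is hypothetical.

```python
import re

# Matches a rel attribute in either quote style, capturing the quote and value.
REL_ATTR = re.compile(r'\srel=(["\'])(.*?)\1', re.IGNORECASE)


def strip_nofollow(html):
    """Remove 'nofollow' from every rel attribute; drop rel if it ends up empty."""

    def fix(match):
        quote, value = match.group(1), match.group(2)
        tokens = [t for t in value.split() if t.lower() != "nofollow"]
        if not tokens:
            return ""  # rel held only nofollow, so remove the whole attribute
        return f" rel={quote}{' '.join(tokens)}{quote}"

    return REL_ATTR.sub(fix, html)


if __name__ == "__main__":
    print(strip_nofollow('<a href="http://example.com/" rel="nofollow">x</a>'))
    # -> <a href="http://example.com/">x</a>
    print(strip_nofollow("<a rel='external nofollow' href='/y'>y</a>"))
    # -> <a rel='external' href='/y'>y</a>
```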

Filed under: Fun With SEO
Comments: None


SEO Tip: Quick and Dirty Access Log Fun: What Do I Need to Optimize First?
Posted on 04.27.07 by Widge @ 12:02 am

Well, you haven't seen me a lot around here because I've been up to my elbows in code over on Needcoffee. I've been trying to optimize the site while fighting with WP-Cache, which I can't live without due to my traffic, but which also kills me if I try to update the site while it's turned on. I'm still trying to figure out that silly shit.

Anyway, now that I've gotten a bunch of superfluous ne'er-do-wells out of my access.log file, I can finally look at it and see what's happening moment by moment to try and address the problem. One thing I wanted to see, though, is what's really taxing my server's memory.

Trouble is, the access.log I get from Dreamhost (which I can only assume is the same sort you get from where you are) looks like this:

x.x.x.x - - [26/Apr/2007:00:36:50 -0700] "GET /wp-content/plugins/podpress/podpress_js.php HTTP/1.1" 200 2311 "http://www.needcoffee.com/2006/03/08/power-rangers-dino-thunder-vol-3-dvd-review/" "Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3"
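This isn't necessarily how the full post tackles it, but here's a rough Python sketch for the same question, assuming the combined log format shown in that sample line: it groups requests by path (ignoring query strings) and totals the response sizes. One caveat: the log only records the bytes sent for each response, not memory or CPU, so hits and bytes per path are a proxy for what's taxing the server rather than a direct measurement. The heaviest_paths() name and its top cutoff are hypothetical.

```python
import re
from collections import defaultdict

# Apache combined log format, as in the sample line above:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
)


def heaviest_paths(lines, top=10):
    """Return (path, hits, total_bytes) tuples, sorted by bytes, then hits."""
    hits = defaultdict(int)
    sizes = defaultdict(int)
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip malformed lines
        request = m.group(3).split()  # e.g. ['GET', '/path', 'HTTP/1.1']
        if len(request) < 2:
            continue
        path = request[1].split("?", 1)[0]  # ignore query strings
        size = m.group(5)
        hits[path] += 1
        sizes[path] += int(size) if size.isdigit() else 0  # "-" means no body
    ranked = sorted(hits, key=lambda p: (sizes[p], hits[p]), reverse=True)
    return [(p, hits[p], sizes[p]) for p in ranked[:top]]
```

Run over a day's log (e.g. heaviest_paths(open("access.log"))), something like podpress_js.php showing up near the top would be a hint about where optimizing or caching might pay off first.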


Filed under: Fun With SEO
Comments: None


John Robinson is a writer of prose, poetry and comics who also writes under the pseudonym of Widgett Walls.

Widgett Walls is the director of Needcoffee.com who also writes under the pseudonym of John Robinson.

Don't ask.


This is my latest book. Short stories written especially for you, or at least someone who reminded me a lot of you at the time.

Read it for free here. Or if you like paper, buy it here.

Then tell all your friends about it. Or all your enemies. I'm not particular either way.


Syndication
Feedburner

Amigos
Sarah Brown
Catalyst
Dindrane
Doc Ezra
Tee Quillin
ScottC
Tibby's Bowl
JM Tuffley

Sites Which Distract Me From Writing
Boing Boing
Cringely
Defamer
Warren Ellis
Engadget
Fortean Times
Long Tail
Porphyre
Reason
Wired

Topics
General BS
Insomnia
Travel
Writing Fodder

Active Projects...
Dark Blue Monstropolis
Magnificent Desolation
Something Else
The Sunday Before You


Recent Entries
  • And Now That I've Upgraded to 2.6...
  • Don't Mind Me.
  • Amazon MAB Replacement?
  • My New Mascot
  • A Nice Coda to the Trip
  • The New Yorker Hotel Business Center
  • Blast From the Past
  • Crossposted From My StumbleUpon Blog
  • Update at Last
  • George Clooney Makes Small Films Profitable. Yes.
  • On the Other Side of the Flu
  • Piano and Trumpet For the Win
  • Gun, With Occasional Weightlessness
  • Tor Nørretranders on Permanent Reincarnation
  • Can I Get a Hell Yeah?


    Credits and Copyright
    Proudly powered by WordPress. All content © 1997-present by John Robinson.
    Theme by Theron Parlin, but we've mangled it beyond all reason. So don't blame him.