SEO Tip: Quick and Dirty Access Log Fun: What Do I Need to Optimize First?
Well, you haven't seen me a lot around here because I've been up to my elbows in code over on Needcoffee. I've been trying to optimize the site while fighting with WP-Cache, which I can't live without due to my traffic, but which also kills me if I try to update the site while it's turned on. I'm still trying to figure out that silly shit.
Anyway, now that I've finally gotten a bunch of superfluous ne'er-do-wells out of my access.log file, I can actually look at it and see what's happening moment by moment to try and address the problem. One thing I wanted to see, though, is what's really taxing my server, namely which requests are eating the most bandwidth.
Trouble is, the access.log I get from Dreamhost (which I can only assume is the same sort you get from where you are) looks like this:
x.x.x.x - - [26/Apr/2007:00:36:50 -0700] "GET /wp-content/plugins/podpress/podpress_js.php HTTP/1.1" 200 2311 "http://www.needcoffee.com/2006/03/08/power-rangers-dino-thunder-vol-3-dvd-review/" "Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3"
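For reference, that line looks like Apache's "combined" log format (an assumption based on its layout; Dreamhost doesn't label it): client address, two placeholder dashes for identity and user, a bracketed timestamp, the quoted request, the status code, the number of bytes sent, and the quoted referrer and browser strings. Here's a rough Python sketch that pulls those fields apart with a regular expression, using the sample line with the address masked as above. `parse_line` is just a name made up for illustration.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

sample = ('x.x.x.x - - [26/Apr/2007:00:36:50 -0700] '
          '"GET /wp-content/plugins/podpress/podpress_js.php HTTP/1.1" 200 2311 '
          '"http://www.needcoffee.com/2006/03/08/power-rangers-dino-thunder-vol-3-dvd-review/" '
          '"Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3"')

fields = parse_line(sample)
print(fields["request"], fields["size"])
# prints: GET /wp-content/plugins/podpress/podpress_js.php HTTP/1.1 2311
```

The `size` field (2311 here) is the one we care about: it's how many bytes went out for that request.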
Now, setting aside for a moment that someone is actually viewing a Power Rangers review (and we must find them and stop them from breeding), imagine 10MB of that. That's how much I've got for a full day's access log, and that's after I've been working for a few days to optimize my robots.txt file.
Now if we want quick results to cut down on the really bad big files that we've got out there, what can we do? Especially when, like Needcoffee, we're looking at a site that's been around for ten years, and has scads and scads of pre-Wordpress material that hasn't been converted yet. That log is a mess.
Well, the obvious thing would be to sort the log file by the size of the file being requested, and I've seen some sites promising Perl scripts or whatever, but I thought there had to be an easier way.
And here it is.
1. Take your access.log and open it in a text editor. Now, granted, if you're looking to do a 10MB access log, Wordpad will cough up a lung, so grab something like EditPad instead, or just use a subset of the log.
2. Do a find and replace: find every space (" ") and replace it with a comma (","). Since we don't care about any data that would get screwed up by doing this (like the timestamp getting split in two), go for it. The request, referrer, and browser strings are wrapped in double quotes, so the spreadsheet keeps each of those together in one column anyway.
3. Save the file with the extension .csv
4. Open the file in Excel (or equivalent) as a text .csv file.
5. This should put the info into a spreadsheet with a column for size. On my version, it's column H. Sort by column H, largest first, and take a look.
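If you'd rather skip the spreadsheet, the same trick can be sketched in a few lines of Python. This is a minimal sketch, not a polished tool: `rows_by_size` and `top_requests` are hypothetical names made up for illustration, and it assumes your log looks like the sample line above. Like Excel, Python's csv module keeps double-quoted chunks together, which is why the size still lands in column H (index 7 when counting from zero).

```python
import csv
import io

def rows_by_size(log_text):
    """Mimic the spreadsheet trick: spaces -> commas, read as CSV, sort on column H."""
    # Step 2: swap every space for a comma. The quoted request, referrer, and
    # user-agent strings stay in one column each because csv honours the quotes.
    as_csv = log_text.replace(" ", ",")
    rows = [row for row in csv.reader(io.StringIO(as_csv)) if len(row) > 7]

    def size(row):
        # Column H is index 7. A "-" means no body was sent (e.g. a 304), so treat it as 0.
        return int(row[7]) if row[7].isdigit() else 0

    # Step 5: sort largest first, same as a descending sort on column H.
    return sorted(rows, key=size, reverse=True)

def top_requests(path="access.log", n=20):
    """Hypothetical helper: print the n biggest responses in a log file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for row in rows_by_size(f.read())[:n]:
            # Put the spaces back into the request line so it's readable.
            print(row[7], row[5].replace(",", " "))
```

Calling `top_requests("access.log")` should print the byte count and request line for the 20 heaviest hits, which is roughly what you'd see at the top of the sorted spreadsheet.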
In my case, once I get past the podcasts and such that are supposed to be large, I find…wow, holy crap: there's a JPG on here that's 73KB that flat out doesn't need to be.
Also, prototype.js, which Wordpress uses for the admin panels, is about that size as well. I wish somebody would create a stripped-down, no-FX, just-get-the-shit-done Wordpress admin theme, for those of us who…well, just want to get the shit done.
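Sorting by size only tells you how big each response is. A file that's merely medium-sized but gets requested constantly can cost more in total, so one variation worth trying is to add up the bytes per path. Again, this is a rough sketch assuming the combined format shown earlier; `bytes_per_path` is an illustrative name, not an existing tool.

```python
import re
from collections import Counter

# Grab the request path and the byte count that follows the status code.
# Assumes lines follow the combined layout of the sample line in this post.
LINE = re.compile(r'"[A-Z]+ (?P<path>\S+)[^"]*" \d{3} (?P<size>\d+|-)')

def bytes_per_path(lines):
    """Total bytes served per path: each response's size summed over every request."""
    totals = Counter()
    for line in lines:
        m = LINE.search(line)
        # Skip lines that don't parse and responses with no body ("-").
        if m and m.group("size") != "-":
            totals[m.group("path")] += int(m.group("size"))
    return totals
```

Feed it the lines of your log (an open file works) and call `.most_common(20)` on the result to see which 20 paths are costing you the most overall.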
Anyway, there you go. Enjoy.
If this is helpful, I may post more stuff like this as I find it.
