SEO Tip: Quick and Dirty Access Log Fun: What Do I Need to Optimize First?
Posted on 04.27.07 by Widge @ 12:02 am

Well, you haven't seen me a lot around here because I've been up to my elbows in code over on Needcoffee. I've been trying to optimize the site while fighting with WP-Cache, which I can't live without due to my traffic, but which also kills me if I try to update the site while it's turned on. I'm still trying to figure out that silly shit.

Anyway, now that I've finally gotten a bunch of superfluous ne'er-do-wells gone from my access.log file, I can finally look at it and see what's happening moment by moment to try and address the problem. One thing I wanted to see, though, is what is really taxing the memory of my server space.

Trouble is, the access.log I get from Dreamhost (which I can only assume is the same sort you get from where you are) looks like this:

x.x.x.x - - [26/Apr/2007:00:36:50 -0700] "GET /wp-content/plugins/podpress/podpress_js.php HTTP/1.1" 200 2311 "http://www.needcoffee.com/2006/03/08/power-rangers-dino-thunder-vol-3-dvd-review/" "Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3"
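If it helps to see what is in that line, here is a rough Python sketch that splits it into named pieces. It assumes the log follows what looks like Apache's combined log format; the pattern, the field names and the parse_line helper are my own labels for illustration, not anything Dreamhost provides.

```python
import re

# Pattern for what appears to be Apache's combined log format.
# The group names are illustrative labels, not official field names.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # A size of "-" means no body was sent; treat it as zero bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# The sample line from the post, split across string literals for readability.
SAMPLE = (
    'x.x.x.x - - [26/Apr/2007:00:36:50 -0700] '
    '"GET /wp-content/plugins/podpress/podpress_js.php HTTP/1.1" 200 2311 '
    '"http://www.needcoffee.com/2006/03/08/power-rangers-dino-thunder-vol-3-dvd-review/" '
    '"Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3"'
)

if __name__ == "__main__":
    entry = parse_line(SAMPLE)
    # Print the request, the HTTP status and the number of bytes sent.
    print(entry["request"], entry["status"], entry["size"])
```

The number right after the status code (2311 here) is the size in bytes of what the server sent back, which is the bit we care about.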


Now, barring for a moment that someone is actually viewing a Power Rangers review and we must find them and stop them from breeding, imagine 10MB of that. That's how much I've got for a full day's access log, and that's after I've been working for a few days to optimize my robots.txt file.

Now if we want quick results to cut down on the really bad big files that we've got out there, what can we do? Especially when, like Needcoffee, we're looking at a site that's been around for ten years, and has scads and scads of pre-WordPress material that hasn't been converted yet. That log is a mess.

Well, the obvious thing would be to sort the log file by size of the file being requested, and I've seen some sites promising Perl scripts or whatever, but I thought there had to be an easier way.

And here it is.

1. Take your access.log and open it in a text editor. Now, granted, if you're looking to do a 10MB access log, WordPad will cough up a lung, so grab something like EditPad or the like, or just use a subset of the log.

2. Do a find and replace. You want to find a space, i.e. " ", and replace it with a comma, i.e. ",". Since we don't care about any data that would get screwed up by doing this, go for it.

3. Save the file with a .csv extension.

4. Open the file in Excel (or equivalent) as a comma-delimited text file.

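5. This should put the info into a spreadsheet where you should have a column for size, meaning the number of bytes sent for each request (the 2311 in the sample line above). On my version, it's column H, because the quoted request ("GET ... HTTP/1.1") stays together in a single cell. Sort by H, largest first, and take a look.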
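If you'd rather not fire up a spreadsheet at all, the steps above can be sketched in a few lines of Python. This is a rough equivalent under a couple of assumptions, not a polished tool: it assumes every line looks like the sample above, and the rows_by_size function and its size_column parameter are names I made up for the example. It does the same space-to-comma swap and then lets Python's csv module do what Excel does, so the size still lands in the eighth column (column H).

```python
import csv


def rows_by_size(log_lines, size_column=7):
    """Mimic the find-and-replace trick, then sort rows by the size column.

    size_column=7 is the eighth column (column H in a spreadsheet).
    """
    # Step 2: replace every space with a comma.
    comma_lines = [line.rstrip("\n").replace(" ", ",") for line in log_lines]

    # Step 4: parse as CSV. Quoted chunks such as the request stay in one cell.
    # Rows too short to have a size column (blank lines, junk) are skipped.
    rows = [row for row in csv.reader(comma_lines) if len(row) > size_column]

    # Step 5: sort by size, largest first; non-numeric sizes ("-") count as 0.
    def size_of(row):
        value = row[size_column]
        return int(value) if value.isdigit() else 0

    return sorted(rows, key=size_of, reverse=True)


if __name__ == "__main__":
    # Made-up sample lines in the same format as the excerpt above.
    sample = [
        'x.x.x.x - - [26/Apr/2007:00:36:50 -0700] "GET /small.js HTTP/1.1" '
        '200 2311 "-" "ExampleAgent/1.0 (test)"',
        'x.x.x.x - - [26/Apr/2007:00:36:51 -0700] "GET /big.jpg HTTP/1.1" '
        '200 74752 "-" "ExampleAgent/1.0 (test)"',
    ]
    for row in rows_by_size(sample):
        # Show the size, then the request with its spaces put back.
        print(row[7], row[5].replace(",", " "))
```

To run it against a real log, pass an open file instead of the sample list, e.g. rows_by_size(open("access.log")). Lines with stray quote marks in the request or user agent may end up with the size in a different column, same as they would in the spreadsheet.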

In my case, once I get past the podcasts and such that are supposed to be large I find…wow, holy crap: there's a JPG on here that's 73KB that flat out doesn't need to be.

Also, prototype.js, which WordPress uses for the admin panels, is about that size as well. I wish somebody would create a stripped down, no FX, just want to get the shit done WordPress admin theme, for those of us who…well, just want to get the shit done.

Anyway, there you go. Enjoy.

If this is helpful, I may post more stuff like this as I find it.



Filed under: Fun With SEO

John Robinson is a writer of prose, poetry and comics who also writes under the pseudonym of Widgett Walls.

Widgett Walls is the director of Needcoffee.com who also writes under the pseudonym of John Robinson.

Don't ask.



    Credits and Copyright
    Proudly powered by WordPress. All content © 1997-present by John Robinson.
    Theme by Theron Parlin, but we've mangled it beyond all reason. So don't blame him.