Clean up your Apache access logs and keep what's important

Are you the kind of person who likes to monitor your Apache access logs? If you're not, stop reading now and do something else. If you are, let's get busy.

During any sort of web application development or testing we tend to load and reload our pages many times as we test our tweaks. Or sometimes we just want to make sure our site is still up so we just "check in" to make sure everything is kosher.

The problem is that this creates a bunch of unnecessary entries in our Apache access.log that we really don't care about. We need a way to keep requests from our own IP address out of Apache's access.log.

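So how do we get Apache to NOT log certain requests according to our preferences? Enter "SetEnvIf". This directive (part of mod_setenvif) sets an environment variable whenever a request matches a pattern we give it, and we can then tell Apache to skip logging any request that carries that variable.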

What is filterable?

There are a number of things we can check about any request coming in through Apache. Here's a list of the most useful stuff:

User-Agent

The type of client application making the request.

Remote_Addr

The IP address making the request.

Request_URI

The path of the page being requested, generally without the query string.

Referer

The page the user came from to get to your site, as sent in the Referer request header.

Where do your log definitions live?

Somewhere in your Apache setup you have a line defining your log similar to this:

CustomLog /path/to/site/logs/access.log combined

In my case, I have something similar to the line above in my virtual host configuration for this site:


  # Admin email, Server Name (domain name) and any aliases
  ServerAdmin dan@code621.com
  ServerName  code621.com
  ServerAlias www.code621.com

  # Index file and Document Root (where the public files are located)
  DirectoryIndex index.php
  DocumentRoot /var/www/sites/code621/prod/app/webroot

  # Custom log file locations
  LogLevel warn
  ErrorLog  /var/www/sites/code621/prod/logs/error.log
  CustomLog /var/www/sites/code621/prod/logs/access.log combined

Tell Apache you have some conditions

First we're going to tell Apache that we have some conditions to check when adding entries to access.log. So using my virtual host example above, I'm going to change my log definition that looks like this:


  CustomLog /var/www/sites/code621/prod/logs/access.log combined

To this:


  CustomLog /var/www/sites/code621/prod/logs/access.log combined env=!exclude_from_log

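Notice the "env=!exclude_from_log" at the end? It tells Apache to write an entry to access.log only when the environment variable exclude_from_log is NOT set for that request. Nothing sets that variable yet, so for now every request is still logged.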
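
The same "env=" clause also works without the "!", in which case a request is logged only when the variable IS set. As a rough sketch (the excluded.log file name is made up for illustration and is not part of my setup), you could send the filtered requests to a second file instead of throwing them away:

```apache
# Main log: every request that does NOT carry exclude_from_log
CustomLog /var/www/sites/code621/prod/logs/access.log combined env=!exclude_from_log

# Hypothetical second log: only the requests that DO carry exclude_from_log
CustomLog /var/www/sites/code621/prod/logs/excluded.log combined env=exclude_from_log
```

That way nothing is lost while you are still tuning your conditions, and you can drop the second line once you trust them.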

Create some conditions

Now we can start creating some conditions for Apache to test. The typical setup for a condition looks like this:

SetEnvIf attribute regex env-variable

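To create a rule that would NOT log our own IP address while developing or testing (assuming our IP address is 11.111.111.111), we could add a rule like the one below. The ^ and $ anchors keep the pattern from also matching longer addresses such as 211.111.111.111.

SetEnvIf Remote_Addr "^11\.111\.111\.111$" exclude_from_log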
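
If you work from more than one address, you can simply stack rules, since each one sets the same variable. The lines below are only a sketch: the 192.168.1.x office range is an assumed example, and the last line relies on SetEnvIfExpr, which is only available in Apache 2.4 and later.

```apache
# Requests made from the server itself (IPv4 and IPv6 loopback)
SetEnvIf Remote_Addr "^127\.0\.0\.1$" exclude_from_log
SetEnvIf Remote_Addr "^::1$" exclude_from_log

# Assumed office range 192.168.1.0 - 192.168.1.255, matched by prefix
SetEnvIf Remote_Addr "^192\.168\.1\." exclude_from_log

# Apache 2.4+ alternative: match the same range in CIDR notation
SetEnvIfExpr "-R '192.168.1.0/24'" exclude_from_log
```

Keep in mind that if your site sits behind a proxy or load balancer, Remote_Addr may be the proxy's address rather than your own.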

And adding that rule to the complete virtual host example:


  # Admin email, Server Name (domain name) and any aliases
  ServerAdmin dan@code621.com
  ServerName  code621.com
  ServerAlias www.code621.com

  # Index file and Document Root (where the public files are located)
  DirectoryIndex index.php
  DocumentRoot /var/www/sites/code621/prod/app/webroot
  
  # Don't log these conditions
  SetEnvIf Remote_Addr "^11\.111\.111\.111$" exclude_from_log

  # Custom log file locations
  LogLevel warn
  ErrorLog  /var/www/sites/code621/prod/logs/error.log
  CustomLog /var/www/sites/code621/prod/logs/access.log combined env=!exclude_from_log

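You can also filter out things like Googlebot stopping by to index your site. SetEnvIfNoCase works just like SetEnvIf, except that the pattern is matched without regard to case:

SetEnvIfNoCase User-Agent "Googlebot" exclude_from_log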

You can also choose to have image, CSS and JS requests not show up in the log. All of the extensions go inside a single group so that the leading dot and the trailing $ apply to each one:

SetEnvIfNoCase Request_URI "\.(gif|jpe?g|png|ico|css|js)$" exclude_from_log
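
The Referer attribute from the list above works the same way, and a rule can also take a request back out of the excluded set by putting "!" in front of the variable name. Here's a small sketch of both; the spam.example domain is a placeholder, and keeping robots.txt hits in the log is just an assumed preference:

```apache
# Skip requests whose Referer header mentions a (placeholder) spam domain
SetEnvIfNoCase Referer "spam\.example" exclude_from_log

# Re-include robots.txt: "!" removes the variable if an earlier rule set it
SetEnvIf Request_URI "^/robots\.txt$" !exclude_from_log
```

Rules are processed in the order they appear, so the "!" rule needs to come after the rules it is meant to override.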

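And finally, don't forget to restart (or reload) Apache after you make any changes to your logging rules. Running "apachectl configtest" first will catch syntax errors before they can keep Apache from starting.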
