Clean up your Apache access logs and leave what's important
Are you the kind of person that likes to monitor your Apache access logs? If you're not, stop reading now and do something else. If you are, let's get busy.
During any sort of web application development or testing we tend to load and reload our pages many times as we test our tweaks. Or sometimes we just "check in" on the site to make sure it is still up and everything is kosher.
The problem is that this creates a bunch of unnecessary entries in our Apache access.log that we really don't care about. We need a way to keep requests from our own IP address out of Apache's access.log.
So how do we get Apache to NOT log certain things according to our preferences? Enter "SetEnvIf". Using the SetEnvIf directive we can tell Apache what we don't want to log.
What is filterable?
There are a number of things we can check about any request coming in through Apache. Here's a list of the most useful stuff:
User-Agent
The type of client application making the request.
Remote_Addr
The IP address making the request.
Request_URI
The URL path of the page being requested.
Referer
The place the user was coming from to get to your site.
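As a rough preview of how each of these attributes can be tested, here is a minimal sketch with one SetEnvIf rule per attribute. Every value in it (the client name, the IP address, the path and the referring host) is a made-up placeholder, and exclude_from_log is simply the variable name used throughout this post.

```apache
# Hypothetical values throughout; swap in your own.

# Match on the User-Agent request header (here: clients identifying as curl).
SetEnvIf User-Agent "^curl/" exclude_from_log

# Match on the client IP address.
SetEnvIf Remote_Addr "^203\.0\.113\.7$" exclude_from_log

# Match on the requested URL path.
SetEnvIf Request_URI "^/server-status" exclude_from_log

# Match on the Referer request header.
SetEnvIf Referer "^https?://staging\.example\.com/" exclude_from_log
```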
Where do your log definitions live?
Somewhere in your Apache setup you have a line defining your log similar to this:
CustomLog /path/to/site/logs/access.log combined
In my case, I have something similar to the line above in my virtual host configuration for this site:
# Admin email, Server Name (domain name) and any aliases
ServerAdmin dan@code621.com
ServerName code621.com
ServerAlias www.code621.com

# Index file and Document Root (where the public files are located)
DirectoryIndex index.php
DocumentRoot /var/www/sites/code621/prod/app/webroot

# Custom log file locations
LogLevel warn
ErrorLog /var/www/sites/code621/prod/logs/error.log
CustomLog /var/www/sites/code621/prod/logs/access.log combined
Tell Apache you have some conditions
First we're going to tell Apache that we have some conditions to check when adding entries to access.log. So using my virtual host example above, I'm going to change my log definition that looks like this:
CustomLog /var/www/sites/code621/prod/logs/access.log combined
To this:
CustomLog /var/www/sites/code621/prod/logs/access.log combined env=!exclude_from_log
Notice the "env=!exclude_from_log" at the end? It tells Apache to write an entry to access.log only when the environment variable exclude_from_log is NOT set for that request.
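The env= clause also works without the exclamation mark, in which case the entry is written only when the variable IS set. If you'd rather park the filtered requests somewhere than throw them away, a sketch along these lines should do it. The excluded.log file name is my own invention for illustration.

```apache
# Main access log: skip any request for which exclude_from_log is set.
CustomLog /var/www/sites/code621/prod/logs/access.log combined env=!exclude_from_log

# Hypothetical second log: record only the requests skipped above.
CustomLog /var/www/sites/code621/prod/logs/excluded.log combined env=exclude_from_log
```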
Create some conditions
Now we can start creating some conditions for Apache to test. The typical setup for a condition looks like this:
SetEnvIf attribute regex env-variable
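To keep our own IP address out of the log while developing or testing, we could add a rule like this (assuming our IP address is 11.111.111.111):
SetEnvIf Remote_Addr "^11\.111\.111\.111$" exclude_from_log
The ^ and $ anchors make the pattern match that exact address only. Without them it would also match addresses such as 211.111.111.111.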
And adding that rule to the complete virtual host example:
# Admin email, Server Name (domain name) and any aliases
ServerAdmin dan@code621.com
ServerName code621.com
ServerAlias www.code621.com

# Index file and Document Root (where the public files are located)
DirectoryIndex index.php
DocumentRoot /var/www/sites/code621/prod/app/webroot

# Don't log these conditions
SetEnvIf Remote_Addr "^11\.111\.111\.111$" exclude_from_log

# Custom log file locations
LogLevel warn
ErrorLog /var/www/sites/code621/prod/logs/error.log
CustomLog /var/www/sites/code621/prod/logs/access.log combined env=!exclude_from_log
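If you work from more than one address, you can stack several rules or match a whole range with a looser pattern. Here is a small sketch. The subnet is an assumed example, so adjust it to your own network.

```apache
# Assumed addresses, for illustration only.

# Exclude a whole private subnet (192.168.1.0 through 192.168.1.255).
SetEnvIf Remote_Addr "^192\.168\.1\." exclude_from_log

# Exclude requests made from the server itself, over IPv4 or IPv6.
SetEnvIf Remote_Addr "^127\.0\.0\.1$" exclude_from_log
SetEnvIf Remote_Addr "^::1$" exclude_from_log
```

Each rule sets the same variable, so a request is skipped as soon as any one of them matches.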
You can also filter out things like Googlebot stopping by to index your site:
SetEnvIfNoCase User-Agent "Googlebot" exclude_from_log
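The same idea stretches to several crawlers at once by using alternation in the regex. The bot names below are just examples I picked, so check the User-Agent strings that actually show up in your own log before relying on them.

```apache
# Skip requests from any of these crawlers (case-insensitive match).
SetEnvIfNoCase User-Agent "(googlebot|bingbot|duckduckbot)" exclude_from_log
```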
You can also choose to have image, CSS and JS requests not show up in the log:
SetEnvIfNoCase Request_URI "\.(gif|jpe?g|png|ico|css|js)$" exclude_from_log
All the extensions sit inside one group so that the leading dot and the trailing $ apply to every one of them.
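If a broad rule like that one catches something you still want logged, SetEnvIf can also unset a variable by prefixing its name with an exclamation mark. Rules are evaluated in the order they appear, so the exception has to come after the rule it overrides. The file path below is a hypothetical example.

```apache
# Skip static assets in general...
SetEnvIfNoCase Request_URI "\.(gif|jpe?g|png|ico|css|js)$" exclude_from_log

# ...but keep logging this one file (hypothetical path) by unsetting the variable.
SetEnvIf Request_URI "^/images/tracking\.gif$" !exclude_from_log
```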
And finally, don't forget to restart (or reload) Apache after you make any changes to your logging rules.