CERN HTTPD Logfile Analysis

for variable in_cern_logfile expression do
    statements
endfor;
Analysing the CERN HTTPD logfile is made complicated by the poor formatting of the logfile. To simplify matters, I have written an extension to the for-loop that iterates over each line of a logfile, decoding each line. (This extension is autoloaded, so you do not have to do anything special to make it available.)

For each line of the logfile, this loop binds the variable to a log-entry holding the components of that line. Each log-entry has 7 components, listed here with the access procedure for each:

  1. Domain name of client (log_entry_domain)
  2. Remote identity (log_entry_remote_ident)
  3. User name (log_entry_user)
  4. Time of access (log_entry_utime)
  5. Request (log_entry_request_data)
  6. Status code (log_entry_status_code)
  7. Proxy bytes (log_entry_proxy_bytes)
I'm not sure what all of these fields are!

There are access procedures that correspond to each of these fields and, for the more complex fields, there are procedures that pull out the subcomponents. However, you can simply index the variable to get any field. So if you wanted to get the proxy-bytes then you might write

lvars i;
for i in_cern_logfile 'aibp.log' do
    nprintf( 'Proxy bytes = %p', [% i( 7 ) %] )
endfor;
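
The same loop can also be written with the access procedure rather than a numeric index, which avoids having to remember that proxy-bytes is field 7. This is only a sketch: the variable name entry is arbitrary, and the printed value may differ slightly from i( 7 ), since log_entry_proxy_bytes converts the field to a number where it can.

```pop11
;;; Same loop as above, using the access procedure instead of index 7.
lvars entry;
for entry in_cern_logfile 'aibp.log' do
    nprintf( 'Proxy bytes = %p', [% log_entry_proxy_bytes( entry ) %] )
endfor;
```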


Interface

app_cern_logfile( logfile, procedure)
A procedural version of the loop syntax. It could have been written as
define app_cern_logfile( f, p ); lvars f, p;
    lvars i;
    for i in_cern_logfile f do
        p( i )
    endfor
enddefine;
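
As an illustration of how the procedural version might be called, the following sketch counts the entries in a logfile. The filename 'aibp.log' is just the example name used earlier, and count is an illustrative variable, not part of the library.

```pop11
;;; Count the entries in a logfile using the procedural interface.
lvars count = 0;
app_cern_logfile(
    'aibp.log',
    procedure( entry ); lvars entry;
        ;;; called once per log-entry; the entry itself is ignored here
        count + 1 -> count
    endprocedure
);
nprintf( 'Entries = %p', [% count %] );
```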

log_entry_domain( variable ) -> domain_name
Given a log-entry, returns the domain name of the client machine. This is the same as the first field of the log-entry.

log_entry_remote_ident( variable ) -> string
Given a log-entry, returns the remote identity as a string. This is the same as the second field of the log-entry. When this is not applicable, ``-'' is returned.

log_entry_user( variable ) -> string
Given a log-entry, returns the user-name as a string. This is the same as the third field of the log-entry. When this is not applicable, ``-'' is returned.

log_entry_time_data( variable ) -> seconds, minutes, hour, date, month, year
Given a log-entry, parses the 4th field to return the second, minute, hour, day-of-month, number-of-month, and year of the access. All of these fields are numbers.
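
Because this procedure returns six results, they have to be popped off the stack in reverse order. A minimal sketch, assuming the results are returned in the order given in the signature above (the variable names and the output layout are illustrative):

```pop11
;;; Print the date and time of every access.
;;; The last result (year) is on top of the stack, so it is assigned first.
lvars entry, secs, mins, hour, day, month, year;
for entry in_cern_logfile 'aibp.log' do
    log_entry_time_data( entry ) -> year -> month -> day -> hour -> mins -> secs;
    nprintf( '%p/%p/%p %p:%p:%p', [% day, month, year, hour, mins, secs %] )
endfor;
```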

log_entry_utime( variable ) -> utime
Given a log-entry, returns the time in universal-time format (compatible with Common Lisp, where universal time is an integer count of seconds since midnight, 1 January 1900 GMT).

log_entry_request_data( variable ) -> method, url, rest, protocol
Given a log-entry, returns the method (GET, POST), the URL, any name/value pairs, and the protocol used by the request.

log_entry_url( variable ) -> string
Given a log-entry, returns the URL. This is provided as a slightly more efficient version of the above, on the basis that it will be common to want to do analyses that only look at the URL.

log_entry_status_code( variable ) -> number_or_string
Given a log-entry, returns the status code as a number or a string. It always attempts to convert to a number if possible.
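
Since the result may be either a number or a string, code that compares status codes should check the type first. The sketch below counts entries with a numeric status of 400 or above; treating 400 and above as errors is the usual HTTP convention, not something this library defines, and the variable names are illustrative.

```pop11
;;; Count entries whose status code is numeric and 400 or above.
lvars entry, code, errors = 0;
for entry in_cern_logfile 'aibp.log' do
    log_entry_status_code( entry ) -> code;
    ;;; skip entries whose status could not be converted to a number
    if isnumber( code ) and code >= 400 then
        errors + 1 -> errors
    endif
endfor;
nprintf( 'Error responses = %p', [% errors %] );
```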

log_entry_proxy_bytes( variable ) -> number_or_string
Given a log-entry, returns the proxy-bytes as a number or a string. It always attempts to convert to a number if possible.


Examples

How Many Accesses to a Particular Directory?

This example arose from wanting to count the accesses to the NCT material, which lives in /services/nct.

lvars num_accesses = 0;
lvars i;
for i in_cern_logfile '/usr/WWW/WWW/httpd-log' do
    if isstartstring( '/services/nct', log_entry_url( i ) ) then
        num_accesses + 1 -> num_accesses;
    endif;
endfor;
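
The same count could be packaged as a reusable procedure that also reports its result. This is a sketch only: count_accesses_under is a hypothetical name, not part of the library, 'aibp.log' is the example filename used earlier, and the code assumes that log_entry_url returns a string, as documented above.

```pop11
;;; Hypothetical helper (not part of the library): count the entries in
;;; LOGFILE whose URL starts with PREFIX.
define count_accesses_under( logfile, prefix ) -> n;
    lvars logfile, prefix, n, i;
    0 -> n;
    for i in_cern_logfile logfile do
        if isstartstring( prefix, log_entry_url( i ) ) then
            n + 1 -> n
        endif
    endfor
enddefine;

nprintf( 'NCT accesses = %p',
         [% count_accesses_under( 'aibp.log', '/services/nct' ) %] );
```

Taking the prefix as an argument means the same procedure can be reused for any other directory of interest.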