CERN HTTPD Logfile Analysis
    for variable in_cern_logfile expression do
        statements
    endfor;
Analysing the CERN HTTPD logfile is made complicated by the poor formatting
of the logfile. To simplify matters, I have written an extension to the
for-loop that iterates over each line of a logfile, decoding each line.
(This extension is autoloaded, so you do not have to do anything special
to make it available.)
For each line of the logfile, the loop binds the variable to a
log-entry that holds the components of that line. Each log-entry has
7 components:
- Domain name of client (log_entry_domain)
- Remote identity (log_entry_remote_ident)
- User name (log_entry_user)
- Time of access (log_entry_utime)
- Request (log_entry_request_data)
- Status code (log_entry_status_code)
- Proxy bytes (log_entry_proxy_bytes)
I'm not sure what all of these fields are!
There are access procedures that correspond to each of these fields
and, for the more complex fields, there are procedures that pull out
the subcomponents. However, you can simply index the variable
to get any field. So if you wanted to get the proxy-bytes then you
might write
    lvars i;
    for i in_cern_logfile 'aibp.log' do
        nprintf( 'Proxy bytes = %p', [% i( 7 ) %] )
    endfor;
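The same loop can be written with the access procedure instead of a numeric index, which avoids having to remember field positions. This is only a sketch: it reuses the 'aibp.log' filename from the example above and relies on log_entry_proxy_bytes behaving as described in the Interface section (it converts the field to a number when it can).

```pop11
lvars i;
for i in_cern_logfile 'aibp.log' do
    ;;; Access the 7th component by name rather than by position.
    nprintf( 'Proxy bytes = %p', [% log_entry_proxy_bytes( i ) %] )
endfor;
```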
Interface
- app_cern_logfile( logfile, procedure)
- A procedural version of the loop syntax. It could have been
written as
    define app_cern_logfile( f, p ); lvars f, p;
        lvars i;
        for i in_cern_logfile f do
            p( i )
        endfor
    enddefine;
- log_entry_domain( variable ) -> domain_name
- Given a log-entry, returns the domain name of the client
machine. This is the same as the first field of the log-entry.
- log_entry_remote_ident( variable ) -> string
- Given a log-entry, returns the remote identity as a string. This
is the same as the second field of the log-entry.
When this is not applicable, ``-'' is returned.
- log_entry_user( variable ) -> string
- Given a log-entry, returns the user-name as a string. This is the
same as the third field of the log-entry. When this is not
applicable, ``-'' is returned.
- log_entry_time_data( variable ) -> seconds, minutes, hour, date, month, year
- Given a log-entry, parses the 4th field to return the second,
minute, hour, day-of-month, number-of-month, and year of the access.
All of these fields are numbers.
- log_entry_utime( variable ) -> string
- Given a log-entry, returns the time in universal-time format
(compatible with Common Lisp).
- log_entry_request_data( variable ) -> method, url, rest, protocol
- Given a log-entry, returns the method (GET, POST), the
URL, any name/value pairs, and the protocol used by the request.
- log_entry_url( variable ) -> string
- Given a log-entry, returns the URL. This is provided as a slightly
more efficient version of the above, on the basis that it will
be common to want to do analyses that only look at the URL.
- log_entry_status_code( variable ) -> number_or_string
- Given a log-entry, returns the status code as a number or a
string. It always attempts to convert to a number if possible.
- log_entry_proxy_bytes( variable ) -> number_or_string
- Given a log-entry, returns the proxy-bytes as a number or a
string. It always attempts to convert to a number if possible.
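The procedures that return several results leave them on the stack, so they have to be popped in reverse order. The following is a minimal sketch of how log_entry_time_data, log_entry_request_data and log_entry_status_code might be combined to list requests that failed with status 404 (the HTTP "Not Found" code). The filename 'aibp.log' and the output layout are assumptions made for illustration.

```pop11
lvars i;
lvars secs, mins, hour, date, month, year;
lvars method, url, rest, protocol;
for i in_cern_logfile 'aibp.log' do
    ;;; Multiple results are popped last-first, hence the reversed order.
    log_entry_time_data( i ) -> year -> month -> date -> hour -> mins -> secs;
    log_entry_request_data( i ) -> protocol -> rest -> url -> method;
    ;;; The status may be a string if it could not be converted to a
    ;;; number; in that case the comparison is simply false.
    if log_entry_status_code( i ) = 404 then
        nprintf(
            '%p/%p/%p %p:%p  %p %p',
            [% date, month, year, hour, mins, method, url %]
        )
    endif
endfor;
```

If only the URL is needed, log_entry_url avoids unpacking the other three results of log_entry_request_data.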
Examples
How Many Accesses to a Particular Directory?
This example counts the accesses to the NCT material, which lives
in /services/nct.
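    lvars num_accesses = 0;
    lvars i;
    ;;; The logfile is assumed to be /usr/WWW/WWW/httpd-log.
    for i in_cern_logfile '/usr/WWW/WWW/httpd-log' do
        ;;; Match on the requested URL, not on the client's domain name.
        if isstartstring( '/services/nct', log_entry_url( i ) ) then
            num_accesses + 1 -> num_accesses;
        endif;
    endfor;
    nprintf( 'Accesses = %p', [% num_accesses %] );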
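The same count could also be written with the procedural interface, which is convenient when the directory is a parameter. This is a sketch only: count_accesses is a hypothetical helper, not part of the library, and the logfile path '/usr/WWW/WWW/httpd-log' is an assumption.

```pop11
;;; Hypothetical helper: counts the entries whose URL starts with prefix.
define count_accesses( logfile, prefix ) -> n;
    lvars logfile, prefix, n;
    0 -> n;
    app_cern_logfile(
        logfile,
        procedure( entry ); lvars entry;
            if isstartstring( prefix, log_entry_url( entry ) ) then
                n + 1 -> n
            endif
        endprocedure
    );
enddefine;

nprintf(
    'Accesses = %p',
    [% count_accesses( '/usr/WWW/WWW/httpd-log', '/services/nct' ) %]
);
```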