
Whatever solution you choose, keep in mind that once your reporting system is working, your client may well come up with new ideas about what should be reported.

They will say: "Oh, you can do this now, then I want..."

In my experience, these are some of the things the client will ask for:

Session statistics, e.g. select all hits from cookie/IP number x that are less than 15 minutes apart.
Different statistics for different departments, e.g. "We just realised that the ACME department should have their own stats; they have their pages in /docs/others/acme and in /cgi-bin/acmestuff/"
If the statistics are to be presented to management, presentation is very important: the reports should look good and contain exactly the data that management wants (the data that will impress them).
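For the first item, session statistics, here is a minimal sketch of the grouping step. It assumes each hit has already been reduced to a visitor key (cookie or IP number) and an epoch time; the sub name sessions_by_visitor and the hash layout are hypothetical, not from any existing code.

```perl
use strict;
use warnings;

# "Less than 15 minutes apart", in seconds.
my $SESSION_GAP = 15 * 60;

# Group hits into sessions per visitor. Each hit is a hash ref with a
# visitor key (cookie or IP number) and an epoch time. A visitor's hit
# joins that visitor's current session if it comes less than
# $SESSION_GAP seconds after the previous hit; otherwise it starts a new one.
sub sessions_by_visitor {
    my @hits = @_;
    my %sessions;    # visitor => [ [hit, ...], [hit, ...], ... ]
    for my $hit (sort { $a->{time} <=> $b->{time} } @hits) {
        my $list = $sessions{ $hit->{visitor} } ||= [];
        if (@$list && $hit->{time} - $list->[-1][-1]{time} < $SESSION_GAP) {
            push @{ $list->[-1] }, $hit;    # continue the current session
        }
        else {
            push @$list, [$hit];            # start a new session
        }
    }
    return \%sessions;
}

my $sessions = sessions_by_visitor(
    { visitor => '10.0.0.1', time => 0 },
    { visitor => '10.0.0.1', time => 600 },     # 10 minutes later: same session
    { visitor => '10.0.0.1', time => 2400 },    # 30 minutes later: new session
    { visitor => '10.0.0.2', time => 100 },
);
printf "%s: %d session(s)\n", $_, scalar @{ $sessions->{$_} } for sort keys %$sessions;
```

Sorting by time first means the gap check only ever compares a hit with the visitor's previous hit.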

Try to make a design that makes these kinds of things easy to add once the client realises he or she wants them.

Here is a description of the design of an OO log analysis application I wrote. In many ways it is a lot more primitive than what you want, but it may give you some ideas:

There are a number of objects in the application. The most important ones right now are the report, logfile, input and category objects. There is also an output object, HTML_output.
The report object
The report object (Report.pm) is the central object in the application. It stores global settings and has slots for all the other objects. Objects "know" about each other because they:

are stored in slots (attributes) in the report object
each have a slot for the report object
Hence obj2 ("self") can call a method in obj1 by doing:
$self->report->slot_for_obj1->method;
That is, an object reaches the other objects by going through the report object.
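A minimal sketch of this "report as hub" arrangement follows. The package names Logfile and Category echo the text, but ReportMember, the accessors and source_name are hypothetical. One addition not described above: the back-reference to the report is weakened with Scalar::Util, since report and member otherwise point at each other and Perl's reference counting would never free them.

```perl
use strict;
use warnings;

# Base class for objects that live in a report slot: each keeps a
# weakened back-reference to the report, so report <-> object links
# do not form a reference cycle that Perl cannot free.
package ReportMember {
    use Scalar::Util qw(weaken);
    sub new {
        my ($class, $report) = @_;
        my $self = bless { report => $report }, $class;
        weaken $self->{report};
        return $self;
    }
    sub report { $_[0]{report} }
}

# The report object: one get/set accessor per slot.
package Report {
    sub new { my $class = shift; return bless {}, $class }
    for my $slot (qw(logfile category)) {
        no strict 'refs';
        *{"Report::$slot"} = sub {
            my $self = shift;
            $self->{$slot} = shift if @_;
            return $self->{$slot};
        };
    }
}

package Logfile {
    use parent -norequire, 'ReportMember';
    sub name { return 'access_log' }
}

package Category {
    use parent -norequire, 'ReportMember';
    # Reach a sibling object by going through the report.
    sub source_name { my $self = shift; return $self->report->logfile->name }
}

my $report = Report->new;
$report->logfile(Logfile->new($report));
$report->category(Category->new($report));
print $report->category->source_name, "\n";    # prints "access_log"
```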

The report object also contains methods for resolving settings from CGI input, falling back to defaults. Lastly, it contains high-level methods for printing to a file and to the browser.
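Resolving settings against defaults might look roughly like this. It is a sketch under assumptions: the setting names and default values are made up, and a plain hash ref stands in for the parsed CGI parameters so the example runs without a web server.

```perl
use strict;
use warnings;

# Hypothetical defaults; the application's real settings are not listed.
my %DEFAULTS = (logfile => '/var/log/httpd/access_log', top_n => 20);

# A non-empty value from the CGI input wins; otherwise use the default.
# $input is a plain hash ref standing in for the parsed CGI parameters.
# Keys without a default are ignored.
sub resolve_settings {
    my ($input) = @_;
    my %settings;
    for my $key (keys %DEFAULTS) {
        my $value = $input->{$key};
        $settings{$key} = (defined $value && length $value) ? $value : $DEFAULTS{$key};
    }
    return \%settings;
}

my $settings = resolve_settings({ top_n => 5, logfile => '', colour => 'red' });
print "$settings->{top_n} $settings->{logfile}\n";    # prints "5 /var/log/httpd/access_log"
```

Looping over the defaults rather than over the input also means unknown form fields never leak into the settings.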
The logfile object
The logfile object represents the log file. Its next_line method returns the next line of the log file, and the object also parses the date of each line.
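A possible shape for next_line with date parsing is sketched below. It assumes the Apache Common Log Format timestamp, e.g. [10/Oct/2000:13:55:36 -0700], and converts it to UTC epoch seconds with the core Time::Local module; the return convention (line plus epoch, empty list at end of file) is an assumption.

```perl
use strict;
use warnings;

package Logfile {
    use Time::Local qw(timegm);

    my %MON;
    @MON{qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)} = (0 .. 11);

    # $fh is any readable filehandle (a real file or an in-memory one).
    sub new {
        my ($class, $fh) = @_;
        return bless { fh => $fh }, $class;
    }

    # Return (line, epoch) for the next line, or an empty list at end of
    # file. epoch is undef when the line has no recognisable timestamp.
    sub next_line {
        my $self = shift;
        my $line = readline $self->{fh};
        return unless defined $line;
        chomp $line;
        return ($line, parse_date($line));
    }

    # Parse "[10/Oct/2000:13:55:36 -0700]" into UTC epoch seconds.
    sub parse_date {
        my ($line) = @_;
        my ($d, $mon, $y, $h, $m, $s, $sign, $oh, $om) =
            $line =~ m{\[(\d\d)/(\w{3})/(\d{4}):(\d\d):(\d\d):(\d\d) ([+-])(\d\d)(\d\d)\]}
            or return undef;    # keep the (line, epoch) pair two elements long
        return undef unless exists $MON{$mon};
        my $offset = ($oh * 3600 + $om * 60) * ($sign eq '-' ? -1 : 1);
        # Local time minus its UTC offset gives UTC.
        return timegm($s, $m, $h, $d, $MON{$mon}, $y) - $offset;
    }
}

my $log = qq{127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326\n};
open my $fh, '<', \$log or die "open: $!";
my $logfile = Logfile->new($fh);
while (my ($line, $epoch) = $logfile->next_line) {
    print "$epoch\n";    # prints 971211336 (2000-10-10 20:55:36 UTC)
}
```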
The input object
The input object represents the user's input from the HTML form. It particularly likes parameter names made of three hyphen-separated parts, e.g. "a-query-apples". It stores these parameters in a tree structure so that each object can look up its own parameters.
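One way to build such a tree is sketched below. The reading of the three parts (owner, group, name) and the branch method are hypothetical, and names that do not split into exactly three parts are simply skipped here.

```perl
use strict;
use warnings;

package Input {
    # Build a tree from parameters named "part1-part2-part3", so that
    # "a-query-apples" => 1 becomes $tree->{a}{query}{apples} = 1.
    # Assumption: names that are not exactly three parts are ignored.
    sub new {
        my ($class, %params) = @_;
        my %tree;
        while (my ($name, $value) = each %params) {
            my @parts = split /-/, $name;
            next unless @parts == 3;
            $tree{ $parts[0] }{ $parts[1] }{ $parts[2] } = $value;
        }
        return bless { tree => \%tree }, $class;
    }

    # Let an object look up its own branch, e.g. $input->branch('a').
    sub branch {
        my ($self, $owner) = @_;
        return $self->{tree}{$owner} || {};
    }
}

my $input = Input->new('a-query-apples' => 1, 'a-query-pears' => 0, 'b-pages-top' => 10);
my $mine  = $input->branch('a');
print join(',', sort keys %{ $mine->{query} }), "\n";    # prints "apples,pears"
```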
The category object
The category object should be subclassed (and is, by the Pages.pm and Query.pm categories), but it can be used right off the bat if configured properly. It can hold a pattern and match it against each line. It stores statistics on matches in a tree structure called "tree". It can present itself in HTML, and does so in three ways:

As a description and checkbox for the user interface
As a report fragment with a name and a bar chart
As a table of contents fragment with a link to the report fragment
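The matching and counting part of a category could be sketched as follows, leaving out the HTML presentation. Assumptions: the category matches the request path rather than the whole log line, the tree layout is made up, and Category::Query is only a hypothetical stand-in for Query.pm. The usage part configures a plain category for the ACME department example from earlier.

```perl
use strict;
use warnings;

package Category {
    # Holds a regex and counts matching request paths in a tree:
    # { total => N, paths => { key => count } }. The layout is assumed.
    sub new {
        my ($class, %args) = @_;
        return bless { name => $args{name}, pattern => $args{pattern},
                       tree => { total => 0, paths => {} } }, $class;
    }

    # What to count for a matching path; subclasses may override this.
    sub key_for { my ($self, $path) = @_; return $path }

    sub match {
        my ($self, $path) = @_;
        return 0 unless $path =~ $self->{pattern};
        $self->{tree}{total}++;
        $self->{tree}{paths}{ $self->key_for($path) }++;
        return 1;
    }

    sub tree { $_[0]{tree} }
}

# Hypothetical subclass in the spirit of Query.pm: count query strings only.
package Category::Query {
    use parent -norequire, 'Category';
    sub key_for { my ($self, $path) = @_; return $path =~ /\?(.*)/ ? $1 : '' }
}

# Used right off the bat: a department category configured by pattern.
my $acme = Category->new(
    name    => 'ACME',
    pattern => qr{^/(?:docs/others/acme|cgi-bin/acmestuff)/},
);
$acme->match($_) for qw(/docs/others/acme/index.html /cgi-bin/acmestuff/search /docs/other.html);
print $acme->tree->{total}, "\n";    # prints 2
```

Keeping the department as data (a name and a pattern) rather than code is what makes the "ACME wants their own stats" request cheap to satisfy.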

The HTML_output object
The HTML_output object contains utility scripts (er, methods) for printing out stuff: form elements and bar charts.
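As an illustration of the bar chart utility, here is a sketch that renders label/count pairs as an HTML table, scaling the largest count to a given pixel width. The markup (a styled div per bar) and the function names are assumptions, not the original output; labels are HTML-escaped because they come from log data.

```perl
use strict;
use warnings;

package HTML_output {
    # Minimal escaping for text placed inside HTML.
    sub escape {
        my ($text) = @_;
        $text =~ s/&/&amp;/g;
        $text =~ s/</&lt;/g;
        $text =~ s/>/&gt;/g;
        $text =~ s/"/&quot;/g;
        return $text;
    }

    # Render { label => count } as a table of horizontal bars, largest
    # first, scaling the largest count to $max_width pixels (default 300).
    sub bar_chart {
        my ($counts, $max_width) = @_;
        $max_width ||= 300;
        my ($max) = sort { $b <=> $a } values %$counts;
        my $html = "<table>\n";
        for my $label (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts) {
            my $width = $max ? int($counts->{$label} * $max_width / $max) : 0;
            $html .= sprintf qq{<tr><td>%s</td><td><div style="background:#36c;width:%dpx">&nbsp;</div></td><td>%d</td></tr>\n},
                escape($label), $width, $counts->{$label};
        }
        return $html . "</table>\n";
    }
}

my $chart = HTML_output::bar_chart({ '/index.html' => 40, '/docs/' => 10 }, 200);
print $chart;
```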

The code is available at SourceForge, but I doubt that it is of any practical value to anyone else as yet. It does not use a database and doesn't split lines into their fields, so it doesn't fit your specs.

/jeorgen

In reply to Re: Dealing with large logfiles by jeorgen
in thread Dealing with large logfiles by arturo
