in reply to Multiline log parsing

If you have control over how the multiline logs are created, maybe you can help things along by always terminating each log entry with multiple newlines (or a meaningful token). Then, by setting the input record separator, $/, to "\n\n" (or to that token), you can easily handle one record at a time when you read the log files in.
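A minimal sketch of the record-at-a-time reading described above, assuming every entry ends with a blank line. The sample log text is made up, and it is read from an in-memory filehandle only so the example is self-contained; with a real log you would open the file instead.

```perl
use strict;
use warnings;

# Hypothetical sample log: each record is terminated by "\n\n".
my $log = "FileCreate 1.2.3\nstarted\n\nCreating files for site 42\n\tCreating file a.txt\n\n";

# Open the string as a filehandle so the sketch runs without a real file.
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";

my @records;
{
    # With $/ set to "\n\n", each <$fh> read returns one whole record.
    # (Setting $/ to "" instead enables "paragraph mode", which also
    # treats a run of several blank lines as a single separator.)
    local $/ = "\n\n";
    while (my $record = <$fh>) {
        chomp $record;    # chomp strips the current $/, i.e. the "\n\n"
        push @records, $record;
    }
}
close $fh;

print "Record: [$_]\n" for @records;
```

For a custom terminator token, the same loop works with $/ set to that token (for example a hypothetical "--END--\n").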

At any rate, your script will probably have to know which keywords to look for, perhaps read from a config file. The config file might be a plain list of words, a list of patterns (or a list of substitutions!), or even another Perl module that maps keywords to function calls, closures, state changes, what-have-you. That might get a little messy, though.
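One way to picture the "patterns mapped to closures" idea is a small dispatch table. This is only a sketch: the patterns and the apply_rules helper are hypothetical, and in practice the rule list could be returned by a config file or module loaded with do() or require.

```perl
use strict;
use warnings;

my @flagged;    # messages collected by the actions below

# Hypothetical rule list: each entry pairs a pattern with a closure
# that is called with the line and whatever the pattern captured.
my @rules = (
    [ qr/^Creating files for site (\d+)/,
      sub { my ($line, $site) = @_; push @flagged, "started site $site" } ],
    [ qr/^File generation failed for site (\d+)/,
      sub { my ($line, $site) = @_; push @flagged, "FAILED site $site" } ],
);

sub apply_rules {
    my ($line) = @_;
    for my $rule (@rules) {
        my ($re, $action) = @$rule;
        # A list-context match returns the captured groups on success.
        if (my @captures = $line =~ $re) {
            $action->($line, @captures);
            last;    # first matching rule wins
        }
    }
}

apply_rules($_) for (
    "Creating files for site 17",
    "\tCreating file out.xml",    # matches no rule, so nothing happens
    "File generation failed for site 17",
);

print "$_\n" for @flagged;
```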

But if your script just flags things for you to check manually, then a simple hash of keywords and their corresponding warning text might do.
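A minimal sketch of that simple approach, assuming plain substring keywords; the keywords, warning texts and the check_line name are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical keywords and the warning text to report for each.
my %warning_for = (
    'failed'   => 'File generation failed; check this site by hand',
    'HTTPPost' => 'Jobs diverted to HTTPPost',
);

sub check_line {
    my ($line) = @_;
    my @warnings;
    # Sort the keys so warnings come out in a repeatable order.
    for my $keyword (sort keys %warning_for) {
        # index() is a plain substring search, so keywords need no escaping.
        push @warnings, $warning_for{$keyword} if index($line, $keyword) >= 0;
    }
    return @warnings;
}

my @w = check_line('File generation failed for site 17');
print "$_\n" for @w;
```

Using index() rather than a regex keeps the config a plain word list; switching the hash keys to patterns is the natural next step if that stops being enough.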

Re: Re: Multiline log parsing
by Jenda (Abbot) on Sep 24, 2002 at 17:12 UTC

    I'm afraid a regexp that matched the whole file would be too long and messy. And testing each line separately to see whether it's something unexpected is not good enough; I need to test the lines in context.

    But I like the idea of a config file with patterns and state changes. I think I'll use something like

        (
          START => {
            '^FileCreate \d+\.\d+\.\d+$' => 'START',
            '^---- Ticking: \d{4}/\d\d/\d\d \d\d:\d\d:\d\d - \d\d:\d\d:\d\d$' => 'START',
            '^Creating files for site ' => 'FILES',
          },
          FILES => {
            '^\tCreating file ' => 'FILE',
            '^File generation succeeded for site ' => 'START',
            '^File generation failed for site ' => '--ERROR--',
            '^Jobs for site \d+ with parameter type "\w+" are to be processed by HTTPPost or something\.' => 'FILES',
            '^Site \d+ has posting parameters either only for single or for package jobs!!!' => 'FILES',
          },
          ...
        )
    or maybe
        (
          START => [
            qr'^FileCreate \d+\.\d+\.\d+$' => 'START',
            qr'^---- Ticking: \d{4}/\d\d/\d\d \d\d:\d\d:\d\d - \d\d:\d\d:\d\d$' => 'START',
            qr'^Creating files for site ' => 'FILES',
          ],
          FILES => [
            qr'^\tCreating file ' => 'FILE',
            qr'^File generation succeeded for site ' => 'START',
            qr'^File generation failed for site ' => '--ERROR--',
            qr'^Jobs for site \d+ with parameter type "\w+" are to be processed by HTTPPost or something\.' => 'FILES',
            qr'^Site \d+ has posting parameters either only for single or for package jobs!!!' => 'FILES',
          ],
          ...
        )
    and read it with do() or eval().
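A sketch of how the second (array-of-pairs) table might drive a line-by-line state machine. Assumptions: only a subset of the patterns is included, the FILE state is left out (so "Creating file" lines stay in FILES here), the table is defined inline instead of being loaded with something like `do 'states.pl'` (a hypothetical file name), and resuming from START after an error is an arbitrary choice.

```perl
use strict;
use warnings;

# Simplified state table in the array-of-pairs form.
my %states = (
    START => [
        qr'^FileCreate \d+\.\d+\.\d+$'           => 'START',
        qr'^Creating files for site '            => 'FILES',
    ],
    FILES => [
        qr'^\tCreating file '                    => 'FILES',
        qr'^File generation succeeded for site ' => 'START',
        qr'^File generation failed for site '    => '--ERROR--',
    ],
);

# Walk the lines; patterns for the current state are tried in array order
# and the first match picks the next state. A line matching nothing is
# reported as unexpected in its context.
sub check_log {
    my ($states, @lines) = @_;
    my $state = 'START';
    my @problems;
    LINE: for my $n (0 .. $#lines) {
        my $line  = $lines[$n];
        my $rules = $states->{$state};
        for (my $i = 0; $i < @$rules; $i += 2) {
            my ($re, $next) = @$rules[$i, $i + 1];
            next unless $line =~ $re;
            if ($next eq '--ERROR--') {
                push @problems, "line " . ($n + 1) . ": error in state $state: $line";
                $state = 'START';    # assumption: resume from START after an error
            }
            else {
                $state = $next;
            }
            next LINE;
        }
        push @problems, "line " . ($n + 1) . ": unexpected in state $state: $line";
    }
    return @problems;
}

my @problems = check_log(\%states,
    'FileCreate 1.2.3',
    'Creating files for site 17',
    "\tCreating file out.xml",
    'File generation failed for site 17',
    'Something nobody expected',
);
print "$_\n" for @problems;
```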

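    The second has two advantages: the regexps will be precompiled, and they will be tested in a dependable order (a hash returns its keys in no particular order, while an array keeps the pairs in the order they were written). But the code will look a little awkward.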
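If part of the awkwardness is stepping through the flat list two elements at a time, List::Util's pairs function can hide that while keeping the order. A sketch, assuming a List::Util recent enough to export pairs (1.29 or later); next_state is a hypothetical helper.

```perl
use strict;
use warnings;
use List::Util 1.29 qw(pairs);

my @rules = (
    qr'^File generation succeeded for site ' => 'START',
    qr'^File generation failed for site '    => '--ERROR--',
);

# pairs() turns the flat (pattern => state, ...) list into two-element
# array refs, in the original order, so the first match still wins.
sub next_state {
    my ($line, @rules) = @_;
    for my $pair (pairs @rules) {
        my ($re, $next) = @$pair;
        return $next if $line =~ $re;
    }
    return;    # no pattern matched
}

my $s = next_state('File generation failed for site 17', @rules);
```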

    Thanks for your ideas, Jenda