Parsing a Simple Log File

bichonfrise74 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse a log file, and I'm pretty sure it is really easy; I'm just mixing some commands and getting weird results. Please see below.

Here's my script

#!/usr/bin/perl
use strict;

my %record = ();
my ($title, $row);

while (<>) {
   if (m/^\scustomer/i ... /rows\)/) {

    chomp;
    next if (/^\s+$/);
    next if (/^TEST/);
    next if (/---/);
    next if (/row/);

    if ($_ =~ m/\s(\D+)/i) { $title = $1; }
    if ($_ =~ m/\s+(\d+)/i) { $row = $1; }

    $record{$title} = $row if ( $title ne " ");

  }
}

map { print "$_ -> $record{$_}\n" } %record;

Sample Data:

TEST HEADER
 CUSTOMERPHONE
-----------
   1300
(1 row)

TEST HEADER
 CUSTOMERORDER
-----------
   0
(1 row)

TEST HEADER
 CUSTOMERCARE
-----------
   530
(1 row)

Output that I am getting:

CUSTOMERORDER -> 1300
1300 -> 
   -> 530
530 -> 
CUSTOMERPHONE -> 
 -> 
CUSTOMERCARE -> 0
0 ->

Output that I want:

CUSTOMERPHONE -> 1300
CUSTOMERORDER -> 0
CUSTOMERCARE -> 530


Replies are listed 'Best First'.
Re: Parsing a Simple Log File by GrandFather (Saint) on Nov 05, 2008 at 00:21 UTC
If you work with records as units rather than with lines things get a little easier:

use warnings;
use strict;

my @records;

local $/ = "\n\n";
while (<DATA>) {
    my ($title, $row) = /TEST\s+\w+\W+(\w+)\D+(\d+)/;
    next unless defined $row;
    push @records, [$title, $row];
}

print "$_->[0] -> $_->[1]\n" for @records;

__DATA__
data per sample

Prints:

CUSTOMERPHONE -> 1300
CUSTOMERORDER -> 0
CUSTOMERCARE -> 530

Note that @records is used to preserve the order of the data in the source. The reversion to using a hash should be simple and obvious.

Perl reduces RSI - it saves typing
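The hash version mentioned above might look something like the following. This is only a sketch: `parse_to_hash` is a hypothetical helper name, not something from the reply, and it splits an in-memory string on blank lines instead of setting `$/`, purely so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): parse blank-line-separated
# records from a string and return a hash of title => row value.
sub parse_to_hash {
    my ($text) = @_;
    my %record;

    # Splitting on runs of blank lines stands in for $/ = "\n\n".
    for my $chunk (split /\n{2,}/, $text) {
        my ($title, $row) = $chunk =~ /TEST\s+\w+\W+(\w+)\D+(\d+)/;
        next unless defined $row;
        $record{$title} = $row;
    }
    return %record;
}

my $sample = <<'END';
TEST HEADER
 CUSTOMERPHONE
-----------
   1300
(1 row)

TEST HEADER
 CUSTOMERORDER
-----------
   0
(1 row)

TEST HEADER
 CUSTOMERCARE
-----------
   530
(1 row)
END

my %record = parse_to_hash($sample);

# A hash has no defined order, so sort the keys for stable output.
print "$_ -> $record{$_}\n" for sort keys %record;
```

The trade-off is the one already noted: the hash gives lookup by title but loses the source order, so the output here is alphabetical rather than in file order.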
Re^2: Parsing a Simple Log File by bichonfrise74 (Vicar) on Nov 05, 2008 at 00:39 UTC
Your script looks very elegant compared to mine. I didn't think of using $/ as a delimiter between records.
Re^2: Parsing a Simple Log File by gone2015 (Deacon) on Nov 05, 2008 at 13:16 UTC
Building on that base, I suggest...

Unless you are absolutely sure that the input will always be in exactly the right form (now and in the future), it's a good idea to do something when the regex doesn't match -- at least a diagnostic message indicating that something isn't as expected. Along the lines of:

if (my ($title, $row) = /TEST\s+\w+\W+(\w+)\D+(\d+)/) {
    push @records, [$title, $row];
}
else {
    warn "no match at $.";
}

Of course you could make the warning message more helpful, or do something else to flag the problem.

In passing, when I checked this fragment I noted that the regex is "widely drafted" (as the lawyers would say). For real (as opposed to example) code it's a good idea to tighten up the regex, so that it doesn't happily provide duff results from duff data.
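As a rough illustration of what "tightening up" could mean, here is one possible stricter pattern. Everything about it is an assumption drawn from the sample data in the question: one leading space and upper-case letters for the title, a line of dashes, a digits-only value line, and a "(N row)" or "(N rows)" trailer. The names `$RECORD_RE` and `parse_record` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stricter pattern. The layout it enforces is an assumption
# based only on the sample data in the question.
my $RECORD_RE = qr{
    \A \s*
    TEST [ ] \w+ \n         # header line, e.g. "TEST HEADER"
    [ ] ([A-Z]+) \n         # title: one leading space, upper-case letters
    -+ \n                   # separator line made of dashes
    [ ]+ (\d+) \n           # value line: leading spaces, then digits only
    \( \d+ [ ] rows? \)     # trailer such as "(1 row)"
    \s* \z
}x;

# Returns [title, row] for a well-formed record, undef otherwise.
sub parse_record {
    my ($chunk) = @_;
    my ($title, $row) = $chunk =~ $RECORD_RE;
    return defined $row ? [$title, $row] : undef;
}

my @chunks = (
    "TEST HEADER\n CUSTOMERPHONE\n-----------\n   1300\n(1 row)\n\n",
    "TEST HEADER\n CUSTOMERPHONE\n-----------\n   13x0\n(1 row)\n\n",  # damaged value
);

for my $i (0 .. $#chunks) {
    if (my $rec = parse_record($chunks[$i])) {
        print "$rec->[0] -> $rec->[1]\n";
    }
    else {
        warn "record ", $i + 1, " does not match the expected layout\n";
    }
}
```

With the looser regex the damaged second record would still match and report 13; the anchored version rejects it and warns instead.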
Re: Parsing a Simple Log File by JavaFan (Canon) on Nov 04, 2008 at 23:56 UTC
Several problems here.

`next if /^\s+$/` skips lines containing nothing but whitespace, but doesn't skip lines that are completely empty. Use `next if /^\s*$/` or `next unless /\S/` instead.

The capturing regexes used to extract title and row aren't anchored. Use `m/^\s(\D+)$/i` and `m/^\s+(\d+)$/i`.

In the final map, you iterate over %record. Iterate over keys %record instead.
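One way these fixes could be applied is sketched below. It is a variation, not the poster's own code: the loop is wrapped in a hypothetical `parse_lines` sub so it can run on a string, a value is stored only when a number line is seen, and the range operator from the question is left out because the sample trailer reads "(1 row)", which the `/rows\)/` end pattern never matches.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the question's loop with the fixes applied.
sub parse_lines {
    my (@lines) = @_;
    my %record;
    my $title;

    for (@lines) {
        chomp;
        next unless /\S/;      # skips empty and whitespace-only lines
        next if /^TEST/;
        next if /---/;
        next if /row/;

        if (/^\s(\D+)$/) {     # anchored: matches a title line only
            $title = $1;
        }
        elsif (defined $title and /^\s+(\d+)$/) {   # anchored: a number line only
            $record{$title} = $1;
        }
    }
    return %record;
}

my $sample = <<'END';
TEST HEADER
 CUSTOMERPHONE
-----------
   1300
(1 row)

TEST HEADER
 CUSTOMERORDER
-----------
   0
(1 row)
END

my %record = parse_lines(split /\n/, $sample);

# Iterate over the keys, not over the hash itself.
print "$_ -> $record{$_}\n" for sort keys %record;
```

Note that hash order is not the file order, so the keys are sorted here; an array of pairs would be needed to keep the original sequence.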
Re: Parsing a Simple Log File by monarch (Priest) on Nov 04, 2008 at 23:59 UTC
You have two problems.

First, when you capture the title you're capturing non-digits (e.g. `/(\D+)/`). The trouble is that whitespace also qualifies as a non-digit, so from time to time you create titles made of nothing but whitespace, which explains the odd blank keys in the output. Capture only what you're looking for instead, i.e. letters: `/([A-Za-z]+)/`.

Second, you're passing the whole hash to map, but really you want to map over the keys of the hash. Add the keyword `keys` in front of the hash to get the desired result; otherwise the map iterates over key, value, key, value, and so on.
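Both effects are easy to reproduce in isolation. The snippet below is a small illustration; the variable names are made up for the example.

```perl
use strict;
use warnings;

# 1. \D also matches whitespace, so a number line yields a blank "title".
my $line = "   1300";
my ($loose)  = $line =~ /\s(\D+)/;        # captures two spaces
my ($strict) = $line =~ /([A-Za-z]+)/;    # no letters, so no match: undef

printf "loose capture is %d character(s) of whitespace\n", length $loose;
print  "strict capture is ", (defined $strict ? "'$strict'" : "undef"), "\n";

# 2. A hash in list context flattens to key, value, key, value, ...
my %record = (CUSTOMERPHONE => 1300);
my @flat      = %record;        # two items: the key and its value
my @only_keys = keys %record;   # one item: the key

print scalar(@flat), " items when iterating over %record\n";
print scalar(@only_keys), " item when iterating over keys %record\n";
```

When map walks the flattened list, each value is also treated as a key, and looking it up finds nothing, which is where lines like "1300 -> " in the question's output come from.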
Re: Parsing a Simple Log File by ccn (Vicar) on Nov 04, 2008 at 23:58 UTC
Ugly but quite short code:

while (<>) {
    next unless /CUSTOMER/;
    my $count = (<>, <>); # WTF
    s/^\s+//, s/\s+$// for $_, $count;
    print "$_ -> $count\n";
}
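For anyone puzzled by the line marked WTF: the right-hand side is in scalar context, so the comma acts as the comma operator. The first read is evaluated and thrown away (that is the dashes line) and the second read supplies the value. A sketch of the same idea, using an in-memory filehandle instead of `<>` so it runs without an input file:

```perl
use strict;
use warnings;

my $data = " CUSTOMERPHONE\n-----------\n   1300\n(1 row)\n";

# Open a read handle on the string (in-memory file).
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my $title = <$fh>;             # the title line
my $count = (<$fh>, <$fh>);    # first read discards the dashes, second is kept

# Trim leading and trailing whitespace (including the newline) from both.
s/^\s+//, s/\s+$// for $title, $count;

print "$title -> $count\n";
close $fh;
```

This relies on the value always being exactly two lines below the title, so it is as fragile as it is short.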