bichonfrise74 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse a log file. I'm pretty sure it's really easy, but I'm mixing up some commands and getting weird results. Please see below.

Here's my script:

    #!/usr/bin/perl
    use strict;

    my %record = ();
    my ($title, $row);

    while (<>) {
        if (m/^\scustomer/i ... /rows\)/) {
            chomp;
            next if (/^\s+$/);
            next if (/^TEST/);
            next if (/---/);
            next if (/row/);
            if ($_ =~ m/\s(\D+)/i) {
                $title = $1;
            }
            if ($_ =~ m/\s+(\d+)/i) {
                $row = $1;
            }
            $record{$title} = $row if ( $title ne " ");
        }
    }

    map { print "$_ -> $record{$_}\n" } %record;


Sample Data:
    TEST HEADER
     CUSTOMERPHONE
     -----------
            1300
    (1 row)

    TEST HEADER
     CUSTOMERORDER
     -----------
               0
    (1 row)

    TEST HEADER
     CUSTOMERCARE
     -----------
             530
    (1 row)


Output that I am getting:
    CUSTOMERORDER -> 1300
    1300 ->
     -> 530
    530 ->
    CUSTOMERPHONE ->
     ->
    CUSTOMERCARE -> 0
    0 ->


Output that I want:
    CUSTOMERPHONE -> 1300
    CUSTOMERORDER -> 0
    CUSTOMERCARE -> 530

Replies are listed 'Best First'.
Re: Parsing a Simple Log File
by GrandFather (Saint) on Nov 05, 2008 at 00:21 UTC

    If you work with records as units rather than with lines things get a little easier:

        use warnings;
        use strict;

        my @records;
        local $/ = "\n\n";

        while (<DATA>) {
            my ($title, $row) = /TEST\s+\w+\W+(\w+)\D+(\d+)/;
            next unless defined $row;
            push @records, [$title, $row];
        }

        print "$_->[0] -> $_->[1]\n" for @records;

        __DATA__
        data per sample

    Prints:

        CUSTOMERPHONE -> 1300
        CUSTOMERORDER -> 0
        CUSTOMERCARE -> 530

    Note that @records is used to preserve the order of the data in the source. The reversion to using a hash should be simple and obvious.
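    For readers who want the hash back, here is a minimal sketch of one way to do it. It assumes records are separated by blank lines, as in the code above, and the sample text (including its leading spaces) is a guess at the original layout. The parse_records sub and the @order array are hypothetical names introduced for illustration; @order exists only because a plain hash does not remember insertion order.

```perl
use strict;
use warnings;

# Parse blank-line-separated records into a hash (title => row).
# Also returns the titles in the order they were first seen.
sub parse_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory text: $!";
    local $/ = "\n\n";                      # one record per read
    my (%record, @order);
    while (my $chunk = <$fh>) {
        my ($title, $row) = $chunk =~ /TEST\s+\w+\W+(\w+)\D+(\d+)/;
        next unless defined $row;           # skip records that do not match
        push @order, $title unless exists $record{$title};
        $record{$title} = $row;
    }
    return (\%record, \@order);
}

# Assumed sample layout; the exact whitespace is a guess.
my $sample = <<'END';
TEST HEADER
 CUSTOMERPHONE
 -----------
        1300
(1 row)

TEST HEADER
 CUSTOMERORDER
 -----------
           0
(1 row)

TEST HEADER
 CUSTOMERCARE
 -----------
         530
(1 row)
END

my ($record, $order) = parse_records($sample);
print "$_ -> $record->{$_}\n" for @$order;
```

    If the output order does not matter, @order can be dropped and the loop can run over sort keys %$record instead.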


    Perl reduces RSI - it saves typing
      Your script looks very elegant compared to mine. I didn't think of using $/ as the delimiter between records.

      Building on that base, I suggest...

      Unless you are absolutely sure that the input will always be in exactly the right form (now and in the future), it's a good idea to do something when the regex doesn't match -- at least a diagnostic message indicating that something isn't as expected. Along the lines of:

          if (my ($title, $row) = /TEST\s+\w+\W+(\w+)\D+(\d+)/) {
              push @records, [$title, $row];
          }
          else {
              warn "no match at $.";
          }

      Of course you could make the warning message more helpful, or do something else to flag the problem.

      In passing, when I checked this fragment I noted that the regex is "widely drafted" (as the lawyers would say). For real (as opposed to example) code it's a good idea to tighten up the regex, so that it doesn't happily provide duff results from duff data.
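      As a rough illustration of what "tightening up" could mean here, the sketch below anchors the pattern to a whole record and spells out each line. The layout it assumes (a literal "TEST HEADER" line, a title in capital letters, a line of dashes, a count, and a "(N row)" or "(N rows)" footer) is inferred from the sample data, not confirmed. The names $record_re and parse_record are hypothetical.

```perl
use strict;
use warnings;

# Assumed layout, one element per line:
#   TEST HEADER / title / dashes / count / row-count footer
my $record_re = qr{
    \A
    [ \t]* TEST [ ] HEADER     [ \t]* \n    # fixed header line
    [ \t]* ([A-Z]+)            [ \t]* \n    # title: capital letters only
    [ \t]* -+                  [ \t]* \n    # separator made of dashes
    [ \t]* (\d+)               [ \t]* \n    # the count we want
    [ \t]* \( \d+ [ ] rows? \) \s*          # footer, then trailing whitespace
    \z
}x;

# Returns (title, row) on success and an empty list on failure.
sub parse_record {
    my ($chunk) = @_;
    return $chunk =~ $record_re;
}

my @chunks = (
    "TEST HEADER\n CUSTOMERPHONE\n -----------\n        1300\n(1 row)\n\n",
    "TEST HEADER\n 1300\n(1 row)\n",        # malformed: title and dashes missing
);

for my $chunk (@chunks) {
    if (my ($title, $row) = parse_record($chunk)) {
        print "$title -> $row\n";
    }
    else {
        warn "record does not match the expected layout\n";
    }
}
```

      The looser regex would accept the second chunk and report a title of 1300 with a row of 1; the anchored version rejects it, which is the point of tightening.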

Re: Parsing a Simple Log File
by JavaFan (Canon) on Nov 04, 2008 at 23:56 UTC
    Several problems here.
    1. next if /^\s+$/ skips lines containing nothing but whitespace, but doesn't skip lines that are completely empty. Use next if /^\s*$/ or next unless /\S/ instead.
    2. The capturing regexes used to extract title and row aren't anchored. Use m/^\s(\D+)$/i and m/^\s+(\d+)$/i.
    3. In the final map, you iterate over %record. Iterate over keys %record instead.
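    Put together, the three changes above might look like the sketch below. It is the original script with only those fixes applied, plus use warnings and an in-memory copy of the sample so it runs on its own. The sample's exact whitespace is an assumption, and the keys are sorted only because a hash has no inherent order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed sample layout; the leading spaces are a guess.
my $sample = <<'END';
TEST HEADER
 CUSTOMERPHONE
 -----------
        1300
(1 row)

TEST HEADER
 CUSTOMERORDER
 -----------
           0
(1 row)

TEST HEADER
 CUSTOMERCARE
 -----------
         530
(1 row)
END

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";

my %record;
my ($title, $row);

while (<$fh>) {
    # Range test kept exactly as in the original script.
    if (m/^\scustomer/i ... /rows\)/) {
        chomp;
        next if /^\s*$/;                        # fix 1: also skips empty lines
        next if /^TEST/;
        next if /---/;
        next if /row/;
        if (m/^\s(\D+)$/i)  { $title = $1 }     # fix 2: anchored at both ends
        if (m/^\s+(\d+)$/i) { $row   = $1 }
        $record{$title} = $row if $title ne " ";
    }
}
close $fh;

# fix 3: iterate over the keys, not over the whole hash
print "$_ -> $record{$_}\n" for sort keys %record;
```

    Because the keys are sorted, the lines come out alphabetically rather than in file order.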
Re: Parsing a Simple Log File
by monarch (Priest) on Nov 04, 2008 at 23:59 UTC

    You have two problems. First, when you capture the title you're capturing non-digits (/(\D+)/). The trouble is that whitespace also counts as a non-digit, so from time to time you create titles made of nothing but whitespace, which explains the odd blank keys in the output. Capture only what you're actually looking for instead, i.e. letters: /([A-Za-z]+)/.

    Second, you're mapping over the hash itself, when you really want to map over its keys. Add the keyword keys in front of the hash to get the desired result; otherwise the map iterates over key, value, key, value, and so on.
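    A small sketch of that second point, using a made-up two-entry hash: in list context a hash flattens into alternating keys and values, so anything that loops over it sees twice as many items as there are keys.

```perl
use strict;
use warnings;

# Illustrative data only.
my %record = (CUSTOMERPHONE => 1300, CUSTOMERORDER => 0);

my @flattened = %record;            # key, value, key, value (pair order not guaranteed)
my @names     = sort keys %record;  # just the keys

print scalar(@flattened), " items when the hash itself is listed\n";
print scalar(@names),     " items when only the keys are listed\n";

print "$_ -> $record{$_}\n" for @names;
```

    Looking up a value such as 1300 as if it were a key finds nothing, which is where the lines with an empty right-hand side come from.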

Re: Parsing a Simple Log File
by ccn (Vicar) on Nov 04, 2008 at 23:58 UTC

    Ugly but quite short code:

        while (<>) {
            next unless /CUSTOMER/;
            my $count = (<>, <>); # WTF
            s/^\s+//, s/\s+$// for $_, $count;
            print "$_ -> $count\n";
        }