JasonJ has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse a log file. I was working on using an array but now it looks like I should use a hash. What do you think? Other possibilities?

The input file has to be parsed with blank lines as a field separator. I will be searching it for 5 different strings, and the results of each search will be written to its own log file (5 log files in all). I will then send those log files to different people as email attachments.

Should I use an array, a hash, or is there another way to search through the file? Kind of like using grep with the blank line (or lines) as the field separator?

This node was helpful. http://www.perlmonks.org/?node_id=47236

This node made me rethink my approach. http://www.perlmonks.org/?node_id=610

Thanks!

Jason

Re: Parsing log file with blank lines for the field separator
by apl (Monsignor) on May 05, 2008 at 18:52 UTC
    You don't need hashes or arrays.
    1. read a record
    2. determine which output file the record belongs in (if at all)
    3. write the record to the appropriate output file (if appropriate)
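The three steps above can be sketched as follows. This is only a minimal illustration: the names `%log_for`, `targets_for` and `split_log`, the search strings, and the file names are all hypothetical. The lookup table holds only the search strings, not the records, so nothing from the log itself is kept in memory beyond the current record.

```perl
use strict;
use warnings;

# Hypothetical search strings mapped to hypothetical output file names.
my %log_for = (
    'FOO' => 'foo.log',
    'BAR' => 'bar.log',
);

# Step 2: return the output files a record belongs in (possibly none).
sub targets_for {
    my ($record) = @_;
    return map  { $log_for{$_} }
           grep { index($record, $_) >= 0 }
           sort keys %log_for;
}

# Steps 1 and 3: read one blank-line-separated record at a time and
# append it to every matching output file.
sub split_log {
    my ($in_fh) = @_;
    local $/ = '';    # paragraph mode: blank lines end a record
    while (my $record = <$in_fh>) {
        for my $file (targets_for($record)) {
            open my $out_fh, '>>', $file
                or die "Cannot append to '$file': $!";
            print {$out_fh} $record;
            close $out_fh or die "Cannot close '$file': $!";
        }
    }
    return;
}
```

To use it, open the input file and pass the handle to `split_log`. Because the sketch opens each output in append mode for every record, it is simple but not fast, and old log files should be removed before a new run.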
Re: Parsing log file with blank lines for the field separator
by Narveson (Chaplain) on May 05, 2008 at 18:56 UTC
    The input file has to be parsed with blank lines as a field separator.
    # read in paragraph mode
    local $/ = '';

    while (<DATA>) {
        print "This is paragraph $.: $_";
    }

    __DATA__
    Now is the time for all good men to come to the aid of the party.

    The quick brown fox jumps over the lazy dog.
Re: Parsing log file with blank lines for the field separator
by mscharrer (Hermit) on May 05, 2008 at 19:31 UTC
    I would code it as shown below, without storing the paragraphs in arrays and definitely not in hashes.

    use strict;
    use warnings;

    my @string  = ('String A', 'String B', 'String C', 'String D', 'String E');
    my @logfile = ('A.log', 'B.log', 'C.log', 'D.log', 'E.log');
    my @logfh;

    open (IN, '<', 'input.txt') or die "Couldn't open input file!\n";

    for my $i (0..$#logfile) {
        open ($logfh[$i], '>', $logfile[$i]) or die "Couldn't create logfile!\n";
    }

    local $/ = '';
    while (my $paragraph = <IN>) {
        for my $i (0..$#string) {
            if ( index($paragraph, $string[$i]) >= 0) {
                print {$logfh[$i]} $paragraph;
            }
        }
    }

    close IN;
    foreach my $fh (@logfh) {
        close $fh;
    }

    This is for fixed strings. If you need regexes, replace the @string array with an array of regexes built with qr{ } and replace the call to index with a regex match, as in $paragraph =~ $regex[$i].
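The regex variant described above might look like the sketch below. The two patterns and the helper name `matching_indexes` are hypothetical, chosen only to show `qr{ }` and the match operator side by side.

```perl
use strict;
use warnings;

# Hypothetical patterns; qr{} compiles each one once, up front.
my @regex = (qr{String\s+A}, qr{String\s+B}i);

# Return the index of every pattern that matches the paragraph.
sub matching_indexes {
    my ($paragraph) = @_;
    return grep { $paragraph =~ $regex[$_] } 0 .. $#regex;
}
```

Inside the read loop, the `index` test could then become `print {$logfh[$_]} $paragraph for matching_indexes($paragraph);`, assuming `@logfh` is in the same order as `@regex`.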

      A hash might be nicer:

      use strict;
      use warnings;

      my @strings = ('String A', 'String B', 'String C', 'String D', 'String E');

      my %string_handles = map {
          open my $fh, '>', "$_.log" or die "Couldn't create '$_.log': $!";
          $_ => $fh
      } @strings;

      open (my $in_fh, '<', 'input.txt') or die "Couldn't open input file!\n";

      local $/ = '';
      while (my $paragraph = <$in_fh>) {
          while (my ($string, $out_fh) = each %string_handles) {
              next unless index($paragraph, $string) >= 0;
              print {$out_fh} $paragraph;
          }
      }

      close $in_fh;
        chromatic,

        I was able to use the hash you gave. Works perfectly. Could you please point me to a resource where I can learn more about hashes? I am not sure just what this code does and would like to be able to maintain it going forward.

        Thanks!

        Jason

      Thanks for all of the responses.

      I am having trouble with this bit of code. Here is what I have.

      if(-e $ERRORLOG) {
          print LOG "$FORMATTEDTIME Able to open the CaseError.log file.\n";
          my @string = ('FOO', 'BAR');
          my @logfile = ('FOOLOG', 'BARLOG');
          my @logfh;
          open(IN, "<$ERRORLOG") or print LOG "$FORMATTEDTIME Cant open the Error.log file. \n";
          for my $i (0..$#logfile) {
              open ($logfh[$i], '>', $logfile[$i]) or die "Couldn't create logfile!\n";
              local $/ = '';
              while (my $paragraph = <IN>) {
                  for my $i (0..$#string) {
                      if ( index($paragraph, $string[$i]) >= 0) {
                          print {$logfh[$i]} $paragraph;
                      }
                  }
              }
              close IN;
              foreach my $fh (@logfh) {
                  close $fh;
              }
          }
      }

      I am getting the following error.

      Use of uninitialized value in ref-to-glob cast at D:\pl\ErrorCheck.pl line 68, <IN> chunk 1.
      print() on unopened filehandle at D:\pl\ErrorCheck.pl line 68, <IN> chunk 1.

      These messages are repeated for chunks 1-8.

      I also get "readline() on closed filehandle IN at D:\pl\ErrorCheck.pl line 65."

      Any thoughts? Did I totally mess this up?
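One possible reading of these messages, assuming the braces are exactly as posted: the `for my $i (0..$#logfile)` loop is never closed after the `open`, so the whole read loop runs inside its first pass. At that point only `$logfh[0]` is open, which would explain the "uninitialized value" and "unopened filehandle" messages whenever a paragraph matches the second string. `IN` is then closed inside the same pass, so the second pass reads from a closed handle. The sketch below opens every output first, reads once, and closes everything at the end. The function name `split_by_string` and its argument layout are hypothetical.

```perl
use strict;
use warnings;

# $input is the path of the log to read; $strings and $targets are
# parallel array references (search string i goes to target i).
sub split_by_string {
    my ($input, $strings, $targets) = @_;

    open my $in_fh, '<', $input or die "Can't open input: $!\n";

    # Open every output first; this loop ends before any reading starts.
    my @out_fh;
    for my $i (0 .. $#$targets) {
        open $out_fh[$i], '>', $targets->[$i]
            or die "Couldn't create log file: $!\n";
    }

    # Read the input once, in paragraph mode, with all outputs open.
    local $/ = '';
    while (my $paragraph = <$in_fh>) {
        for my $i (0 .. $#$strings) {
            print { $out_fh[$i] } $paragraph
                if index($paragraph, $strings->[$i]) >= 0;
        }
    }

    # Close everything only after the whole input has been read.
    close $in_fh;
    close $_ for @out_fh;
    return;
}
```

With the names from the post, the call would be `split_by_string($ERRORLOG, ['FOO', 'BAR'], ['FOOLOG', 'BARLOG']);`. Lexical filehandles are used here instead of the bareword `IN`, which is a style choice rather than part of the fix.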