natxo has asked for the wisdom of the Perl Monks concerning the following question:

I have this __DATA__ I want to parse:
Event[0]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Source: Microsoft-Windows-GroupPolicy
  Date: 2014-06-26T13:58:04.290
  Event ID: 7320
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: hostname
  Description:
Error: Computer determined to be not in a site. Error code 0x77F.

Event[1]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Source: Microsoft-Windows-GroupPolicy
  Date: 2014-06-26T12:32:30.009
  Event ID: 7320
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-21-1024758968-3939101906-3775097912-6653
  User Name: whatever
  Computer: hostname
  Description:
Error: Computer determined to be not in a site. Error code 0x77F.
This is a (very) small text dump of Windows event logs (new format). Every field value can be different, but some might be the same. I want to analyze this and report how many instances of errors there are per log type.
So far I have this:
use strict;
use warnings;

# global vars
my ( $hash_ref, $logname, $source, $date, $evt_id, $error, $description );

# we'll increase this after every event in the while loop
my $count = 1;

# read one paragraph (one event) at a time, split it into lines,
# then split each line into a $key/$value pair divided by ': '
local $/ = "";    # paragraph mode; must be set before the first read
while ( my $line = <DATA> ) {
    chomp $line;
    my @line = split( /[\n\r]/, $line );
    for (@line) {
        next if $_ =~ /^\w+.*$/;    # skips "Event[N]:" and the description text
        next if $_ =~ /\s+Description.*$/;
        my ( $key, $value ) = split(/: /);
        if ( $key =~ m/^\s+Log Name.*$/ ) {
            $logname = $value;
        }
        elsif ( $key =~ m/^\s+Source.*$/ ) {
            $source = $value;
        }
        elsif ( $key =~ m/^\s+Date.*$/ ) {
            $date = $value;
        }
        elsif ( $key =~ m/^\s+Event ID.*$/ ) {
            $evt_id = $value;
        }
        elsif ( $key =~ m/^\s+Level.*$/ ) {
            $error = $value;
        }
    }
    $hash_ref->{$count} = {
        Logname    => $logname,
        Source     => $source,
        Date       => $date,
        "Event ID" => $evt_id,
        Error      => $error,
    };
    $count++;
}

use Data::Dumper;
print Dumper $hash_ref;

__DATA__
(here is the dump, in the format shown at the beginning)
This way I can save the data I need in a hash of hashes for later processing. Ideally, I would rather key the HoH by log name, like this:
$hash_ref->{$logname} = {
    Logname    => $logname,
    Source     => $source,
    Date       => $date,
    "Event ID" => $evt_id,
    Error      => $error,
};
but that does not work, because each event overwrites the previous one with the same log name, so only the last event remains. Is there a simpler way of achieving what I want?
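One common way around the overwriting is a hash of arrays: keep the log name as the key, but make each value an array reference and push every event onto it. The sketch below is not from the thread; it assumes the indented "Key: value" layout shown above, reads a trimmed copy of the sample from an in-memory filehandle instead of DATA so that it runs on its own, and %by_log is only an illustrative name.

```perl
use strict;
use warnings;

# Trimmed copy of the sample data (two events from the same log).
my $text = <<'END';
Event[0]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Date: 2014-06-26T13:58:04.290
  Event ID: 7320
  Level: Error

Event[1]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Date: 2014-06-26T12:32:30.009
  Event ID: 7320
  Level: Error
END

# In-memory filehandle standing in for DATA (assumption, for self-containment).
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my %by_log;    # log name => array ref of event hashes
{
    local $/ = "";    # paragraph mode: one event per read
    while ( my $record = <$fh> ) {
        # Collect the indented "Key: value" lines of this event into a hash.
        my %event = $record =~ /^[ \t]+([^:\n]+):[ \t]*(.*)$/mg;
        next unless defined $event{'Log Name'};
        # push autovivifies the array ref, so no event overwrites another
        push @{ $by_log{ $event{'Log Name'} } }, \%event;
    }
}

for my $log ( sort keys %by_log ) {
    printf "%s: %d event(s)\n", $log, scalar @{ $by_log{$log} };
}
```

With the sample above this prints "Microsoft-Windows-GroupPolicy/Operational: 2 event(s)", and both events stay reachable through $by_log{'Microsoft-Windows-GroupPolicy/Operational'}.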

And a bonus question: right now I discard the Description value because it appears on a different line, after the ':'. Is there a way of getting this value as well? This is not very important, because we have the event ID and the log name, so we can deduce the value, but it would be nice to have. Thanks!

Re: question on data structure
by Cristoforo (Curate) on Jun 27, 2014 at 17:46 UTC
    As AppleFritter said, it seems you would need an array of hash references. The code below would accomplish that.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @data;
    my $keys = 'Log Name|Source|Date|Event ID';

    {
        local $/ = '';
        while (<DATA>) {
            chomp;
            my %temp = /($keys): (.+)/g;
            if (/^\s+Description:\s+(.+)\z/sm) {
                $temp{description} = $1;
            }
            push @data, \%temp;
        }
    }

    use Data::Dumper;
    print Dumper \@data;
    Using the data you posted, it dumps:
    $VAR1 = [
              {
                'Event ID' => '7320',
                'Log Name' => 'Microsoft-Windows-GroupPolicy/Operational',
                'Source' => 'Microsoft-Windows-GroupPolicy',
                'description' => 'Error: Computer determined to be not in a site. Error code 0x77F.',
                'Date' => '2014-06-26T13:58:04.290'
              },
              {
                'Event ID' => '7320',
                'Log Name' => 'Microsoft-Windows-GroupPolicy/Operational',
                'Source' => 'Microsoft-Windows-GroupPolicy',
                'description' => 'Error: Computer determined to be not in a site. Error code 0x77F.',
                'Date' => '2014-06-26T12:32:30.009'
              }
            ];
    Chris
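To get from this array of hashes to the per-log error counts the original question asks for, one option is to also capture Level and tally afterwards. This is a sketch building on the code above, not part of the reply itself: adding Level to $keys is an assumption, %errors_per_log is a hypothetical name, and the sample is read from an in-memory filehandle instead of DATA so the example runs on its own.

```perl
use strict;
use warnings;

# Assumption: 'Level' is added to the field list so errors can be counted.
my $keys = 'Log Name|Source|Date|Event ID|Level';

my $text = <<'END';
Event[0]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Source: Microsoft-Windows-GroupPolicy
  Date: 2014-06-26T13:58:04.290
  Event ID: 7320
  Level: Error
  Description:
Error: Computer determined to be not in a site. Error code 0x77F.

Event[1]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Source: Microsoft-Windows-GroupPolicy
  Date: 2014-06-26T12:32:30.009
  Event ID: 7320
  Level: Error
  Description:
Error: Computer determined to be not in a site. Error code 0x77F.
END

open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my @data;
{
    local $/ = '';    # paragraph mode, as in the reply
    while (<$fh>) {
        chomp;        # in paragraph mode this removes all trailing newlines
        my %temp = /($keys): (.+)/g;
        if (/^\s+Description:\s+(.+)\z/sm) {
            $temp{description} = $1;
        }
        push @data, \%temp;
    }
}

# Tally the events whose Level is "Error", per log name.
my %errors_per_log;
$errors_per_log{ $_->{'Log Name'} }++
    for grep { ( $_->{Level} // '' ) eq 'Error' } @data;

print "$_: $errors_per_log{$_}\n" for sort keys %errors_per_log;
```

Keeping the full records in @data and counting in a second pass means the same parsed data can later be grouped or filtered in other ways without re-reading the log.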
      Hi,

      I just read your answer and I really like it; it's much more idiomatic than my clumsy baby Perl. It does exactly what I need. Thanks!

        Glad it was useful for you!
Re: question on data structure
by AppleFritter (Vicar) on Jun 27, 2014 at 16:32 UTC

    If I understand correctly, you're wondering what the best data structure for storing the parsed data would be, right? I'd say that depends on what you intend to do with it later on, but as a general rule of thumb, it's usually best to use what most closely resembles the natural structure of the raw data you're reading.

    Going by your sample data, this would seem to be an array of hashes, indexed by the event number (0, 1, ...) and then by the various fields appearing in these logs. That is pretty much what you're doing, though you're using a hash with integer keys, which is an array in all but name, really.
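If you want the array index to be the actual event number rather than a separate counter, it can be taken from the "Event[N]:" header line. A minimal sketch, assuming every record starts with such a header and the fields are indented as in the sample; @events is an illustrative name, and an in-memory filehandle stands in for DATA.

```perl
use strict;
use warnings;

my $text = <<'END';
Event[0]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Event ID: 7320
  Level: Error

Event[1]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Event ID: 7320
  Level: Error
END

open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my @events;    # $events[N] holds the fields of "Event[N]"
{
    local $/ = "";    # paragraph mode: one event per read
    while ( my $record = <$fh> ) {
        # The header line carries the event number used as the index.
        my ($n) = $record =~ /^Event\[(\d+)\]:/m or next;
        my %fields = $record =~ /^[ \t]+([^:\n]+):[ \t]*(.*)$/mg;
        $events[$n] = \%fields;
    }
}

printf "Event %d: %s (ID %s)\n", $_, $events[$_]{'Log Name'}, $events[$_]{'Event ID'}
    for 0 .. $#events;
```

If the dump ever skipped numbers, the missing slots would simply be undef, so code walking the array would need to check for that.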

    I want to analyze this and report how many instances of errors there are per log type.

    That said, I'm not sure what you mean here. What is a "log type"? Assuming you have a variable encoding this, if you merely want aggregates, you could do this in your loop:

    ...
    # calculate $logtype
    $hash_ref->{$logtype}++;
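For example, if "log type" means the log name and you also want a breakdown by event ID, a two-level counter does it in one pass. This is only a sketch: that reading of "log type", the %error_count name and the regex for the indented "Key: value" lines are assumptions, and a trimmed copy of the sample is read from an in-memory filehandle in place of DATA.

```perl
use strict;
use warnings;

my $text = <<'END';
Event[0]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Event ID: 7320
  Level: Error

Event[1]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Event ID: 7320
  Level: Error
END

open my $fh, '<', \$text or die "cannot open in-memory file: $!";

# Assumption: "log type" = log name, broken down further by event ID.
my %error_count;    # log name => { event ID => number of Error events }
{
    local $/ = "";    # paragraph mode: one event per read
    while ( my $record = <$fh> ) {
        my %f = $record =~ /^[ \t]+([^:\n]+):[ \t]*(.*)$/mg;
        next unless ( $f{Level} // '' ) eq 'Error';
        $error_count{ $f{'Log Name'} }{ $f{'Event ID'} }++;    # autovivifies
    }
}

for my $log ( sort keys %error_count ) {
    for my $id ( sort keys %{ $error_count{$log} } ) {
        print "$log\tEvent ID $id\t$error_count{$log}{$id}\n";
    }
}
```

Only the counts are kept here, so this uses far less memory than storing every event, at the cost of not being able to look at individual events afterwards.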

    In other words, it all depends on what you want to do.

    And a bonus question: right now I discard the Description value because it appears on a different line, after the ':'. Is there a way of getting this value as well? This is not very important, because we have the event ID and the log name, so we can deduce the value, but it would be nice to have. Thanks!

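    You should be able to just read from DATA again, like so (this assumes you are reading DATA line by line, with $/ left at its default; in paragraph mode the next <DATA> would return the whole next event instead):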

    if ($_ =~ /\s+Description.*$/) {
        chomp($description = <DATA>);
    }

    In the event that the description can span several lines, add another loop there that keeps on reading from DATA until it encounters an empty line:

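    if ($_ =~ /\s+Description.*$/) {
        # test defined() rather than chomp's return value, so the loop also
        # stops cleanly at end of file and keeps a final line without "\n"
        while (defined(my $description_line = <DATA>)) {
            chomp $description_line;
            last if $description_line eq "";
            $description .= $description_line;
        }
    }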

    Note that none of this is tested in any way.
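A self-contained version of this line-by-line approach, reading the sample from an in-memory filehandle instead of DATA. It assumes $/ stays at its default newline and that a description ends at a blank line or at end of file; joining the lines of a multi-line description with a single space, and the @descriptions name, are illustrative choices.

```perl
use strict;
use warnings;

my $text = <<'END';
Event[0]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Event ID: 7320
  Description:
Error: Computer determined to be not in a site. Error code 0x77F.

Event[1]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Event ID: 7320
  Description:
Error: Computer determined to be not in a site. Error code 0x77F.
END

open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my @descriptions;
while ( my $line = <$fh> ) {    # line mode: $/ is left at "\n"
    next unless $line =~ /^\s+Description:/;
    my @parts;
    # Keep reading until a blank line or end of file ends the description.
    while ( defined( my $next = <$fh> ) ) {
        chomp $next;
        last if $next eq "";
        push @parts, $next;
    }
    push @descriptions, join( " ", @parts );
}

print "$_\n" for @descriptions;
```

Because the inner loop consumes lines from the same filehandle, the outer loop resumes right after the description, so the next "Event[N]:" header is handled normally.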

Re: question on data structure
by neilwatson (Priest) on Jun 27, 2014 at 14:34 UTC