question on data structure

natxo has asked for the wisdom of the Perl Monks concerning the following question:

I have this _DATA_ I want to parse:

Event[0]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Source: Microsoft-Windows-GroupPolicy
  Date: 2014-06-26T13:58:04.290
  Event ID: 7320
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-18
  User Name: NT AUTHORITY\SYSTEM
  Computer: hostname
  Description: 
Error: Computer determined to be not in a site. Error code 0x77F.

Event[1]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Source: Microsoft-Windows-GroupPolicy
  Date: 2014-06-26T12:32:30.009
  Event ID: 7320
  Task: N/A
  Level: Error
  Opcode: Info
  Keyword: N/A
  User: S-1-5-21-1024758968-3939101906-3775097912-6653
  User Name: whatever
  Computer: hostname
  Description: 
Error: Computer determined to be not in a site. Error code 0x77F.

This is a (very) small text dump of Windows event logs (new format). Every field value can differ between events, but some may be the same. I want to analyze this and report how many instances of errors there are per log type.

So far I have this:


use strict;
use warnings;


# global vars
my ( $hash_ref, $logname, $source, $date, $evt_id, $error, $description );

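# incremented once per record (paragraph) read in the while loop
my $count = 1;

# paragraph mode: each read returns one blank-line-separated record.
# This must be set before the first read, otherwise the first read
# returns a single line instead of a whole record.
$/ = "";

# read every record and save it in $line,
# then separate its lines into $key/$value pairs divided by ': '
while ( my $line = <DATA> ) {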

    chomp $line;
    my @line = split( /[\n\r]/, $line );

    for (@line) {
        next if $_ =~ /^\w+.*$/;
        next if $_ =~ /\s+Description.*$/;
        my ( $key, $value ) = split(/: /);
        if ( $key =~ m/^\s+Log Name.*$/ ) {
            $logname = $value;
        }
        elsif ( $key =~ m/^\s+Source.*$/ ) {
            $source = $value;
        }
        elsif ( $key =~ m/^\s+Date.*$/ ) {
            $date = $value;
        }
        elsif ( $key =~ m/^\s+Event ID.*$/ ) {
            $evt_id = $value;
        }
        elsif ( $key =~ m/^\s+Level.*$/ ) {
            $error = $value;
        }

        $hash_ref->{$count} = {
            Logname    => $logname,
            Source     => $source,
            Date       => $date,
            "Event ID" => $evt_id,
            Error      => $error,
        };

    }
    $count++;
}

use Data::Dumper;
print Dumper $hash_ref;

__DATA__
here is the dump with the format at the beginning

So I can save the data I need in a hash of hashes for later processing. Ideally, I would rather use the HoH like this:


        $hash_ref->{$logname} = {
            Logname    => $logname,
            Source     => $source,
            Date       => $date,
            "Event ID" => $evt_id,
            Error      => $error,
        };

but that does not work because it overwrites every event of the same logname and only the last event remains. Is there a simpler way of achieving what I want?

And bonus question: right now I discard the Description value because it appears on a different line after the ':'. Is there a way of getting this value as well? This is not very important because we have the event id and the logname, so we can deduce the value, but it would be nice to have. Thanks!
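One possible way around the overwriting described above is to keep an array of events under each log name (a hash of arrays) instead of a single hash per log name. This is only a minimal sketch: the `@events` list and the `%by_logname` name are hypothetical stand-ins for whatever the parser produces.

```perl
use strict;
use warnings;

# Hypothetical, already-parsed events (values taken from the sample dump).
my @events = (
    {
        'Log Name' => 'Microsoft-Windows-GroupPolicy/Operational',
        'Event ID' => 7320,
        Level      => 'Error',
        Date       => '2014-06-26T13:58:04.290',
    },
    {
        'Log Name' => 'Microsoft-Windows-GroupPolicy/Operational',
        'Event ID' => 7320,
        Level      => 'Error',
        Date       => '2014-06-26T12:32:30.009',
    },
);

# Hash of arrays: every log name maps to a list of its events,
# so a later event no longer replaces an earlier one.
my %by_logname;
for my $event (@events) {
    # autovivification creates the array reference on the first push
    push @{ $by_logname{ $event->{'Log Name'} } }, $event;
}

for my $logname ( sort keys %by_logname ) {
    printf "%s: %d event(s)\n", $logname, scalar @{ $by_logname{$logname} };
}
```

The number of events per log name is then just the length of each inner array.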


Replies are listed 'Best First'.
Re: question on data structure by Cristoforo (Curate) on Jun 27, 2014 at 17:46 UTC
As AppleFritter said, it seems you would need an array of hash references. The code below would accomplish that.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @data;
    my $keys = 'Log Name|Source|Date|Event ID';

    {
        # paragraph mode: one event per read
        local $/ = '';
        while (<DATA>) {
            chomp;
            # in list context, /g returns (key, value) pairs
            my %temp = /($keys): (.+)/g;
            if (/^\s+Description:\s+(.+)\z/sm) {
                $temp{description} = $1;
            }
            push @data, \%temp;
        }
    }

    use Data::Dumper;
    print Dumper \@data;

Using the data you posted, it dumps:

    $VAR1 = [
              {
                'Event ID' => '7320',
                'Log Name' => 'Microsoft-Windows-GroupPolicy/Operational',
                'Source' => 'Microsoft-Windows-GroupPolicy',
                'description' => 'Error: Computer determined to be not in a site. Error code 0x77F.',
                'Date' => '2014-06-26T13:58:04.290'
              },
              {
                'Event ID' => '7320',
                'Log Name' => 'Microsoft-Windows-GroupPolicy/Operational',
                'Source' => 'Microsoft-Windows-GroupPolicy',
                'description' => 'Error: Computer determined to be not in a site. Error code 0x77F.',
                'Date' => '2014-06-26T12:32:30.009'
              }
            ];

Chris
Re^2: question on data structure by natxo (Scribe) on Jul 07, 2014 at 20:30 UTC
Hi, I just read your answer and I really like it; it is much more idiomatic than my clumsy baby Perl. It does exactly what I need. Thanks!
Re^3: question on data structure by Cristoforo (Curate) on Jul 07, 2014 at 22:46 UTC
Glad it was useful for you!
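Building on the array of hashes above, the per-log error count from the original question could be sketched as follows. Several things here are assumptions and not part of the posted answer: `Level` is added to the key list, "log type" is taken to mean the `Log Name` field, the sample is held in a string (`$text`) and split on blank lines instead of being read from `DATA`, and `%errors_per_log` is a made-up name.

```perl
use strict;
use warnings;

# Shortened copy of the sample dump, kept in a string for this sketch.
my $text = <<'END';
Event[0]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Source: Microsoft-Windows-GroupPolicy
  Date: 2014-06-26T13:58:04.290
  Event ID: 7320
  Level: Error
  Description:
Error: Computer determined to be not in a site. Error code 0x77F.

Event[1]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Source: Microsoft-Windows-GroupPolicy
  Date: 2014-06-26T12:32:30.009
  Event ID: 7320
  Level: Error
  Description:
Error: Computer determined to be not in a site. Error code 0x77F.
END

# 'Level' is an addition to the key list used in the reply above.
my $keys = 'Log Name|Source|Date|Event ID|Level';

my @data;
for my $record ( split /\n{2,}/, $text ) {
    # list-context /g match yields (key, value) pairs, one per matching line
    my %event = $record =~ /^\s+($keys): (.+)$/mg;
    push @data, \%event if %event;
}

# Count events whose Level is 'Error', grouped by Log Name.
my %errors_per_log;
for my $event (@data) {
    next unless ( $event->{Level} // '' ) eq 'Error';
    $errors_per_log{ $event->{'Log Name'} }++;
}

print "$_: $errors_per_log{$_}\n" for sort keys %errors_per_log;
```

Keeping the full records in `@data` and deriving the counts afterwards means other reports (per source, per event ID) can reuse the same parsed data.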
Re: question on data structure by AppleFritter (Vicar) on Jun 27, 2014 at 16:32 UTC
If I understand correctly, you're wondering what the best data structure for storing the parsed data would be, right? I'd say that depends on what you intend to do with it later on, but as a general rule of thumb, it's usually best to use what most closely resembles the natural structure of the raw data you're reading.

Going by your sample data, this would seem to be an array of hashes, indexed by the event number (0, 1, ...) and then the various fields appearing in these logs. Which is pretty much what you're doing, though you're using a hash with integer keys -- an array in all but name, really.

"I want to analyze this and report how many instances of errors there are per log type."

That said, I'm not sure what you mean here. What is a "log type"? Assuming you have a variable encoding this, if you merely want aggregates, you could do this in your loop:

    ...
    # calculate $logtype
    $hash_ref->{$logtype}++;

In other words, it all depends on what you want to do.

"And bonus question: right now I discard the Description value because it appears on a different line after the ':'. Is there a way of getting this value as well? This is not very important because we have the event id and the logname, so we can deduce the value, but it would be nice to have. Thanks!"

You should be able to just read from `DATA` again, like so:

    if ($_ =~ /\s+Description.*$/) {
        chomp($description = <DATA>);
    }

In the event that the description can span several lines, add another loop there that keeps on reading from `DATA` until it encounters an empty line:

    if ($_ =~ /\s+Description.*$/) {
        while (chomp(my $description_line = <DATA>)) {
            last if $description_line eq "";
            $description .= $description_line;
        }
    }

Note that none of this is tested in any way.
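The multi-line description idea above can also be sketched as a single line-by-line pass that keeps a small state flag, which avoids reading from the handle inside the loop. This is only an illustration under stated assumptions: the sample text is a made-up variant of the dump in which the first description spans two lines, the data is read from an in-memory filehandle instead of `DATA`, and `$current` and `$in_description` are hypothetical names.

```perl
use strict;
use warnings;

# Made-up variant of the sample: the first description spans two lines.
my $text = <<'END';
Event[0]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Event ID: 7320
  Description:
Error: Computer determined to be not in a site.
Error code 0x77F.

Event[1]:
  Log Name: Microsoft-Windows-GroupPolicy/Operational
  Event ID: 7320
  Description:
Error: Computer determined to be not in a site. Error code 0x77F.
END

open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my ( @events, $current, $in_description );
while ( my $line = <$fh> ) {
    chomp $line;

    # An "Event[N]:" header starts a new record.
    if ( $line =~ /^Event\[\d+\]:/ ) {
        $current = { Description => '' };
        push @events, $current;
        $in_description = 0;
        next;
    }
    next unless $current;    # ignore anything before the first header

    if ($in_description) {
        if ( $line eq '' ) {
            $in_description = 0;    # a blank line ends the description
        }
        else {
            # join continuation lines with a single space
            $current->{Description} .= ' ' if length $current->{Description};
            $current->{Description} .= $line;
        }
    }
    elsif ( $line =~ /^\s+Description:\s*(.*)$/ ) {
        $current->{Description} = $1;
        $in_description = 1;
    }
    elsif ( $line =~ /^\s+([^:]+): (.*)$/ ) {
        $current->{$1} = $2;        # ordinary "Key: value" field
    }
}
close $fh;

print "$_->{'Event ID'}: $_->{Description}\n" for @events;
```

One limitation of this sketch: a description line that itself looks like an `Event[N]:` header would be mistaken for the start of a new record.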
Re: question on data structure by neilwatson (Priest) on Jun 27, 2014 at 14:34 UTC
Use the date stamp to index the events. Neil Watson watson-wilson.ca
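Indexing by date stamp might look like the sketch below. It assumes each timestamp is unique, which the sample data does not guarantee, so duplicates are reported and skipped instead of silently overwritten. The `@events` list and the `%by_date` and `@chronological` names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical, already-parsed events (values taken from the sample dump).
my @events = (
    { Date => '2014-06-26T13:58:04.290', 'Event ID' => 7320, Level => 'Error' },
    { Date => '2014-06-26T12:32:30.009', 'Event ID' => 7320, Level => 'Error' },
);

my %by_date;
for my $event (@events) {
    my $date = $event->{Date};
    if ( exists $by_date{$date} ) {
        # Assumption: timestamps are unique; keep the first event otherwise.
        warn "Duplicate timestamp $date; keeping the first event\n";
        next;
    }
    $by_date{$date} = $event;
}

# Timestamps in this fixed-width format sort chronologically as plain strings.
my @chronological = map { $by_date{$_} } sort keys %by_date;
print "$_->{Date} $_->{'Event ID'}\n" for @chronological;
```

If two events can share a timestamp, storing an array of events per date would avoid losing any of them.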