dug has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, Maybe it's just Saturday and I've had more beer than coffee, but I'm having a difficult time coming up with a solution for the following problem that is not a kludge.

Problem:

I have three pieces of related data per line in a syslog file, "source_code", "action", and "sub_action". Every sub_action belongs to an action, and every action belongs to a source_code. An action or sub_action appearing on a line with one source_code doesn't mean it won't appear on a different line with a different source_code. I need to periodically roll through the logfile and gather the following aggregate statistics: a count for each source_code, a count for each action within its source_code, and a count for each sub_action within its action and source_code.

A Solution:

    #!/usr/bin/perl -w
    use strict;
    $|++;

    my %source;
    my %action;
    my %sub_action;

    ##
    # process file
    while (<DATA>) {
        my ($source, $action, $sub_action) = split;
        my $source_action = $source . "||" . $action;

        # sub_action isn't required to appear
        # (declare first: "my ... if ..." has undefined behavior in Perl)
        my $source_sub_action;
        $source_sub_action = $source_action . "||" . $sub_action if $sub_action;

        $source{$source}++;
        $action{$source_action}++;
        # sub_action isn't required to appear
        $sub_action{$source_sub_action}++ if $source_sub_action;
    }

    ##
    # print statistics
    while (my ($source_code, $source_code_count) = each %source) {
        print "source code: $source_code count: $source_code_count\n";

        # print actions and counts for this source code
        # (anchored and quoted so only keys starting with this source match)
        foreach my $action (keys %action) {
            print "action: $action count: $action{$action}\n"
                if $action =~ /^\Q$source_code\E\|\|/;
        }

        # print sub_actions and counts for this source code
        foreach my $sub_action (keys %sub_action) {
            print "sub action: $sub_action count: $sub_action{$sub_action}\n"
                if $sub_action =~ /^\Q$source_code\E\|\|/;
        }
    }

    __DATA__
    source1 QUEUED
    source1 QUEUED
    source1 CLICK linkid1
    source1 CLICK linkid1
    source1 CLICK linkid2
    source2 QUEUED
    source2 CLICK linkid1
    source2 CLICK linkid1
    source2 CLICK linkid2

This solution produces the proper results, printing the following:
source code: source1 count: 5
action: source1||CLICK count: 3
action: source1||QUEUED count: 2
sub action: source1||CLICK||linkid1 count: 2
sub action: source1||CLICK||linkid2 count: 1
source code: source2 count: 4
action: source2||CLICK count: 3
action: source2||QUEUED count: 1
sub action: source2||CLICK||linkid1 count: 2
sub action: source2||CLICK||linkid2 count: 1
Like any other student of programming, proper results aren't enough for me. Style, efficiency, beer and fast cars are also important. I really don't like the attack of:

    # build datastructures
    while (logfile) {
        build hash1;
        build hash2;
        build hash3;
    }

    # process the datastructures
    foreach key value (hash1) {
        foreach over keys of hash2;
        foreach over keys of hash3;
    }

I guess that's why I'm here, at Seekers of Perl Wisdom.

Looking for another way,
dug

Replies are listed 'Best First'.
Re: Choosing the right datastructure
by Ovid (Cardinal) on Apr 07, 2002 at 01:39 UTC

    The following appears to do the trick. I've just dumped the results with Data::Dumper, so it's up to you to figure out how you want to format them.

        #!/usr/bin/perl -w
        use strict;
        $|++;

        my %source;

        # process file
        while (<DATA>) {
            my ($source, $action, $sub_action) = split;
            my $source_action = $source . "||" . $action;

            # sub_action isn't required to appear
            $source{ $source }{ count }++;
            $source{ $source }{ $action }{ count }++;
            $source{ $source }{ $action }{ $sub_action }{ count }++ if $sub_action;
        }

        use Data::Dumper;
        print Dumper \%source;

        __DATA__
        source1 QUEUED
        source1 QUEUED
        source1 CLICK linkid1
        source1 CLICK linkid1
        source1 CLICK linkid2
        source2 QUEUED
        source2 CLICK linkid1
        source2 CLICK linkid1
        source2 CLICK linkid2

    Cheers,
    Ovid

    Update: If Data::Dumper output bugs you, here's a quick hack at printing the results.

        foreach my $source ( sort keys %source ) {
            print "$source\n\tcount: $source{$source}{count}\n";
            foreach my $action ( sort keys %{$source{$source}} ) {
                next if $action eq 'count';
                print "\t$action\n\t\tcount: $source{$source}{$action}{count}\n";
                foreach my $sub_action (sort keys %{$source{$source}{$action}} ) {
                    next if $sub_action eq 'count';
                    print "\t\t$sub_action\n\t\t\tcount: $source{$source}{$action}{$sub_action}{count}\n";
                }
            }
        }
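
The printing loops above have to skip the 'count' key at every level, and they would miscount if an action or sub_action in the log were literally named "count". A minimal sketch of one possible variation, not something proposed in the thread: keep each node's tally under 'count' and its children under a separate key. The names `kids`, `@lines`, `%tree`, and `report` are made up for illustration, and the sample lines are invented to show the collision case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample lines; the last one has a sub_action named "count".
my @lines = (
    'source1 QUEUED',
    'source1 CLICK linkid1',
    'source1 CLICK linkid1',
    'source1 CLICK linkid2',
    'source2 CLICK count',
);

# Every node has the same shape: a tally plus a hash of child nodes.
my %tree = ( count => 0, kids => {} );

for my $line (@lines) {
    my $node = \%tree;
    $node->{count}++;                       # root tally = total lines seen
    for my $field ( split ' ', $line ) {
        my $kids = $node->{kids};
        $kids->{$field} ||= { count => 0, kids => {} };
        $node = $kids->{$field};            # descend one level
        $node->{count}++;
    }
}

# Walk the tree recursively; works for any depth, not just three levels.
sub report {
    my ($node, $depth) = @_;
    my @out;
    for my $name ( sort keys %{ $node->{kids} } ) {
        my $kid = $node->{kids}{$name};
        push @out, ( "\t" x $depth ) . "$name: $kid->{count}";
        push @out, report( $kid, $depth + 1 );
    }
    return @out;
}

print "$_\n" for report( \%tree, 0 );
```

The trade-off is one extra level of indirection per lookup in exchange for not reserving any key name in the data itself.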


      Thanks.
      My initial confusion came from thinking that:
          my %source;
          my %action;
          my ($source, $action) = qw(test_source test_action);
          $source{$source} = { $action{$action}++ };
          $source{$source} = { $action{$action}++ };
      would give me a hash called %source with each key having a value of the hash "%action", with its keys and values.
      It output:
      $VAR1 = 'test_source';
      $VAR2 = {
                '1' => undef
              };
      $VAR1 = 'test_action';
      $VAR2 = '2';
      

      where I was looking for:
      $VAR1 = 'test_source';
      $VAR2 = {
                'test_action' => 2
              };
      
      It makes Perl-fect sense, though: the braces build an anonymous hash whose only key is the value returned by $action{$action}++, while %action itself is updated as a side effect.
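
A minimal sketch of two ways to get the structure I was looking for, assuming the goal is a per-source hash of action counts. The first relies on autovivification, as in Ovid's reply; the second stores a reference to a separately built %action. The name `%by_ref` is made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

my ($source, $action) = qw(test_source test_action);

# Option 1: increment through both levels of keys. Perl creates the
# inner hash on first use (autovivification).
my %source;
$source{$source}{$action}++;
$source{$source}{$action}++;

# Option 2: build %action on its own, then store a reference to it.
# Later changes to %action are visible through the reference.
my %action;
$action{$action}++ for 1 .. 2;
my %by_ref = ( $source => \%action );

$Data::Dumper::Sortkeys = 1;
print Dumper \%source;
print Dumper \%by_ref;
```

Both dumps hold test_action => 2 under test_source. Option 1 needs no second hash; option 2 fits when %action must also exist on its own.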

      Thanks for helping me grok this.
        dug