Hello all. Maybe it's just that it's Saturday and I've had more beer than coffee, but I'm having a difficult time coming up with a solution to the following problem that isn't a kludge.

Problem:

I have three pieces of related data per line in a syslog file: "source_code", "action", and "sub_action". Every sub_action belongs to an action, and every action belongs to a source_code. The same action or sub_action can appear on different lines under different source_codes. I need to periodically roll through the logfile and gather aggregate statistics: a count of lines for each source_code and, within each source_code, a count for each action and each sub_action.
A Solution:
#!/usr/bin/perl -w
use strict;
$|++;

my %source;
my %action;
my %sub_action;

##
# process file
while (<DATA>) {
    my ($source, $action, $sub_action) = split;
    next unless defined $action;    # skip blank or short lines

    my $source_action = $source . "||" . $action;

    $source{$source}++;
    $action{$source_action}++;

    # sub_action isn't required to appear
    # (avoids "my $x = ... if ...", whose behavior is undefined)
    if (defined $sub_action) {
        $sub_action{ $source_action . "||" . $sub_action }++;
    }
}

##
# print statistics (keys sorted so the output order is stable)
foreach my $source_code (sort keys %source) {
    print "source code: $source_code count: $source{$source_code}\n";

    # print actions and counts for this source code
    # (anchored and quoted so "source1" can't match "source10" or "xsource1")
    foreach my $action (sort keys %action) {
        print "action: $action count: $action{$action}\n"
            if $action =~ /^\Q$source_code\E\|\|/;
    }

    # print sub_actions and counts for this source code
    foreach my $sub_action (sort keys %sub_action) {
        print "sub action: $sub_action count: $sub_action{$sub_action}\n"
            if $sub_action =~ /^\Q$source_code\E\|\|/;
    }
}

__DATA__
source1 QUEUED
source1 QUEUED
source1 CLICK linkid1
source1 CLICK linkid1
source1 CLICK linkid2
source2 QUEUED
source2 CLICK linkid1
source2 CLICK linkid1
source2 CLICK linkid2
This solution produces the proper results, printing the following:
source code: source1 count: 5
action: source1||CLICK count: 3
action: source1||QUEUED count: 2
sub action: source1||CLICK||linkid1 count: 2
sub action: source1||CLICK||linkid2 count: 1
source code: source2 count: 4
action: source2||CLICK count: 3
action: source2||QUEUED count: 1
sub action: source2||CLICK||linkid1 count: 2
sub action: source2||CLICK||linkid2 count: 1
Like any other student of programming, I'm not satisfied with proper results alone. Style, efficiency, beer and fast cars are also important. I really don't like this approach:
# build datastructures
while (logfile) {
    build hash1;
    build hash2;
    build hash3;
}

# process the datastructures
foreach key value (hash1) {
    foreach over keys of hash2;
    foreach over keys of hash3;
}
I guess that's why I'm here, at Seekers of Perl Wisdom.

Looking for another way,
dug
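
For what it's worth, one direction that might avoid the three flat hashes and the regex filtering is a single nested hash that mirrors the hierarchy: source_code, then action, then sub_action. The following is only a sketch under assumptions of my own: the helper names `tally` and `report`, the `count`/`actions`/`subs` key names, and the indented output layout are all hypothetical and not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: tally lines into one nested hash shaped like
#   $stats{$source}{count}
#   $stats{$source}{actions}{$action}{count}
#   $stats{$source}{actions}{$action}{subs}{$sub_action}
sub tally {
    my (@lines) = @_;
    my %stats;
    for my $line (@lines) {
        my ($source, $action, $sub_action) = split ' ', $line;
        next unless defined $action;    # skip blank or short lines

        $stats{$source}{count}++;
        $stats{$source}{actions}{$action}{count}++;

        # sub_action isn't required to appear
        $stats{$source}{actions}{$action}{subs}{$sub_action}++
            if defined $sub_action;
    }
    return \%stats;
}

# Hypothetical helper: walk the nested hash once, top down,
# and return the report lines (sorted for stable output).
sub report {
    my ($stats) = @_;
    my @out;
    for my $source (sort keys %$stats) {
        my $src = $stats->{$source};
        push @out, "source code: $source count: $src->{count}";
        for my $action (sort keys %{ $src->{actions} }) {
            my $act = $src->{actions}{$action};
            push @out, "  action: $action count: $act->{count}";
            for my $sub (sort keys %{ $act->{subs} || {} }) {
                push @out, "    sub action: $sub count: $act->{subs}{$sub}";
            }
        }
    }
    return @out;
}

# Same sample data as in the __DATA__ section above.
my @sample = (
    'source1 QUEUED',
    'source1 QUEUED',
    'source1 CLICK linkid1',
    'source1 CLICK linkid1',
    'source1 CLICK linkid2',
    'source2 QUEUED',
    'source2 CLICK linkid1',
    'source2 CLICK linkid1',
    'source2 CLICK linkid2',
);
print "$_\n" for report(tally(@sample));
```

With this shape, each level is reached by a direct lookup rather than by scanning every key of a flat hash, so the report only visits the actions and sub_actions that actually belong to the current source code. Whether that counts as less of a kludge is a matter of taste.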

In reply to Choosing the right datastructure by dug
