
Hello all, maybe it's just Saturday and I've had more beer than coffee, but I'm having a difficult time coming up with a solution to the following problem that isn't a kludge.

Problem:

I have three pieces of related data per line in a syslog file: "source_code", "action", and "sub_action". Every sub_action belongs to an action, and every action belongs to a source_code. An action or sub_action that appears on a line with one source_code may also appear on other lines with a different source_code. I need to periodically roll through the logfile and gather the following aggregate statistics:

Total appearances of each source_code
For each source_code, the total appearances of each action
For each action within each source_code, the total appearances of each of its sub_actions.
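The three levels described above map naturally onto a nested hash (a hash of hashes), one level per field. Here is a minimal sketch, assuming whitespace-separated fields as in the sample data later in this post and an optional sub_action. The `count`, `actions` and `subs` keys are hypothetical names chosen for illustration:

```perl
use strict;
use warnings;

# Sample lines in the same "source_code action [sub_action]" layout as the post.
my @lines = (
  'source1 QUEUED',
  'source1 QUEUED',
  'source1 CLICK linkid1',
  'source1 CLICK linkid1',
  'source1 CLICK linkid2',
  'source2 QUEUED',
  'source2 CLICK linkid1',
  'source2 CLICK linkid1',
  'source2 CLICK linkid2',
);

# One level of nesting per field (hypothetical key names):
#   $stats{SOURCE}{count}
#   $stats{SOURCE}{actions}{ACTION}{count}
#   $stats{SOURCE}{actions}{ACTION}{subs}{SUB_ACTION}
my %stats;
for my $line (@lines) {
  my ($source, $action, $sub_action) = split ' ', $line;
  next unless defined $action;    # skip blank or malformed lines
  $stats{$source}{count}++;
  $stats{$source}{actions}{$action}{count}++;
  # sub_action is optional; autovivification creates the inner hashes on demand
  $stats{$source}{actions}{$action}{subs}{$sub_action}++ if defined $sub_action;
}
```

Each counter now lives under the source (and action) it belongs to, so `$stats{source1}{actions}{CLICK}{subs}` holds only the CLICK sub_actions seen with source1, and no `||`-joined keys are needed.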

A Solution:


#!/usr/bin/perl -w
use strict;
$|++;

my %source;
my %action;
my %sub_action;

##
# process file
while (<DATA>) {
  my ($source, $action, $sub_action) = split;
  # skip blank lines; every real line has at least a source and an action
  next unless defined $action;
  my $source_action = $source . "||" . $action;
  $source{$source}++;
  $action{$source_action}++;
  # sub_action isn't required to appear. Use a plain if block here:
  # "my $x = ... if COND;" has undefined behavior in Perl (see perlsyn).
  if (defined $sub_action) {
    $sub_action{ $source_action . "||" . $sub_action }++;
  }
}

##
# print statistics, sorted so the output order is stable from run to run
foreach my $source_code (sort keys %source) {
  print "source code: $source_code count: $source{$source_code}\n";

  # print actions and counts for this source code; anchor and quote the
  # prefix so a source such as "e1" can't match "source1||CLICK"
  foreach my $action (sort keys %action) {
    print "action: $action count: $action{$action}\n"
      if $action =~ /^\Q$source_code\E\|\|/;
  }

  # print sub_actions and counts for this source code
  foreach my $sub_action (sort keys %sub_action) {
    print "sub action: $sub_action count: $sub_action{$sub_action}\n"
      if $sub_action =~ /^\Q$source_code\E\|\|/;
  }
}

__DATA__
source1 QUEUED
source1 QUEUED
source1 CLICK linkid1
source1 CLICK linkid1
source1 CLICK linkid2
source2 QUEUED
source2 CLICK linkid1
source2 CLICK linkid1
source2 CLICK linkid2

This solution produces the proper results, printing the following:

source code: source1 count: 5
action: source1||CLICK count: 3
action: source1||QUEUED count: 2
sub action: source1||CLICK||linkid1 count: 2
sub action: source1||CLICK||linkid2 count: 1
source code: source2 count: 4
action: source2||CLICK count: 3
action: source2||QUEUED count: 1
sub action: source2||CLICK||linkid1 count: 2
sub action: source2||CLICK||linkid2 count: 1

Like any other student of programming, I'm not satisfied with results that are merely correct. Style, efficiency, beer and fast cars are also important. I really don't like the approach of:

# build datastructures
while (logfile) {
  build hash1;
  build hash2;
  build hash3;
}

# process the datastructures
foreach key value (hash1) {
  foreach over keys of hash2; 
  foreach over keys of hash3; 
}
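One way around the outline above, where every source_code scans all the keys of the other two hashes with a regex, is to keep all counts in a single nested hash and let the report walk that structure level by level. This is a minimal sketch, assuming the counts have already been tallied into such a hash (the numbers are the ones from the sample output in this post). The `count`, `actions` and `subs` keys and the `report` subroutine are hypothetical names:

```perl
use strict;
use warnings;

# Counts already tallied into one nested hash (values match the sample output).
# Layout (hypothetical key names):
#   $stats{SOURCE}{count}, $stats{SOURCE}{actions}{ACTION}{count},
#   $stats{SOURCE}{actions}{ACTION}{subs}{SUB_ACTION}
my %stats = (
  source1 => {
    count   => 5,
    actions => {
      QUEUED => { count => 2 },
      CLICK  => { count => 3, subs => { linkid1 => 2, linkid2 => 1 } },
    },
  },
  source2 => {
    count   => 4,
    actions => {
      QUEUED => { count => 1 },
      CLICK  => { count => 3, subs => { linkid1 => 2, linkid2 => 1 } },
    },
  },
);

# Walk the structure top-down; each inner loop only visits the keys that
# belong to the current source/action, so no regex filtering is needed.
sub report {
  my ($stats) = @_;
  my @out;
  for my $source (sort keys %$stats) {
    my $src = $stats->{$source};
    push @out, "source code: $source count: $src->{count}";
    for my $action (sort keys %{ $src->{actions} }) {
      my $act = $src->{actions}{$action};
      push @out, "  action: $action count: $act->{count}";
      my $subs = $act->{subs} || {};    # actions like QUEUED have no sub_actions
      for my $sub_action (sort keys %$subs) {
        push @out, "    sub action: $sub_action count: $subs->{$sub_action}";
      }
    }
  }
  return @out;
}

print "$_\n" for report(\%stats);
```

Because the nesting already records which source and action each count belongs to, the report can drop the `source1||CLICK` prefixes and indent instead; keep the original labels if something downstream parses them.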

I guess that's why I'm here, at Seekers of Perl Wisdom.

Looking for another way,
dug

In reply to Choosing the right datastructure by dug
