This is what the source data looks like.
Element 1 - date
Element 2 - user
Element 3 - agency
Element 4 - url
Element 5 - garbage
Element 6 - type (pass, fail, new...)
Element 7 - module
Element 8 - more garbage


05/Jun/2003:00:01:23 user1 agency1 url1 garbage pass mod1 more_garbage
05/Jun/2003:00:03:17 user2 null url1 garbage fail mod1 more_garbage
05/Jun/2003:00:03:42 user1 agency1 url1 garbage pass mod1 more_garbage
05/Jun/2003:00:05:03 user6 agency2 url1 garbage pass mod1 more_garbage
05/Jun/2003:00:08:34 user3 agency2 url1 garbage pass mod1 more_garbage
05/Jun/2003:00:11:59 user4 agency2 url2 garbage fail mod1 more_garbage
05/Jun/2003:00:14:30 user5 agency2 url1 garbage new_a mod1 more_garbage
05/Jun/2003:00:15:02 user1 agency1 url1 garbage pass mod1 more_garbage
05/Jun/2003:00:16:56 user7 agency2 url2 garbage pass mod2 more_garbage
05/Jun/2003:00:17:31 user1 agency1 url1 garbage fail mod1 more_garbage
05/Jun/2003:00:17:31 user1 agency1 url1 garbage pass mod2 more_garbage
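
For reference, here is a minimal sketch of splitting one such line into the fields that matter. The sub name parse_line and the hash keys are my own labels, not anything defined by the log format, and I am assuming the elements never contain embedded whitespace.

```perl
use strict;
use warnings;

# Split one log line into its whitespace-separated elements and return the
# useful ones as a hash reference. The two "garbage" elements are discarded.
sub parse_line {
    my ($line) = @_;
    my ($date, $user, $agency, $url, undef, $type, $module) = split ' ', $line;
    return unless defined $module;    # blank or short line: nothing to return
    return {
        date   => $date,
        user   => $user,
        agency => $agency,
        url    => $url,
        type   => substr($type, 0, 4),    # "pass", "fail" or "new_"
        module => $module,
    };
}

my $rec = parse_line('05/Jun/2003:00:14:30 user5 agency2 url1 garbage new_a mod1 more_garbage');
print join(' ', map { "$_=$rec->{$_}" } qw(user agency url type module)), "\n";
```

Using split ' ' rather than split /\s+/ also skips any leading whitespace, so an indented line does not shift every field by one.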


There are six of these log files generated each day. What I need to get out of them is this:

Agency URL Pass Fail New Unique Module
agency1 url1 3 1 0 1 mod1
agency1 url1 1 0 0 1 mod2
agency2 url1 2 1 1 3 mod1
agency2 url2 1 0 0 1 mod2


and so on. As you can see from this example, a failure doesn't count towards the unique number: unique is only the number of distinct users who logged in successfully. The pass count represents the total number of logins, thus capturing recurring logins and so forth. I am sure you get what I am getting at: stats on how and what is being used.
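
The unique rule can be sketched with a hash used as a set. This assumes that a "new" login counts as successful (which is how the agency2 url1 row above reads) and that records have already been reduced to user, agency, url, type and module. The sub name unique_users and the sample records are illustrative only.

```perl
use strict;
use warnings;

# Count distinct users with at least one non-failed login per
# agency/url/module. Each record is [user, agency, url, type, module].
sub unique_users {
    my %seen;    # $seen{"agency url module"}{user} = 1
    for my $r (@_) {
        my ($user, $agency, $url, $type, $module) = @$r;
        next if $type eq 'fail';    # failures never count towards unique
        $seen{"$agency $url $module"}{$user} = 1;
    }
    # The unique figure is the number of distinct users stored under each key.
    return map { ($_ => scalar(keys %{ $seen{$_} })) } keys %seen;
}

my %unique = unique_users(
    [qw(user1 agency1 url1 pass mod1)],
    [qw(user1 agency1 url1 pass mod1)],
    [qw(user1 agency1 url1 fail mod1)],
    [qw(user6 agency2 url1 pass mod1)],
    [qw(user3 agency2 url1 pass mod1)],
    [qw(user5 agency2 url1 new_ mod1)],
    [qw(user4 agency2 url2 fail mod1)],
);
printf "%s unique=%d\n", $_, $unique{$_} for sort keys %unique;
```

Because a repeated user simply overwrites the same hash slot, recurring logins raise the pass count but not the unique count.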

Once I get these lines of stats, I can then throw the results into a database for more advanced querying and storage. That part I am good to go on. I am just having some trouble gathering the stats in the form that I want/need.

I have tried using nested hashes like
    my (%agency,(%url,(%module,(%type,$type_count))));
    my $pass_count=0;
    my $fail_count=0;
    my $new_count=0;

    while ($line=<FILE>) {
        ($e1,$e2,$e3,$e4,$e5,$e6,$e7,$e8)=(split/\s+/,$line);
        $pf=substr($pf,0,4);
        if ($pf eq "pass") { $agency{$e4,{$e7,{$e6,$pass_count++}}};
        if ($pf eq "fail") { $agency{$e4,{$e7,{$e6,$fail_count++}}};
        if ($pf eq "new_") { $agency{$e4,{$e7,{$e6,$new_count++}}};
This didn't seem to work. Part of this may be due to the fact that I can't figure out how to print the results, the other is that I am not convinced that the hashes are getting properly populated.
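
For comparison, here is a minimal sketch of how a nested hash is usually populated and printed. As far as I can tell, $agency{$e4,{...}} does not nest anything: a comma inside a hash subscript joins the keys into a single string, and the statement never assigns a value. Chaining the subscripts, as in $stats{$agency}{$url}{$module}{$type}++, lets Perl create the inner hashes on demand. The names tally, report and %stats are my own, and the sample lines are taken from the data above with the agency spelling normalised.

```perl
use strict;
use warnings;

# Count entries per agency, url, module and type.
# Returns a reference to $stats{agency}{url}{module}{type} = count.
sub tally {
    my %stats;
    for my $line (@_) {
        my (undef, undef, $agency, $url, undef, $type, $module) = split ' ', $line;
        next unless defined $module;              # skip blank or short lines
        $type = substr($type, 0, 4);              # "new_a" becomes "new_"
        $stats{$agency}{$url}{$module}{$type}++;  # inner hashes autovivify
    }
    return \%stats;
}

# Walk the three outer levels and return one report line per innermost hash.
sub report {
    my ($stats) = @_;
    my @rows;
    for my $agency (sort keys %$stats) {
        for my $url (sort keys %{ $stats->{$agency} }) {
            for my $module (sort keys %{ $stats->{$agency}{$url} }) {
                my $c      = $stats->{$agency}{$url}{$module};
                my @counts = map { $c->{$_} || 0 } qw(pass fail new_);
                push @rows, join ' ', $agency, $url, @counts, $module;
            }
        }
    }
    return @rows;
}

my @lines = (
    '05/Jun/2003:00:01:23 user1 agency1 url1 garbage pass mod1 more_garbage',
    '05/Jun/2003:00:03:42 user1 agency1 url1 garbage pass mod1 more_garbage',
    '05/Jun/2003:00:14:30 user5 agency2 url1 garbage new_a mod1 more_garbage',
    '05/Jun/2003:00:16:56 user7 agency2 url2 garbage pass mod2 more_garbage',
    '05/Jun/2003:00:17:31 user1 agency1 url1 garbage fail mod1 more_garbage',
    '05/Jun/2003:00:17:31 user1 agency1 url1 garbage pass mod2 more_garbage',
);
print "$_\n" for report(tally(@lines));
```

Each printed row is agency, url, pass, fail, new, module. The unique column is left out here because it needs the user names as well as the counts.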

I then tried an object oriented way
    package Agency;

    sub new {
        my $class={};
        $class->{agency}=undef;
        $class->{url}=undef;
        $class->{pass}=undef;
        $class->{fail}=undef;
        $class->{new}=undef;
        $class->{module}=undef;
        bless $class;
        return $class;
    }

    sub init {
        my $class=shift;
        $class->{agency}=shift;
        $class->{url}=shift;
        $class->{pass}=shift;
        $class->{fail}=shift;
        $class->{new}=shift;
        $class->{module}=shift;
    }

    sub display {
        my $class=shift;
        print "Agency: $class->{agency} URL: $class->{url} Pass: $class->{pass} Fail: $class{fail} New: $class{new} Module: $class->{module}\n"
    }

    package main;

    my $new_agency=Agency->new();

    foreach $file (glob('file_name*.txt')) {
        open (FILE,$file)||die ("unable to open $file\n");
        print "working on $file\n";
        while ($line=<FILE>) {
            ($ts,$w,$agency,$url,$x,$pf,$module,$z)=(split/\s+/,$line);
            $pf=substr($pf,0,4);
            if ($pf eq "pass") {
                $new_agency->init($agency,$url,$pass_count++,$fail_count,$new_count,$module);
            } elsif ($pf eq "fail") {
                $new_agency->init($agency,$url,$pass_count,$fail_count++,$new_count,$module);
            } elsif ($pf eq "new_") {
                $new_agency->init($agency,$url,$pass_count,$fail_count,$new_count++,$module);
            }
        } #while
    } # foreach

    $new_agency->display();

This didn't work either. It just gave me the agency and module information for the last line in the last file read, with one counter being properly incremented. I tried doing the display inside the loop, but this didn't help. I am guessing the problem lies in the fact that I am calling "init" on the same object for every line.
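
One possible shape for the object oriented route, offered as a sketch rather than a definitive design: keep one object per agency/url/module combination in a lookup hash, and let each object own its counters instead of overwriting a single object on every line. The class name AgencyStats, the methods record and unique, and the helper object_for are all hypothetical names of mine. The //= operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

package AgencyStats;    # hypothetical class: one object per agency/url/module

sub new {
    my ($class, %args) = @_;
    my $self = { %args, pass => 0, fail => 0, new => 0, users => {} };
    return bless $self, $class;
}

# Record one log entry: bump the matching counter and remember the user
# unless the login failed.
sub record {
    my ($self, $type, $user) = @_;
    my %field = (pass => 'pass', fail => 'fail', new_ => 'new');
    my $f = $field{$type} or return;    # ignore types we do not track
    $self->{$f}++;
    $self->{users}{$user} = 1 unless $f eq 'fail';
    return $self;
}

# Number of distinct users with a non-failed login.
sub unique { scalar keys %{ $_[0]{users} } }

sub display {
    my ($self) = @_;
    return sprintf "%s %s %d %d %d %d %s",
        @{$self}{qw(agency url pass fail new)}, $self->unique, $self->{module};
}

package main;

my %registry;    # "agency url module" => AgencyStats object

# Fetch the object for this combination, creating it on first use.
sub object_for {
    my ($agency, $url, $module) = @_;
    return $registry{"$agency $url $module"}
        //= AgencyStats->new(agency => $agency, url => $url, module => $module);
}

object_for(qw(agency1 url1 mod1))->record('pass', 'user1') for 1 .. 3;
object_for(qw(agency1 url1 mod1))->record('fail', 'user1');
object_for(qw(agency1 url1 mod2))->record('pass', 'user1');

print $registry{$_}->display, "\n" for sort keys %registry;
```

In the file loop, each parsed line would turn into a single object_for($agency, $url, $module)->record($pf, $user) call, and the display pass runs once after all files have been read.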

Despite the fact that everything else that I have tried has failed, I think that one of these two options seems the most legitimate. So I am now asking for both advice and opinions on what the best method for obtaining this result set would be.

In reply to nested hashes or object oriented by ctaustin
