jb60606 has asked for the wisdom of the Perl Monks concerning the following question:

I'm relatively new to Perl and am having problems creating a hash of arrays. The file, about 500MB in size, is formatted like so:
2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'
2011-04-09 21:32:15.525 [3509,3523]: Gap detect on 233.156.208.41:20041 from 2746318 to 2746373, moving to next message
2011-04-09 21:32:15.585 [3509,3523]: Gap detect on 233.156.208.41:20041 from 2746420 to 2746475, moving to next message
2011-04-09 21:32:15.639 [3509,3522]: Received data on Connection[ICE-Options]. Pending=214044.
2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'
Using the timestamp in the second column as the key, I need to search for any line containing a "Pending=" queue greater than a user-specified amount and add it to an array. Also, if a "Gap" was detected in the same second (I'll be stripping the milliseconds), I want to add it to the array for that same timestamp. In the text above, I'd like each key/array pair to read like this:
my %ICE = (
    '21:31:10' => [ 'Pending=3201', ],
    '21:32:14' => [ 'Pending=1000', ],
    '21:32:15' => [ 'Gap detect', '233.156.208.41:20041', 'Pending=214044', ],
    '21:32:24' => [ 'Gap detect', '233.156.208.41:20041', 'Pending=104000', ],
    '21:32:58' => [ 'Gap detect', '233.156.208.41:20041', 'Pending=96000', ],
    '21:33:12' => [ 'Pending=528', ],
);

And print them comma-separated:

21:31:10, Gap detect 233.156.208.41:20041, Pending=
21:31:12, Gap detect 233.156.208.41:20041, Pending=3400
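A minimal sketch of the parsing step described above, assuming every relevant line starts with a "YYYY-MM-DD HH:MM:SS.mmm" timestamp, that "Pending=" is followed by digits, and that gap lines contain "Gap detect on ADDR:PORT". The sub name build_ice and its $threshold parameter are hypothetical, and the date and start/end window filters from the question's code are left out. It pushes onto each key's array, so several events in the same second accumulate instead of overwriting one another, and it drops seconds that only saw gaps.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash of arrays keyed by HH:MM:SS
# (milliseconds stripped) from an already-open filehandle.
sub build_ice {
    my ($fh, $threshold) = @_;
    my %ice;
    while (my $line = <$fh>) {
        # Skip lines that don't start with "YYYY-MM-DD HH:MM:SS.mmm".
        my ($time) = $line =~ /^\d{4}-\d\d-\d\d (\d\d:\d\d:\d\d)\.\d+/
            or next;
        if ($line =~ /Pending=(\d+)/) {
            my $queue = $1;
            # Keep only queues at or above the user-specified amount.
            push @{ $ice{$time} }, "Pending=$queue" if $queue >= $threshold;
        }
        elsif ($line =~ /Gap detect on (\S+)/) {
            my $addr = $1;
            # Record each gap address at most once per second.
            push @{ $ice{$time} }, 'Gap detect', $addr
                unless grep { $_ eq $addr } @{ $ice{$time} || [] };
        }
    }
    # Drop seconds that saw gaps but no qualifying Pending queue.
    for my $t (keys %ice) {
        delete $ice{$t} unless grep { /^Pending=/ } @{ $ice{$t} };
    }
    return \%ice;
}

# Usage with lines from the question, read through an in-memory filehandle.
my $log = <<'LOG';
2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'
2011-04-09 21:32:15.525 [3509,3523]: Gap detect on 233.156.208.41:20041 from 2746318 to 2746373, moving to next message
2011-04-09 21:32:15.585 [3509,3523]: Gap detect on 233.156.208.41:20041 from 2746420 to 2746475, moving to next message
2011-04-09 21:32:15.639 [3509,3522]: Received data on Connection[ICE-Options]. Pending=214044.
LOG
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
my $ice = build_ice($fh, 1000);
close $fh;
print "$_: @{ $ice->{$_} }\n" for sort keys %$ice;
```

With a threshold of 1000 this prints "21:32:15: Gap detect 233.156.208.41:20041 Pending=214044". Because the file is read one line at a time, memory grows only with the matching records, not with the full 500MB log.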
Thus far, I have the following code, but it doesn't seem to work. Any recommendations?
UPDATE:
Apologies for the "half-arsed" description and for not showing the results I'm getting. I wrote this in a hurry, thinking I could update it when I got home. Being new here, I didn't expect so many quick responses. I'll update the post shortly. Thanks for your help.
# Get today's date which will be used as the default.
my($day, $month, $year) = (localtime)[3,4,5];
$month = sprintf '%02d', $month+1;
$day = sprintf '%02d', $day;
$year = $year+1900;
$ymd = "$year-$month-$day";

my $total; # variable to be used for queue total for the given timeframe
my $count = 0;

## Get command line arguments, convert time to seconds
##my $logFile = "pmmd-ltc-fsrlabs21-mdrc-server-cta.log";
my $logFile = "pmmd-ltc-fsrlabs41-mdrc-server-ice.log";
my $sTime= $ARGV[0];
my @sTime=split(/:/,$sTime); # split start time
my $sSecs=$sTime[0] * 3600 + $sTime[1] * 60 + $sTime[2]; # convert start-time to seconds
my $eTime = $ARGV[1];
my @eTime=split(/:/,$eTime); # split end time
my $eSecs=$eTime[0] * 3600 + $eTime[1] * 60 + $eTime[2]; # convert stop-time to seconds
my $tHold = $ARGV[2];

open(LOG, "$logFile") or die "Couldn't open file for processing: $!";
while ( $line = <LOG> ) {
    unless (($data[10] =~ m/Pending=/) || ($data[6] =~ m/Gap/)) { next; } # skip elements we don't want
    if ($line =~ m/Pending=/) {
        my @data=split(/ /,$line); # split the line up
        $data[10] =~ s/[A-Za-z=.]//g; # delete "Pending", "=" and "."
        $data[1] =~ s/\..*//g; # delete the millisecond element of the (line)lTime var
        my @lTime=split(/:/,$data[1]); # split the line time
        my $lSecs=$lTime[0] * 3600 + $lTime[1] * 60 + $lTime[2]; # convert line-time to seconds
        if (($data[0] eq $ymd) && ($data[10] >= $tHold) && ($lSecs >= $sSecs) && ($lSecs <= $eSecs)) {
            $line = "$data[1],$data[10]";
        }
    }
    elsif ($line =~ m/Gap detect/) {
        my @data=split(/ /,$line); # split the line up
        $data[1] =~ s/\..*//g; # delete the millisecond element of the (line)lTime var
        my @lTime=split(/:/,$data[1]); # split the line time
        my $lSecs=$lTime[0] * 3600 + $lTime[1] * 60 + $lTime[2]; # convert line-time to seconds
        if (($data[0] eq $ymd) && ($lSecs >= $sSecs) && ($lSecs <= $eSecs)) {
            $line = "$data[1],$data[9]";
        }
    }
    else { next; }
    ($time, $rest) = split ',', $line, 2;
    #$time =~ s/\..*//g; # delete the millisecond element
    @fields = split ',', $rest;
    $HoA{$time} = [ @fields ];
}
for $time (sort (keys (%HoA)) ) {
    print "$time: @{ $HoA{$time} }\n";
}
close LOG;
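The window check in the code above repeats the HH:MM:SS-to-seconds arithmetic for the start time, the end time, and each log line. A small sketch factoring that into one sub; the names to_secs and in_window are hypothetical, and the inputs are assumed to be well-formed "HH:MM:SS" strings (a trailing ".mmm" is ignored).

```perl
use strict;
use warnings;

# Hypothetical helper: convert "HH:MM:SS" (optionally "HH:MM:SS.mmm")
# to seconds since midnight.
sub to_secs {
    my ($hms) = @_;
    my ($h, $m, $s) = $hms =~ /^(\d+):(\d+):(\d+)/
        or die "Bad time '$hms'\n";
    return $h * 3600 + $m * 60 + $s;
}

# True when $time lies inside the inclusive window [$start, $end].
sub in_window {
    my ($time, $start, $end) = @_;
    my $t = to_secs($time);
    return $t >= to_secs($start) && $t <= to_secs($end);
}

# Example: a log time checked against a window like the one taken from @ARGV.
my $where = in_window('21:32:15.639', '21:30:00', '21:35:00') ? 'inside' : 'outside';
print "$where\n";
```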

Re: Help with hash of arrays from file
by ww (Archbishop) on Apr 10, 2011 at 18:01 UTC
    "Any recommendations?"

    Recommend that -- at a minimum -- you tell us *HOW* "it doesn't seem to work." That will make it easier to offer recommendations. As it is, some of us (assuredly including me) would have to download your data snippet and your code and execute them to ascertain what you probably already know.

    You probably don't want to make the price of offering assistance that high; unnecessarily high!

    And BTW, if the phrase "doesn't work" means "throws errors or warnings," you need to mention what they are... and if the phrase means "dies silently," then you need strict and warnings, and either the debugger or some debugging print statements in your code.
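A minimal sketch of the "debugging print statements" idea: with strict and warnings on, print every field that split produces so you can see which index really holds "Pending=". The sample line is the one posted in the question; real log lines may contain more fields, which is exactly what a print like this would reveal before relying on hard-coded indexes such as $data[10].

```perl
use strict;
use warnings;

# Sample line as posted in the question.
my $line = '2011-04-09 21:32:15.639 [3509,3522]: Received data on Connection[ICE-Options]. Pending=214044.';

# Debugging print: show the index of every field produced by split.
# warn writes to STDERR, so it won't mix with the program's real output.
my @data = split / /, $line;
warn "data[$_] = '$data[$_]'\n" for 0 .. $#data;
```

For this particular sample line the "Pending=214044." field lands at index 7.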

    HTH
Re: Help with hash of arrays from file
by GrandFather (Saint) on Apr 10, 2011 at 23:29 UTC

    The information you provide about your input data and the sample data structure you show are inconsistent so the following sample code may not print what you want, but the parsing should at least get you headed in a useful direction:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.010;

    my %recs;

    while (<DATA>) {
        my ($time, $mode, $value) = /(\d\d:\d\d:\d\d)\.\d{3} .*? (?: (Pending|Gap) (?:\sdetect\son\s | =)? ([\d.:]+) ) /x;

        next if !defined $time || !defined $mode;

        if ($mode eq 'Pending') {
            $recs{$time}{Pending} = $value;
            next;
        }

        ++$recs{$time}{Gaps}{$value};
    }

    for my $rec (map {$recs{$_}} sort keys %recs) {
        my $pend = "Pending=$rec->{Pending}\n";

        print join $pend, map {"Gap detect $_ "} sort keys %{$rec->{Gaps}}
            if exists $rec->{Gaps};
        print $pend;
    }

    __DATA__
    2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'
    2011-04-09 21:32:15.525 3509,3523: Gap detect on 233.156.208.41:20041 from 2746318 to 2746373, moving to next message
    2011-04-09 21:32:15.585 3509,3523: Gap detect on 233.156.208.41:20041 from 2746420 to 2746475, moving to next message
    2011-04-09 21:32:15.639 3509,3522: Received data on ConnectionICE-Options. Pending=214044.
    2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'

    Prints:

    Gap detect 233.156.208.41:20041 Pending=214044.
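If the comma-separated layout from the question is wanted instead, one option is to join each key and its array elements with ", ". This is a sketch under assumptions: the exact output format is guessed from the question, and the %ICE values below are illustrative entries taken from the question's desired structure, not parsed from a log.

```perl
use strict;
use warnings;

# Illustrative data in the hash-of-arrays shape the question describes.
my %ICE = (
    '21:32:15' => [ 'Gap detect', '233.156.208.41:20041', 'Pending=214044' ],
    '21:33:12' => [ 'Pending=528' ],
);

# One line per second, in time order: "time, field, field, ..."
my @lines = map { join(', ', $_, @{ $ICE{$_} }) } sort keys %ICE;
print "$_\n" for @lines;
```

Every array element gets its own comma here; merging "Gap detect" with its address into one field, as in the question's sample output, would take one extra join.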
    True laziness is hard work
Re: Help with hash of arrays from file
by luis.roca (Deacon) on Apr 10, 2011 at 23:21 UTC

    Hello,
    First, I would follow ww's suggestion and add use strict; and use warnings; *(and even use diagnostics; for a more verbose explanation of any errors, and Data::Dumper for inspecting the values of your variables while debugging.)

    I would also suggest taking a look at the Perl Data Structures Cookbook: perldsc.
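Following perldsc, the usual way to grow a hash of arrays is to push onto the array reference stored under each key; Perl autovivifies the array the first time a key is seen. An assignment like $HoA{$time} = [ @fields ]; instead replaces whatever was already stored for that key. A small sketch with made-up (time, event) pairs, dumped with Data::Dumper to show the difference:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order

# Illustrative (time, event) pairs; two events share the same second.
my @events = (
    [ '21:32:15', 'Gap detect 233.156.208.41:20041' ],
    [ '21:32:15', 'Pending=214044' ],
    [ '21:33:12', 'Pending=528' ],
);

my (%pushed, %assigned);
for my $ev (@events) {
    my ($time, $what) = @$ev;
    push @{ $pushed{$time} }, $what;   # appends; array autovivified on first use
    $assigned{$time} = [ $what ];      # overwrites any earlier entry for $time
}

print Data::Dumper->Dump([ \%pushed, \%assigned ], [ '*pushed', '*assigned' ]);
```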

    As far as the example of data you provided:

    2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'
    2011-04-09 21:32:15.525 3509,3523: Gap detect on 233.156.208.41:20041 from 2746318 to 2746373, moving to next message
    2011-04-09 21:32:15.585 3509,3523: Gap detect on 233.156.208.41:20041 from 2746420 to 2746475, moving to next message
    2011-04-09 21:32:15.639 3509,3522: Received data on ConnectionICE-Options. Pending=214044.
    2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'

    It doesn't seem as if you're showing us an example line that would meet the requirements you describe (if I'm understanding them correctly). Are you looking to ultimately have something like this:

     * diagnostics, Data::Dumper

    Good luck!


    "...the adversities born of well-placed thoughts should be considered mercies rather than misfortunes." — Don Quixote
Re: Help with hash of arrays from file
by toolic (Bishop) on Apr 11, 2011 at 01:19 UTC
    Unrelated to your problem...

    You can simplify the following:

    my($day, $month, $year) = (localtime)[3,4,5];
    $month = sprintf '%02d', $month+1;
    $day = sprintf '%02d', $day;
    $year = $year+1900;
    $ymd = "$year-$month-$day";

    using the Core POSIX module:

    use POSIX qw(strftime);
    my $ymd = strftime('%Y-%m-%d', localtime);
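A quick sketch checking that the two approaches agree. Both versions use the same localtime snapshot, so they cannot disagree across a midnight boundary; strftime accepts the list returned by localtime directly.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my @now = localtime;   # one snapshot shared by both versions

# Original approach: pick fields out of localtime and pad by hand.
my ($day, $month, $year) = @now[3, 4, 5];
my $manual = sprintf '%04d-%02d-%02d', $year + 1900, $month + 1, $day;

# POSIX approach.
my $posix = strftime('%Y-%m-%d', @now);

print "$manual\n$posix\n";
```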

    Please change your "pre" tags to "code" tags.

Re: Help with hash of arrays from file
by Khariton (Sexton) on Apr 10, 2011 at 18:56 UTC
    1. If I need a big hash, I must have enough RAM for all of that data. Maybe that is the problem?
    2. When I need an array of hashes, I use this scheme:

    my (%data, %data2, %data3);
    $data{$timestamp}=$data;
    $data2{$timestamp}=$data2;
    $data3{$timestamp}=$data3;

    Each kind of data is collected in its own hash, keyed by $timestamp.
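For comparison, a sketch of the parallel-hashes scheme described above next to a single hash of arrays holding the same data. The hash names and values are illustrative only. Note that my needs parentheses to declare several variables in one statement.

```perl
use strict;
use warnings;

my $timestamp = '21:32:15';

# Parallel hashes: one hash per kind of data, all keyed by timestamp.
my (%gap, %addr, %pending);
$gap{$timestamp}     = 'Gap detect';
$addr{$timestamp}    = '233.156.208.41:20041';
$pending{$timestamp} = 'Pending=214044';

# Single hash of arrays: everything for one timestamp under one key.
my %HoA;
push @{ $HoA{$timestamp} }, $gap{$timestamp}, $addr{$timestamp}, $pending{$timestamp};

print "$timestamp: @{ $HoA{$timestamp} }\n";
```

The parallel hashes have to be kept in step by hand, while the hash of arrays keeps each second's events together, which is the structure the question asks for.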

      Your "AOH" scheme appears here in various guises time and time again from people who have never programmed before, and is immediately jumped on by wiser heads who suggest this alternate scheme:

      my @records;

      while (<$inData>) {
          chomp;
          my ($name, $age) = split ',';
          push @records, {name => $name, age => $age};
      }

      However, what the OP was asking about was a hash of arrays (HOA), so your scheme doesn't even help the OP.

      True laziness is hard work