Help with hash of arrays from file

jb60606 has asked for the wisdom of the Perl Monks concerning the following question:

I'm relatively new to Perl and am having problems creating a hash of arrays. The file, about 500MB in size, is formatted like so:

2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'
2011-04-09 21:32:15.525 [3509,3523]: Gap detect on 233.156.208.41:20041 from 2746318 to 2746373, moving to next message
2011-04-09 21:32:15.585 [3509,3523]: Gap detect on 233.156.208.41:20041 from 2746420 to 2746475, moving to next message
2011-04-09 21:32:15.639 [3509,3522]: Received data on Connection[ICE-Options]. Pending=214044.
2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'

Using the timestamp in the second column as the key, I need to find every line whose "Pending=" queue value is greater than a user-specified amount and add it to an array. Also, if a "Gap" was detected during the same second (I'll be stripping the milliseconds), it should be added to the array for that same timestamp. For the text above, I'd like each key/array pair to read like:

my %ICE = (
    '21:31:10' => [ 'Pending=3201', ],
    '21:32:14' => [ 'Pending=1000', ],
    '21:32:15' => [ 'Gap detect', '233.156.208.41:20041', 'Pending=214044', ],
    '21:32:24' => [ 'Gap detect', '233.156.208.41:20041', 'Pending=104000', ],
    '21:32:58' => [ 'Gap detect', '233.156.208.41:20041', 'Pending=96000', ],
    '21:33:12' => [ 'Pending=528', ]
);

And print comma separated:

21:31:10, Gap detect 233.156.208.41:20041, Pending=
21:31:12, Gap detect 233.156.208.41:20041, Pending=3400

Thus far, I have the following code, but it doesn't seem to work. Any recommendations?

UPDATE:
Apologies for the "half-arsed" description and lack of presentation of the results I'm getting. I wrote this in a hurry, thinking I could update it when I got home. Being new here, I didn't expect so many quick responses. I'll update the post shortly. Thanks for your help


# Get today's date which will be used as the default.
my($day, $month, $year) = (localtime)[3,4,5];
$month = sprintf '%02d', $month+1;
$day   = sprintf '%02d', $day;
$year = $year+1900;
$ymd = "$year-$month-$day";

my $total; # variable to be used for queue total for the given timeframe
my $count = 0;

## Get command line arguments, convert time to seconds
##my $logFile = "pmmd-ltc-fsrlabs21-mdrc-server-cta.log";
my $logFile = "pmmd-ltc-fsrlabs41-mdrc-server-ice.log";
my $sTime= $ARGV[0];
my @sTime=split(/:/,$sTime); # split start time
my $sSecs=$sTime[0] * 3600 + $sTime[1] * 60 + $sTime[2]; # convert start-time to seconds
my $eTime = $ARGV[1];
my @eTime=split(/:/,$eTime); # split end time
my $eSecs=$eTime[0] * 3600 + $eTime[1] * 60 + $eTime[2]; # convert stop-time to seconds
my $tHold = $ARGV[2];

open(LOG, "$logFile") or die "Couldn't open file for processing: $!";

while ( $line = <LOG> ) {
unless (($data[10] =~ m/Pending=/) || ($data[6] =~ m/Gap/)) { next; } # skip elements we don't want
     if ($line =~ m/Pending=/) {
        my @data=split(/ /,$line); # split the line up
        $data[10] =~ s/[A-Za-z=.]//g; # delete "Pending", "=" and "."
        $data[1] =~ s/\..*//g; # delete the millisecond element of the (line)lTime var
        my @lTime=split(/:/,$data[1]); # split the line time
        my $lSecs=$lTime[0] * 3600 + $lTime[1] * 60 + $lTime[2]; # convert line-time to seconds

        if (($data[0] eq $ymd) && ($data[10] >= $tHold) && ($lSecs >= $sSecs) && ($lSecs <= $eSecs))
            {
                   $line = "$data[1],$data[10]";
                }
   }

      elsif ($line =~ m/Gap detect/) {
        my @data=split(/ /,$line); # split the line up
        $data[1] =~ s/\..*//g; # delete the millisecond element of the (line)lTime var
        my @lTime=split(/:/,$data[1]); # split the line time
        my $lSecs=$lTime[0] * 3600 + $lTime[1] * 60 + $lTime[2]; # convert line-time to seconds
                if (($data[0] eq $ymd) && ($lSecs >= $sSecs) && ($lSecs <= $eSecs))
                {
                        $line = "$data[1],$data[9]";
                }
   } else { next; }


   ($time, $rest) = split ',', $line, 2;
   #$time =~ s/\..*//g; # delete the millisecond element
   @fields = split ',', $rest;
   $HoA{$time} = [ @fields ];
}

for $time (sort (keys (%HoA)) ) {
   print "$time: @{ $HoA{$time} }\n";
}

close LOG;


Replies are listed 'Best First'.
Re: Help with hash of arrays from file by ww (Archbishop) on Apr 10, 2011 at 18:01 UTC
"Any recommendations?"

Recommend that -- at a minimum -- you tell us HOW "it doesn't seem to work." That will make making recommendations easier. As it is, some of us (assuredly including me) would have to download your data snippet and your code and execute it to ascertain what you probably already know. You probably don't want to make the price of offering assistance that high; unnecessarily high!

And BTW, if the phrase "doesn't work" means "throws errors or warnings" you need to mention what they are... and if the phrase means "dies silently" then you need `strict` and `warnings` and either debug or some debugging-print-statements in your code.

HTH
Re: Help with hash of arrays from file by GrandFather (Saint) on Apr 10, 2011 at 23:29 UTC
The information you provide about your input data and the sample data structure you show are inconsistent so the following sample code may not print what you want, but the parsing should at least get you headed in a useful direction:

#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

my %recs;

while (<DATA>) {
    my ($time, $mode, $value) = /(\d\d:\d\d:\d\d)\.\d{3} .*?
        (?: (Pending|Gap) (?:\sdetect\son\s | =)? ([\d.:]+) ) /x;

    next if !defined $time || !defined $mode;

    if ($mode eq 'Pending') {
        $recs{$time}{Pending} = $value;
        next;
    }

    ++$recs{$time}{Gaps}{$value};
}

for my $rec (map {$recs{$_}} sort keys %recs) {
    my $pend = "Pending=$rec->{Pending}\n";

    print join $pend, map {"Gap detect $_ "} sort keys %{$rec->{Gaps}}
        if exists $rec->{Gaps};
    print $pend;
}

__DATA__
2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'
2011-04-09 21:32:15.525 3509,3523: Gap detect on 233.156.208.41:20041 from 2746318 to 2746373, moving to next message
2011-04-09 21:32:15.585 3509,3523: Gap detect on 233.156.208.41:20041 from 2746420 to 2746475, moving to next message
2011-04-09 21:32:15.639 3509,3522: Received data on ConnectionICE-Options. Pending=214044.
2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'

Prints:

Gap detect 233.156.208.41:20041 Pending=214044.

True laziness is hard work
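
For the hash of arrays that the original question describes, a minimal sketch might look like the following. It is not GrandFather's code: the sub name `collect_events`, its threshold argument, and the in-memory sample lines are assumptions made for illustration, and it records every gap in a given second whether or not a qualifying "Pending=" line shares that second. The point it tries to show is that `push @{ $hoa{$time} }, ...` appends to the array for a timestamp, whereas assigning a fresh `[ @fields ]` on each line replaces whatever was stored before.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a hash of arrays keyed by HH:MM:SS.
# The sub name and the $threshold argument are illustrative only.
sub collect_events {
    my ($threshold, @lines) = @_;
    my %hoa;

    for my $line (@lines) {
        # Capture the time of day from the second column, dropping milliseconds.
        my ($time) = $line =~ /^\S+\s+(\d\d:\d\d:\d\d)\.\d+/ or next;

        if ($line =~ /Gap detect on (\S+)/) {
            # push appends; the array is autovivified on first use.
            push @{ $hoa{$time} }, "Gap detect $1";
        }
        elsif ($line =~ /Pending=(\d+)/ && $1 >= $threshold) {
            push @{ $hoa{$time} }, "Pending=$1";
        }
    }
    return \%hoa;
}

# Sample lines modeled on the log excerpt in the question.
my @sample = (
    "2011-04-09 21:32:15.525 [3509,3523]: Gap detect on 233.156.208.41:20041 from 2746318 to 2746373, moving to next message",
    "2011-04-09 21:32:15.639 [3509,3522]: Received data on Connection[ICE-Options]. Pending=214044.",
    "2011-04-09 21:32:16.100 [3509,3522]: Received data on Connection[ICE-Options]. Pending=12.",
);

my $events = collect_events(1000, @sample);

# Print one comma-separated row per timestamp.
for my $time (sort keys %$events) {
    print join(', ', $time, @{ $events->{$time} }), "\n";
}
```

With a threshold of 1000, the third sample line is skipped, so only the 21:32:15 key is created. For a 500MB log the same loop could read from a filehandle one line at a time instead of from a list.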
Re: Help with hash of arrays from file by luis.roca (Deacon) on Apr 10, 2011 at 23:21 UTC
Hello,

First, I would follow ww's suggestion and add `use strict; use warnings;` (and even `use diagnostics; use Data::Dumper` for a more verbose explanation of any errors and debugging the return value of your variables.) I would also suggest taking a look at the Perl Data Structures Cookbook: perldsc.

As far as the example of data you provided:

2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'
2011-04-09 21:32:15.525 3509,3523: Gap detect on 233.156.208.41:20041 from 2746318 to 2746373, moving to next message
2011-04-09 21:32:15.585 3509,3523: Gap detect on 233.156.208.41:20041 from 2746420 to 2746475, moving to next message
2011-04-09 21:32:15.639 3509,3522: Received data on ConnectionICE-Options. Pending=214044.
2011-04-03 09:37:12.129 (INFO, ICELineHandler.cpp:339) Product Def MarketID 90120253, Symbol 'BRN FMU0012_OMCA0000118502081312'

It doesn't seem as if you're showing us an example line that would meet the requirements you describe (If I'm understanding them correctly). Are you looking to ultimately have something like this:

Read more... (4 kB)

diagnostics, Data::Dumper

Good luck!

"...the adversities born of well-placed thoughts should be considered mercies rather than misfortunes." -- Don Quixote
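
As a small illustration of what the `use strict` suggestion would surface in the posted loop, here is a hedged sketch. It compiles a short fragment, modeled on the `unless` test that reads `@data` before any `my @data` exists, inside a string `eval` so the compile-time error can be captured and printed instead of stopping the script. The fragment and the variable names `$fragment`, `$ok`, and `$error` are illustrative, not taken from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fragment modeled on the posted loop: @data is used without being declared.
my $fragment = q{
    use strict;
    my $hit = $data[10] =~ m/Pending=/;
    1;
};

# A string eval compiles the fragment at run time; under strict the
# undeclared @data is a compile-time error, so eval returns undef
# and the message is left in $@.
my $ok    = eval $fragment;
my $error = $@;

print "strict reported: $error" unless $ok;
```

Without `strict`, the same expression would quietly read an empty package variable `@data`, which is one reason a filter like this can skip every line without any visible error.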
Re: Help with hash of arrays from file by toolic (Bishop) on Apr 11, 2011 at 01:19 UTC
Unrelated to your problem... You can simplify the following:

my($day, $month, $year) = (localtime)[3,4,5];
$month = sprintf '%02d', $month+1;
$day   = sprintf '%02d', $day;
$year = $year+1900;
$ymd = "$year-$month-$day";

using the Core POSIX module:

use POSIX qw(strftime);
my $ymd = strftime('%Y-%m-%d', localtime);

Please change your "pre" tags to "code" tags.
Re: Help with hash of arrays from file by Khariton (Sexton) on Apr 10, 2011 at 18:56 UTC
1. If I need a big hash of data, I must have enough RAM to hold that data. Maybe that is the trouble?

2. When I need an array of hashes I use this scheme:

my (%data, %data2, %data3);
$data{$timestamp}=$data;
$data2{$timestamp}=$data2;
$data3{$timestamp}=$data3;

All the various data are collected in their own database (hash), keyed by $timestamp.
Re^2: Help with hash of arrays from file by GrandFather (Saint) on Apr 10, 2011 at 20:46 UTC
Your "AOH" scheme appears here in various guises time and time again from people who have never programmed before and is immediately jumped on by wiser heads who suggest this alternate scheme:

my @records;

while (<$inData>) {
    my ($name, $age) = split ',';
    push @records, {name => $name, age => $age};
}

However, what the OP was asking about was a hash of arrays (HOA) so your scheme doesn't even help the OP.

True laziness is hard work