in reply to puzzling seg fault
I agree with the others that you are running out of memory. I would note that the top level key "$month.year" appears redundant as it is a constant, so you could lose that level, although that should not make any significant difference. For the specific task you show (counting) you don't need a hash of hash of hash..... just to increment a counter. You could just stringify the key:
    my $key = join '|', "$month.year", $fmt_proto, $fmt_dest_ip, $fmt_dest_port, $fmt_src_ip, $fmt_src_port;
    $mon_log{$key} += $fmt_drp_packets;
This removes all those expensive levels of keys but still gets you your count. You can break the key down with split as required (the pipe needs escaping in the pattern: split /\|/, $key). This may use less memory even though the keys are longer and contain redundant data. Memory consumption will depend on how many keys you end up with. It is a crappy way to do it compared to a database. It looks like tab-separated data (or it could be) so you could do in MySQL something like:
    create table stuff (
        proto char(4),
        src_ip char(15),
        etc....
        drp_packets int,
        index(proto),
        index(src_ip),
        etc....
    );
    load data local infile '/blah/blah.dat' into table stuff;
    select sum(drp_packets) from stuff where src_ip = '1.2.3.4' and .....
You can then make any queries you want.....
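Coming back to the stringified key for a moment, here is a minimal sketch of counting into a flat hash and then splitting the key back into its fields. The sample rows and the variable names other than %mon_log are made up for illustration, and the constant month/year prefix is left out since it adds nothing to the key.

```perl
use strict;
use warnings;

# Hypothetical parsed rows:
# proto, dest_ip, dest_port, src_ip, src_port, dropped packets
my @rows = (
    [ 'tcp', '10.0.0.1', 80, '1.2.3.4', 1025, 3 ],
    [ 'tcp', '10.0.0.1', 80, '1.2.3.4', 1025, 4 ],
    [ 'udp', '10.0.0.2', 53, '5.6.7.8', 2000, 1 ],
);

my %mon_log;
for my $row (@rows) {
    my @fields = @$row;
    my $drp    = pop @fields;    # last element is the packet count
    # one flat key per combination instead of five nested hash levels
    $mon_log{ join '|', @fields } += $drp;
}

for my $key ( sort keys %mon_log ) {
    # '|' is a regex metacharacter, so it must be escaped in the pattern
    my ( $proto, $dest_ip, $dest_port, $src_ip, $src_port ) = split /\|/, $key;
    print "${proto} ${src_ip}:${src_port} to ${dest_ip}:${dest_port} dropped $mon_log{$key}\n";
}
```

The two tcp rows collapse into one key, so the hash ends up with two entries rather than three.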
What you are actually doing could be handled in a different way. If you sort the input file first (unix sort will handle it fine and do it fastest), all lines for the same proto/src/dest/port combo end up next to each other. You can then simply iterate over the file with a line merge strategy, keeping a running count and writing it out every time you find a new combo:
    my $current_rec   = '';
    my $current_count = 0;
    my ($proto, $dest_ip, $src_ip, $dest_port, $src_port, $drp_packets, $country, $rec);
    while (<REPORTFILE>) {
        chomp;
        # the limit of 7 keeps a country like 'United States' in one field
        ($proto, $dest_ip, $src_ip, $dest_port, $src_port, $drp_packets, $country) = split ' ', $_, 7;
        $rec = join "\t", $proto, $dest_ip, $src_ip, $dest_port, $src_port;
        if ( $rec eq $current_rec ) {
            $current_count += $drp_packets;
        }
        else {
            # write out the finished group, skipping the empty initial state
            print OUTFILE $current_rec, "\t", $current_count, "\n" if $current_rec ne '';
            $current_rec   = $rec;
            $current_count = $drp_packets;
        }
    }
    # now print any hanging rec
    print OUTFILE $current_rec, "\t", $current_count, "\n" if $current_rec ne '';
This will also probably be an order of magnitude or two faster than using a hash. You can do a sort -nrk6 outfile > drop_sort to sort by dropped packets. BTW your split will split 'United States' into two tokens. Use split ' ', $_, 7 to get what you expect.
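To see the merge on something small, here is a self-contained sketch that runs the same idea over a few in-memory lines instead of a file. The sample lines are invented and assume the field order proto, dest_ip, src_ip, dest_port, src_port, drp_packets, country, already sorted on the first five fields.

```perl
use strict;
use warnings;

# Hypothetical sample input, pre-sorted on the first five fields
my @sorted = (
    "tcp 10.0.0.1 1.2.3.4 80 1025 3 United States",
    "tcp 10.0.0.1 1.2.3.4 80 1025 4 United States",
    "udp 10.0.0.2 5.6.7.8 53 2000 1 Canada",
);

my @totals;    # one output line per distinct combination
my ( $current_rec, $current_count ) = ( '', 0 );

for my $line (@sorted) {
    # limit of 7 keeps 'United States' together in the last field
    my @f   = split ' ', $line, 7;
    my $rec = join "\t", @f[ 0 .. 4 ];
    if ( $rec eq $current_rec ) {
        $current_count += $f[5];
    }
    else {
        # a new combination starts: store the previous one, if any
        push @totals, "$current_rec\t$current_count" if $current_rec ne '';
        ( $current_rec, $current_count ) = ( $rec, $f[5] );
    }
}
# store the last group, which the loop never flushes
push @totals, "$current_rec\t$current_count" if $current_rec ne '';

print "$_\n" for @totals;
```

Only one record and one counter are held in memory at any time, which is why this approach does not care how many distinct combinations the log contains.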
cheers
tachyon