
I agree with the others that you are running out of memory. I would note that the top-level key "$month.year" appears redundant, as it is a constant, so you could lose that level, although that should not make any significant difference. For the specific task you show (counting), you don't need a hash of hashes of hashes (and so on) just to increment a counter. You could just stringify the key:

my $key = join '|', "$month.year", $fmt_proto, $fmt_dest_ip, $fmt_dest_port,
                    $fmt_src_ip, $fmt_src_port;
$mon_log{$key} += $fmt_drp_packets;
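A slightly fuller sketch of the flat-key idea follows. It counts drops and then splits each key back into its fields for reporting. The sample records and the `$period` string (standing in for "$month.year") are hypothetical, made up only for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample records:
# proto, dest_ip, dest_port, src_ip, src_port, drp_packets
my @records = (
    [ 'tcp', '10.0.0.1', 80, '1.2.3.4', 1025, 3 ],
    [ 'tcp', '10.0.0.1', 80, '1.2.3.4', 1025, 2 ],
    [ 'udp', '10.0.0.2', 53, '5.6.7.8', 3333, 7 ],
);
my $period = 'Jan.2004';    # hypothetical stand-in for "$month.year"

my %mon_log;
for my $r (@records) {
    my ($proto, $dest_ip, $dest_port, $src_ip, $src_port, $drp) = @$r;
    # one flat string key instead of several levels of nested hashes
    my $key = join '|', $period, $proto, $dest_ip, $dest_port, $src_ip, $src_port;
    $mon_log{$key} += $drp;
}

# Break each key back into its fields when reporting.
# '|' is a regex metacharacter, so split on /\|/, not '|'.
# This assumes '|' never occurs inside the data itself.
for my $key (sort keys %mon_log) {
    my ($per, $proto, $dest_ip, $dest_port, $src_ip, $src_port) = split /\|/, $key;
    print "$proto $src_ip:$src_port -> $dest_ip:$dest_port dropped $mon_log{$key}\n";
}
```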

This removes all those expensive levels of nested hashes but still gets you your count. You can break the key back down with split as required. This may use less memory even though the keys are longer and contain redundant data; how much you save will depend on how many distinct keys you end up with. It is still a crude way to do it compared to a database. The data looks like it is (or could be) tab-separated, so in MySQL you could do something like:

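create table stuff (
    proto       char(4),
    dest_ip     varchar(15),
    src_ip      varchar(15),
    dest_port   int,
    src_port    int,
    drp_packets int,
    country     varchar(64),
    index(proto),
    index(src_ip)
);

-- fields are loaded in table-column order; the default field terminator is a tab
load data local infile '/blah/blah.dat' into table stuff;

select sum(drp_packets) from stuff where src_ip = '1.2.3.4'; -- add more AND conditions as needed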

You can then run any queries you want.

What you are actually doing could also be handled in a different way. If you sort the input file first (Unix sort will handle it fine and do it fastest), then you can simply iterate over it and generate your counts with a line-merge strategy: accumulate the count while the proto/src/dest/port combination stays the same, and print it every time you find a new combination:

my $current_rec   = '';
my $current_count = 0;

while (<REPORTFILE>) {
    chomp;
    # a limit of 7 keeps a multi-word country such as 'United States' in one field
    my ($proto, $dest_ip, $src_ip, $dest_port, $src_port, $drp_packets, $country)
        = split ' ', $_, 7;
    my $rec = join "\t", $proto, $dest_ip, $src_ip, $dest_port, $src_port;
    if ( $rec eq $current_rec ) {
        $current_count += $drp_packets;
    }
    else {
        # new combination: flush the previous one (but not the empty initial record)
        print OUTFILE "$current_rec\t$current_count\n" if $current_rec ne '';
        $current_rec   = $rec;
        $current_count = $drp_packets;
    }
}

# now print the last record, which is still pending
print OUTFILE "$current_rec\t$current_count\n" if $current_rec ne '';
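To try the merge logic without any files, here is a self-contained sketch that wraps the same loop in a sub. The name `merge_sorted` and the sample lines are hypothetical, and Perl's built-in sort stands in for the Unix sort step:

```perl
use strict;
use warnings;

# Merge runs of identical proto/dest/src/port combinations in *sorted*
# input, summing drp_packets. Returns "key\tcount" strings, one per combo.
sub merge_sorted {
    my @lines = @_;
    my @out;
    my ($current_rec, $current_count) = ('', 0);
    for my $line (@lines) {
        my ($proto, $dest_ip, $src_ip, $dest_port, $src_port, $drp_packets)
            = split ' ', $line, 7;
        my $rec = join "\t", $proto, $dest_ip, $src_ip, $dest_port, $src_port;
        if ($rec eq $current_rec) {
            $current_count += $drp_packets;
        }
        else {
            push @out, "$current_rec\t$current_count" if $current_rec ne '';
            ($current_rec, $current_count) = ($rec, $drp_packets);
        }
    }
    # flush the final pending record
    push @out, "$current_rec\t$current_count" if $current_rec ne '';
    return @out;
}

# Hypothetical sample lines: proto dest_ip src_ip dest_port src_port drp_packets country
my @input = (
    'udp 10.0.0.2 5.6.7.8 53 3333 7 Canada',
    'tcp 10.0.0.1 1.2.3.4 80 1025 3 United States',
    'tcp 10.0.0.1 1.2.3.4 80 1025 2 United States',
);
my @merged = merge_sorted(sort @input);
print "$_\n" for @merged;
```

Sorting first is what makes this work: identical combinations end up on adjacent lines, so only one running count is ever needed.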

This will probably also be an order of magnitude or two faster than using a hash, and it only ever holds one record's count in memory. You can then do sort -nrk6 outfile > drop_sort to sort by dropped packets (the count is the sixth field). BTW, your split will split 'United States' into two tokens; use split ' ', $_, 7 to get what you expect.
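A quick sketch of both points above. It shows what the split limit does to a line with a two-word country, and gives a Perl analogue of sort -nrk6 for the merged output. The sample line and merged records are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical report line whose country field contains a space
my $line = 'tcp 10.0.0.1 1.2.3.4 80 1025 3 United States';

my @unlimited = split ' ', $line;      # 8 fields: ..., 'United', 'States'
my @limited   = split ' ', $line, 7;   # 7 fields: ..., 'United States'

print scalar(@unlimited), " fields, last is '$unlimited[-1]'\n";
print scalar(@limited),   " fields, last is '$limited[-1]'\n";

# Perl analogue of `sort -nrk6`: order "key\tcount" lines by the
# 6th tab-separated field (dropped packets), largest first
my @merged = (
    "tcp\t10.0.0.1\t1.2.3.4\t80\t1025\t5",
    "udp\t10.0.0.2\t5.6.7.8\t53\t3333\t7",
);
my @by_drops = sort { (split /\t/, $b)[5] <=> (split /\t/, $a)[5] } @merged;
print "$_\n" for @by_drops;
```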

cheers

tachyon

In reply to Re: puzzling seg fault by tachyon
in thread puzzling seg fault by ttown1079
