Re: puzzling seg fault

It sounds like you are running out of memory.

A 1_000_000-key hash with a reference to an empty hash as each value requires around 160 MB. For 8_000_000 keys, that would be about 1.25 GB (8 × 160 MB). Even if many keys repeat at each level, that is still a lot of hashes.
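A rough sketch of that extrapolation, assuming memory grows linearly with the number of keys (only an approximation; real usage depends on key lengths and how many distinct prefixes the nested levels create). The 160 MB per million keys figure is the one quoted above; `estimate_mb` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Figure quoted above: ~160 MB per 1_000_000 keys whose values are
# references to empty hashes. Linear scaling is an assumption.
my $mb_per_million = 160;

# Hypothetical helper: estimated MB for a given key count.
sub estimate_mb {
    my ($keys) = @_;
    return $mb_per_million * $keys / 1_000_000;
}

for my $n (1_000_000, 8_000_000) {
    my $mb = estimate_mb($n);
    printf "%9d keys => ~%d MB (%.2f GB)\n", $n, $mb, $mb / 1024;
}
```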

How much memory does your machine have? Have you monitored the memory consumption as the program runs?
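One way to check from inside the script is to sample the process's resident set size before and after building a sample of the structure. This is a sketch that assumes a Linux-style `/proc/self/status` with a `VmRSS:` line; IRIX and other systems report this differently (or not at all), in which case `top` or `ps` is the fallback. The helper name `rss_kb` and the sample size are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: resident set size of this process in kB, or undef.
# Assumes Linux /proc/self/status; returns undef where that is missing.
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

my $n = 200_000;    # sample size; scale the result up for the real input
my $before = rss_kb();

# Same shape as discussed above: each key holds a ref to an empty hash.
my %h;
$h{"key$_"} = {} for 1 .. $n;

my $after = rss_kb();

if (defined $before && defined $after) {
    printf "%d keys grew RSS by ~%d kB (~%.0f bytes/key)\n",
        $n, $after - $before, ($after - $before) * 1024 / $n;
}
else {
    print "RSS not available here; watch the process with top or ps\n";
}
```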

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon


Re^2: puzzling seg fault by ttown1079 (Initiate) on Jun 30, 2004 at 19:51 UTC
The machine is an SGI Origin 16-processor HPC with 1 GB of RAM per processor. When I run job accounting, nothing of note shows up. I understand a database is desirable, and I intend to go that route eventually, but I would like to have both options: a database and text-file manipulation.
Re^3: puzzling seg fault by BrowserUk (Patriarch) on Jun 30, 2004 at 20:40 UTC
Nice hardware, but ... 1 GB per processor? I know nothing of that hardware, but that (again) sounds like any given process will be limited to 1 GB minus any OS overhead. It very much depends upon the distribution of the contents of the file, but I could quite see the 8 million lines building a hash structure larger than 1 GB.

Most times when Perl runs out of RAM on my machine I get Perl's "Out of memory" error, but occasionally I get a segfault. Maybe your job accounting would tell you if memory was a problem (I've not the vaguest clue what it might contain) and you can rule it out, but if you have access to a top-like live monitoring program, it would be worth checking.

I think I would try filtering the input file into smaller files, say by protocol (assuming they're not all ICMP), then process those separately and combine the results.

You don't actually show what you're doing with that monster structure. It looks like you're just counting the number of dropped packets per date/proto/dst:port/src:port. If that is all you are doing, there is little point in building the deep structure. You would achieve the same result by concatenating all those values into a string and using a single-level hash. That said, if the problem is memory, that may not help much.
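A minimal sketch of the single-level hash idea, contrasted with the nested version. The record layout and values are hypothetical, since the original parsing code and log format aren't shown here; NUL is assumed never to appear in the fields, so it serves as a safe separator.

```perl
use strict;
use warnings;

# Hypothetical records: the thread doesn't show the log format, so assume
# each dropped packet has already been parsed into these four fields.
my @drops = (
    [ '2004-06-30', 'tcp', '10.0.0.5:80', '192.168.1.9:3345' ],
    [ '2004-06-30', 'tcp', '10.0.0.5:80', '192.168.1.9:3345' ],
    [ '2004-06-30', 'udp', '10.0.0.7:53', '192.168.1.4:1029' ],
);

# Deep structure: Perl allocates a separate anonymous hash for every
# distinct date, date/proto, and date/proto/dst prefix.
my %nested;
$nested{ $_->[0] }{ $_->[1] }{ $_->[2] }{ $_->[3] }++ for @drops;

# Flat alternative: one hash, keyed on the fields joined by a separator
# that cannot occur in them (NUL is assumed safe for this data).
my $SEP = "\0";
my %flat;
$flat{ join $SEP, @$_ }++ for @drops;

# Split the key back into its fields when reporting.
for my $key (sort keys %flat) {
    my ($date, $proto, $dst, $src) = split /\0/, $key;
    print "$date $proto $dst <- $src: $flat{$key}\n";
}
```

Perl's built-in `$h{$a, $b, ...}` syntax does the same thing, joining the subscripts with `$;` (default `"\034"`). Either way, the flat hash avoids the per-prefix anonymous hashes, though the joined key strings still cost memory.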