in reply to Large file, multi dimensional hash - out of memory

Your input data set appears to be sorted. Is that intentional? If it is, you're in luck: you only need to keep track of a single name at a time. Walk through the input file line by line, and each time the name changes, write the previous name to an output file along with a summary of its extensions.


Dave


Re^2: Large file, multi dimensional hash - out of memory
by Anonymous Monk on May 15, 2013 at 14:43 UTC
    Great idea. Yes, the data is sorted. How can I dump that data to a file and free up the hash so I don't run out of memory?

      Here's an example of what I was suggesting.

      my $current_name = '';
      my %extensions   = ();

      while ( my $line = <DATA> ) {
          chomp $line;
          my( $name, $ext ) = split ' ', $line;

          # When the name changes, print the summary for the previous
          # name (skipping the empty initial name) and reset the hash.
          if( $name ne $current_name ) {
              print "($current_name) => [",
                    join( ', ', sort keys %extensions ), "]\n"
                  if length $current_name;
              $current_name = $name;
              %extensions   = ();
          }
          $extensions{$ext}++;
      }

      # Don't forget the summary for the last name in the file.
      print "($current_name) => [", join( ', ', sort keys %extensions ), "]\n";

      __DATA__
      /foo/bar/baz/123 aaa
      /foo/bar/baz/123 aab
      /foo/bar/baz/123 aac
      /foo/bar/baz/124 aaa
      /foo/bar/baz/124 aab

      The memory footprint is bounded by the number of distinct extensions for a single name, not by the size of the whole file.
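
      Since you asked about dumping the results to a file: here's a minimal sketch of the same loop writing each summary to an output filehandle instead of STDOUT. The filename summary.txt and reading from STDIN are placeholder choices, not part of the example above. Clearing %extensions on every name change is what frees the hash and keeps memory bounded.

      # Sketch: assumes the big file arrives on STDIN and that the
      # placeholder output filename 'summary.txt' is acceptable.
      open my $out_fh, '>', 'summary.txt'
          or die "Cannot open summary.txt: $!";

      my $current_name = '';
      my %extensions   = ();

      while ( my $line = <STDIN> ) {
          chomp $line;
          my( $name, $ext ) = split ' ', $line;
          if( $name ne $current_name ) {
              # Write the finished name's summary to the file, then
              # empty the hash so it never grows past one name's worth.
              print {$out_fh} "($current_name) => [",
                    join( ', ', sort keys %extensions ), "]\n"
                  if length $current_name;
              $current_name = $name;
              %extensions   = ();
          }
          $extensions{$ext}++;
      }
      print {$out_fh} "($current_name) => [",
            join( ', ', sort keys %extensions ), "]\n";

      close $out_fh or die "Cannot close summary.txt: $!";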


      Dave