in reply to merge multiple files giving out of memory error

You only seem to be using the number of values and their sum, so you don't have to keep all the values in your arrays. Thanks to the magic of autovivification, you can just write:

    # At the top of your program
    use constant { Key => 0, Sum => 1, Count => 2 };

    # How you fill the %seen hash
    $seen{$key1}[Key] //= $key;    ### Edit, turned = $key into //= $key
    $seen{$key1}[Sum] += $value;   # Cumulated value of $key1
    $seen{$key1}[Count]++;         # Number of times $key1 has been seen

If that's still not enough, you can move the data out of memory (e.g. onto your hard drive) by tying your hash to a database; DBM::Deep seems like a good candidate for that, although I have never used it. This won't make your program any faster though.

The code is working perfectly
Well now that's a mystery, because while (<>) shifts (removes) the file names from @ARGV before opening the files, so @ARGV has to be empty by the time you try to get the file count. my $file_count = @ARGV; should be at the top of the file (and only needs to be done once). By the way, your array contains the key and all the values, so even if you have just one value, the array would be of size 2.
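
To see the @ARGV behaviour for yourself, here is a minimal sketch. The two temporary files and their contents are made up for the demonstration; only the order of operations matters: count the files first, then read them.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create two small throwaway input files (hypothetical data).
my @files;
for my $content ("a\t1\n", "b\t2\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh;
    push @files, $name;
}
@ARGV = @files;            # pretend these came from the command line

my $file_count = @ARGV;    # capture the count BEFORE reading anything

my $lines = 0;
while (<>) {               # each file name is shifted off @ARGV as it is opened
    $lines++;
}

my $remaining = @ARGV;     # nothing is left once the loop has finished
print "files: $file_count, lines: $lines, left in \@ARGV: $remaining\n";
```

If the count were taken after the loop instead, it would be 0 and any comparison of the form ">= $file_count" would always succeed.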

About the hanging part, that's to be expected when there's a lot of data to process. You can add a message to tell you how far the processing has gone (and know if it is actually frozen or just not done yet). print "Done processing $ARGV\n" if eof; (at the end of the first loop) will print a message each time the end of a file is reached (see eof).
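
A small sketch of that progress message, again with made-up temporary files standing in for the real input. The @finished array is only there so the result can be inspected afterwards; in a real script the print alone is enough.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two throwaway input files (hypothetical data).
my @files;
for my $content ("x\t1\ny\t2\n", "z\t3\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh;
    push @files, $name;
}
@ARGV = @files;

my @finished;    # files whose end has been reached, in order
while (my $line = <>) {
    # ... the real per-record processing would go here ...

    # eof without parentheses tests the file that was just read from,
    # so this is true once per input file.
    if (eof) {
        push @finished, $ARGV;
        print STDERR "Done processing $ARGV\n";
    }
}
```

Printing to STDERR keeps the progress messages out of the regular output if that is redirected to a file.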

Re^2: merge multiple files giving out of memory error
by Anonymous Monk on Feb 27, 2017 at 11:34 UTC
    Thank you for the help. Modified my code as:
    my %seen;
    $/ = "";
    while (<>) {
        chomp;
        my ($key, $value) = split ('\t', $_);
        my @lines = split /\n/, $key;
        my $key1 = $lines[1];
        $seen{$key1}[Key] //= $key;
        $seen{$key1}[Sum] += $value;
    }
    my $file_count = @ARGV;
    foreach my $key1 ( keys %seen ) {
        if ( @{ $seen{$key1} } >= $file_count) {
            print join( "\t", @{$seen{$key1}});
            print "\n\n";
        }
    }
    but please help me also to have the name of the files in which a particular read exists. I mean with the total count it also tells me in which files it is present.

      my $file_count = @ARGV;
      foreach my $key1 ( keys %seen ) {
          if ( @{ $seen{$key1} } >= $file_count) {
              print join( "\t", @{$seen{$key1}});
              print "\n\n";
          }
      }
      This still doesn't make sense. If you add print "File count is: $file_count \n"; you'll find that $file_count is always 0, because after reading the files with while (<>), @ARGV is always empty. And you check the size of the array in $seen{$key1}, but it is always 2 (there are two elements, Key and Sum).

      When you use while (<>) to read from a list of files, the current file is $ARGV.

      # At the top of the file
      use constant {
          Key   => 0,
          Sum   => 1,
          Count => 2,   # Remove this if you don't use the total count
          Files => 3    # Should be 2 if Count is not used.
      };

      # In the read loop
      $seen{$key1}[Key] //= $key;
      $seen{$key1}[Sum] += $value;
      $seen{$key1}[Count]++;          # Total count for the number of times this value exists
      $seen{$key1}[Files]{$ARGV}++;   # Count in this file

      You don't seem to want a particular format for your output (because you changed it when adapting my proposition), so you could try just dumping the whole structure using either Data::Dumper (nothing to install) or YAML (needs to be installed, but can be nicer to read).

      use Data::Dumper;
      while (<>) {
          # Your code here
      }
      print Dumper(\%seen);
      Or
      use YAML;
      while (<>) {
          # Your code here
      }
      print YAML::Dump(\%seen);
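
If a raw dump is too noisy, the same structure can be printed as one line per key with the file names and per-file counts at the end. This is only a sketch under simplifying assumptions: the input here is one "key<TAB>value" pair per line in made-up temporary files, not the multi-line records of the original data, and the output layout is an arbitrary choice.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use constant { Key => 0, Sum => 1, Count => 2, Files => 3 };

# Two throwaway input files, one "key<TAB>value" pair per line
# (a simplified, hypothetical layout).
my @files;
for my $content ("read1\t2\nread2\t5\n", "read1\t3\n") {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh;
    push @files, $name;
}
@ARGV = @files;

my %seen;
while (my $line = <>) {
    chomp $line;
    my ($key, $value) = split /\t/, $line;
    $seen{$key}[Key] //= $key;
    $seen{$key}[Sum] += $value;
    $seen{$key}[Count]++;
    $seen{$key}[Files]{$ARGV}++;    # $ARGV is the file currently being read
}

# One line per key: key, sum, total count, then "file (count)" pairs.
for my $key (sort keys %seen) {
    my $entry    = $seen{$key};
    my $per_file = join ", ",
        map { "$_ ($entry->[Files]{$_})" } sort keys %{ $entry->[Files] };
    print join("\t", $entry->[Key], $entry->[Sum], $entry->[Count], $per_file), "\n";
}
```

The number of files a key appears in is the number of keys in $seen{$key}[Files], so comparing that to a file count saved before the read loop tells you whether the key is present in every file.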

        Thank you for the help. I am sorry I didn't mention it earlier, I want the name of files also with the count in each. Can you please help me in this also. very sorry if this irritates you.