in reply to Re: Code Optimization
in thread Code Optimization

Originally, I had the program open the file in the manner that you have coded (once only), but the amount of data in question is massive. The program used up all of the server's memory, and the OS would randomly kill it in favor of OS processes. I, perhaps incorrectly, favored the line-by-line approach because at runtime the program uses only 0.3% of the RAM instead of 100%. Perhaps it would be worthwhile to break the file into chunks and read in maybe 100,000 or so lines at a time, something like the untested sketch below?
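
A rough sketch of what I mean (not tested; the filename and the per-line work are just stand-ins for what my script actually does):

    use strict;
    use warnings;

    # Untested sketch: process the file in chunks of 100,000 lines so
    # only one chunk is held in memory at a time.
    my $sequence_fname = 'seq_data.tsv';   # stand-in path; use the real filename
    my $chunk_size     = 100_000;

    open my $fh, '<', $sequence_fname
        or die "Can't open $sequence_fname: $!";

    my @chunk;
    while (defined(my $line = <$fh>)) {
        push @chunk, $line;
        if (@chunk >= $chunk_size) {
            process_chunk(\@chunk);
            @chunk = ();
        }
    }
    process_chunk(\@chunk) if @chunk;   # last, possibly partial, chunk
    close $fh;

    # Hypothetical stand-in for whatever per-line work the script does.
    sub process_chunk {
        my ($lines) = @_;
        for my $line (@$lines) {
            chomp $line;
            my @fields = split /\t/, $line;
            # ... work on @fields here
        }
    }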

Replies are listed 'Best First'.
Re^3: Code Optimization
by kcott (Archbishop) on Sep 10, 2013 at 10:28 UTC

    In that case, I'd suggest using Tie::File:

    use Tie::File;

    ...

    tie my @seq_data, 'Tie::File', $sequence_fname
        or die "Can't open $sequence_fname: $!";

    if (!($permute)) {
        for (@seq_data) {
            my @line = split /\t/;
            ...
        }
        ...
    }
    else {
        open(OUT, ">>$out") || die "Cannot open $out\n";
        for (...) {
            ...
            for (@seq_data) {
                my @line = split /\t/;
                ...
            }
            foreach my $key (keys %ktc) {
                ...
                print OUT ...
                ...
            }
        }
        close OUT;
    }

    untie @seq_data;

    That'll be a little slower because you'll be repeating the split /\t/, but at least you won't have memory issues.

    Also, note the change I made to die. This isn't about optimising the efficiency of your code; it's about improving the feedback you get when things go wrong. When you terminate the die message with a newline, you suppress the file and line information that Perl would otherwise append. Also, "$!" provides additional information about why open failed; see "perlvar: Error Variables". This is a good habit to get into; alternatively, consider using autodie.
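
    As a quick illustration (data.txt is just a stand-in filename here):

        # Trailing newline in the message: no file/line info, no reason given.
        # If the open fails, all you see is: Cannot open data.txt
        open my $fh, '<', 'data.txt' or die "Cannot open data.txt\n";

        # No trailing newline, and "$!" included.
        # If the open fails, you see something like:
        # Can't open data.txt: No such file or directory at script.pl line 7.
        open my $fh2, '<', 'data.txt' or die "Can't open data.txt: $!";

        # Or let autodie produce a detailed exception for you:
        use autodie;
        open my $fh3, '<', 'data.txt';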

    -- Ken

      Thanks, all. After implementing Ken's Tie::File suggestion, the code runs much, much faster. I learned something, and I appreciate everyone's comments.