in reply to Re: Code Optimization
in thread Code Optimization

Originally, I had the program open the file in the manner that you have coded (once only), but the amount of data in question is massive. The program used up all of the server's memory, and the OS would randomly kill it in favor of OS processes. I, perhaps incorrectly, favored the line-by-line approach because at runtime the program uses only 0.3% of the RAM instead of 100%. Perhaps it would be worthwhile to break the file into chunks and read in maybe 100,000 or so lines at a time, something like the untested sketch below?
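
A rough sketch of what I mean (not tested; the filename and the per-line work are just stand-ins for what my script actually does):

    use strict;
    use warnings;

    # Untested sketch: process the file in chunks of 100,000 lines so
    # only one chunk is held in memory at a time.
    my $sequence_fname = 'seq_data.tsv';   # stand-in path; use the real filename
    my $chunk_size     = 100_000;

    open my $fh, '<', $sequence_fname
        or die "Can't open $sequence_fname: $!";

    my @chunk;
    while (defined(my $line = <$fh>)) {
        push @chunk, $line;
        if (@chunk >= $chunk_size) {
            process_chunk(\@chunk);
            @chunk = ();
        }
    }
    process_chunk(\@chunk) if @chunk;   # last, possibly partial, chunk
    close $fh;

    # Hypothetical stand-in for whatever per-line work the script does.
    sub process_chunk {
        my ($lines) = @_;
        for my $line (@$lines) {
            chomp $line;
            my @fields = split /\t/, $line;
            # ... work on @fields here
        }
    }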

Replies are listed 'Best First'.
Re^3: Code Optimization
by kcott (Archbishop) on Sep 10, 2013 at 10:28 UTC

    In that case, I'd suggest using Tie::File:

    use Tie::File;

    ...

    tie my @seq_data, 'Tie::File', $sequence_fname
        or die "Can't open $sequence_fname: $!";

    if (!($permute)) {
        for (@seq_data) {
            my @line = split /\t/;
            ...
        }
        ...
    }
    else {
        open(OUT, ">>$out") || die "Cannot open $out\n";
        for (...) {
            ...
            for (@seq_data) {
                my @line = split /\t/;
                ...
            }
            foreach my $key (keys %ktc) {
                ...
                print OUT ...
                ...
            }
        }
        close OUT;
    }

    untie @seq_data;

    That'll be a little slower because you'll be repeating the split /\t/, but at least you won't have memory issues.

    Also, note the change I made to die. This isn't about optimising the efficiency of your code; it's about improving the feedback you get when things go wrong. When you terminate the die message with a newline, you suppress the file and line information that Perl would otherwise append. Also, "$!" provides additional information about why open failed; see "perlvar: Error Variables". This is a good habit to get into; alternatively, consider using autodie.
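
    As a quick illustration (data.txt is just a stand-in filename here):

        # Trailing newline in the message: no file/line info, no reason given.
        # If the open fails, all you see is: Cannot open data.txt
        open my $fh, '<', 'data.txt' or die "Cannot open data.txt\n";

        # No trailing newline, and "$!" included.
        # If the open fails, you see something like:
        # Can't open data.txt: No such file or directory at script.pl line 7.
        open my $fh2, '<', 'data.txt' or die "Can't open data.txt: $!";

        # Or let autodie produce a detailed exception for you:
        use autodie;
        open my $fh3, '<', 'data.txt';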

    -- Ken

      Thanks, all. After implementing Ken's Tie::File suggestion, the code runs much, much faster. I learned something, and I appreciate everyone's comments.