in reply to Re^2: Sorting Gigabytes of Strings Without Storing Them
in thread Sorting Gigabytes of Strings Without Storing Them
Just off the top of my head, I'd probably try replacing the 'pack' stuff by producing whole bytes at a time. Something like:
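    use Algorithm::Loops qw< NestedLoops >;

    my @bases= qw< A C G T >;
    # All 256 four-base strings, 'AAAA' .. 'TTTT'
    my @quad= map { join '', @$_ } NestedLoops( [ ( \@bases ) x 4 ] );
    my %byte;
    @byte{ @quad }= map pack("C",$_), 0 .. $#quad;

    my $carry= '';
    while( <> ) {
        chomp;
        substr( $_, 0, 0, $carry );     # prepend bases left over from the previous line
        my $pack= '';
        s/(....)/ $pack .= $byte{$1}; '' /ge;   # /e so the replacement runs as code
        $carry= $_;                     # 0 to 3 bases that didn't fill a byte
        print RAM $pack;                # RAM: output filehandle, assumed to be opened elsewhere
    }
    # Pad the final partial group with 'A's
    print RAM $byte{ substr( $carry . 'AAA', 0, 4 ) }   if  $carry;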
But I haven't looked at the rest of this thread recently nor actually tried my suggestions. I can think of lots of different ways to pull out 4 bases at a time and some ways might have a noticeable impact on speed.
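One of those other ways, as a minimal sketch rather than a tuned solution: split each line into 4-base chunks with unpack and look each chunk up in the same kind of table. The helper name pack_bases is made up for illustration, the table is built with plain nested maps instead of Algorithm::Loops, and the input is assumed to contain only A, C, G and T. Whether this is any faster than the s///ge version would need benchmarking.

```perl
use strict;
use warnings;

# Lookup table: every 4-base string maps to one byte (AAAA => 0 .. TTTT => 255).
my @bases = qw< A C G T >;
my @quad  = ('');
@quad = map { my $prefix = $_; map { $prefix . $_ } @bases } @quad for 1 .. 4;
my %byte;
@byte{@quad} = map { pack 'C', $_ } 0 .. $#quad;

# Hypothetical helper: pack lines of bases into a byte string, 4 bases per byte.
# Assumes the input holds only A, C, G and T.
sub pack_bases {
    my @lines = @_;
    my ( $carry, $out ) = ( '', '' );
    for my $line (@lines) {
        chomp $line;
        my $seq  = $carry . $line;
        my $keep = length($seq) - length($seq) % 4;    # whole groups of 4 only
        $out  .= join '', map { $byte{$_} } unpack '(a4)*', substr( $seq, 0, $keep );
        $carry = substr( $seq, $keep );                # 0 to 3 leftover bases
    }
    # Pad the final partial group with 'A's, as in the loop above.
    $out .= $byte{ substr( $carry . 'AAA', 0, 4 ) } if length $carry;
    return $out;
}
```

For example, print unpack( 'H*', pack_bases( "ACGT\n" ) ) prints 1b, since A, C, G, T are the two-bit values 0, 1, 2, 3.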
- tye
Re^4: Sorting Gigabytes of Strings Without Storing Them (bytes)
by BrowserUk (Patriarch) on Dec 24, 2008 at 10:51 UTC