in reply to Re^2: Reduce RAM required
in thread Reduce RAM required
Changed to use '>' as the header-line marker.
It counts A/T/G/C for each "window", not each sequence. That seemed to be what your program was doing.
Its random generation over a "window" is equivalent to the shuffle over a "window" you were doing.
Uses file handles for both input and output.

    #!/usr/bin/perl
    # https://perlmonks.org/?node_id=1228191

    use strict;
    use warnings;

    my $window = 1e6;    # bases to collect before a batch is written
    my $A = my $C = my $G = my $all = 0;    # counts left in the window (t is implied)
    my (@sizes, $tmp, $start);

    my $inputfile  = 'd.1228191';
    my $outputfile = 'd.out.1228191';
    open my $in,  '<', $inputfile  or die "$! opening $inputfile";
    open my $out, '>', $outputfile or die "$! opening $outputfile";

    # Draw one letter, without replacement, from the window's remaining counts.
    sub letter
      {
      my $n = int rand $all--;
      $n < $A ? ($A--, return 'a') :
        $n < $A + $C ? ($C--, return 'c') :
        $n < $A + $C + $G ? ($G--, return 'g') :
        return 't';
      }

    # Write one random sequence per input sequence, keeping each original length.
    sub output
      {
      for my $count ( @sizes )
        {
        print $out ">ID", $start++, "\n", map(letter(), 1 .. $count), "\n";
        }
      @sizes = ();
      }

    while( <$in> )
      {
      if( /^>/ )
        {
        $start //= s/\D+//gr;    # the first header supplies the starting ID number
        }
      elsif( /^[acgt]/ )
        {
        $A += tr/a//;
        $C += tr/c//;
        $G += tr/g//;
        $all += $tmp = tr/acgt//;
        push @sizes, $tmp;
        $all >= $window and output();
        }
      }
    $all and output();    # flush the final, partial window

    close $in;
    close $out;
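
To illustrate why random generation over a window is equivalent to a shuffle, here is a minimal sketch of the same idea in isolation: drawing letters one at a time, without replacement, from the window's counts gives back exactly the letters that went in, only in random order. The names `draw_from_counts` and `sorted` and the sample string are made up for this illustration and are not part of the program above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original program): build a random string
# by drawing without replacement from a pool described only by its counts.
sub draw_from_counts {
    my %left = @_;                       # e.g. ( a => 3, c => 1, g => 2, t => 4 )
    my $all = 0;
    $all += $_ for values %left;
    my $seq = '';
    while ( $all > 0 ) {
        my $n = int rand $all--;         # position among the letters still left
        for my $base ( sort keys %left ) {
            if ( $n < $left{$base} ) {   # position falls inside this letter's share
                $left{$base}--;
                $seq .= $base;
                last;
            }
            $n -= $left{$base};          # skip past this letter's share
        }
    }
    return $seq;
}

# Hypothetical helper: the letters of a string in sorted order.
sub sorted { join '', sort { $a cmp $b } split //, $_[0] }

my $window = 'aacgtttgcaatg';            # made-up example data
my %count;
$count{$_}++ for split //, $window;

my $random = draw_from_counts(%count);

print "original: $window\n";
print "random:   $random\n";
print sorted($window) eq sorted($random)
    ? "composition preserved\n"
    : "composition differs\n";
```

Only the counts have to stay in memory, never the letters of the window themselves, which is presumably where the RAM saving over an in-memory shuffle comes from.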