Re^6: Reduce RAM required

```
head -n1 Ath_orig.fa
>Chr1
```

It doesn't appear to have any leading whitespace.

About the print statements: even if I comment out the first one, the output file is still empty.

Conversely, if I comment out the second one, I don't see any output on screen (STDOUT).

If you would like to reproduce this behavior with my input file, you can download it here:

https://www.filehosting.org/file/details/774814/Ath_orig.fa

Thanks a lot!


Re^7: Reduce RAM required by tybalt89 (Monsignor) on Jan 09, 2019 at 20:24 UTC
Just post about 20 lines here, enclosed in a code block, instead of using a file-hosting service that requires an email address. Also, try adding an error message as an "else" branch to the "if" test to see whether there are any invalid lines in the input file.
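A minimal sketch of that "else" check, reading an in-memory sample instead of the real file so it runs on its own. The sample lines and the `$bad` counter are hypothetical; the `/^>/` and `/^[ACGT]/` tests mirror the upper-case version of the script posted later in this thread. Any line that matches neither test (leading whitespace, lower-case bases, a sequence starting with N) would otherwise be skipped silently.

```perl
#!/usr/bin/perl
# Sketch of the "else" diagnostic. The sample data is made up; in the
# real script the loop would read from the input file handle instead.
use strict;
use warnings;

my $data = <<'END';
>Chr1
CCCTAAACCCTAAACC
 CCCTAAACC
ccctaaaccc
NNNNACGT
END

# open an in-memory file handle on the sample string
open my $in, '<', \$data or die "cannot open in-memory data: $!";

my $bad = 0;
while ( <$in> ) {
    if ( /^>/ ) {
        # header line: nothing to check
    }
    elsif ( /^[ACGT]/ ) {
        # sequence line that the counting code would accept
    }
    else {
        # report the line number ($.) and the offending line
        warn "invalid line $.: $_";
        $bad++;
    }
}
close $in;
print "invalid lines: $bad\n";
```

With this sample, lines 3, 4 and 5 are reported, so the summary should say three invalid lines.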
Re^8: Reduce RAM required by onlyIDleft (Scribe) on Jan 09, 2019 at 21:18 UTC
There is no leading whitespace anywhere in the input file; I just opened it in a text editor and checked.

But I just remembered that real-life DNA sequences often contain N in addition to A/C/G/T. These have to be accounted for as well, right? However, in my test runs, having Ns or not having them did not appear to make a difference. Here is the input example I used:

```
>Chr1
CCCTAAACCCTAAACCCTAAACCCTAAACCTCTGAATCCTTAATCCCTAAATCCCTAAAT
>Chr2
TATGACGTTTAGGGACGATCTTAATGACGTTTAGGGTTTTATCGATCAGCGACGTAGGGA
>Chr3
GTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTT
>Chr4
AACAAGGTACTCTCATCTCTTTACTGGGTAAATAACATATCAACTTGGACCTCATTCATA
>Chr5
AACATGATTCACACCTTGATGATGTTTTTAGAGAGTTCTCGTGTGAGGCGATTCTTGAGG
```

It'd be strange if this worked for you, because it isn't working for me! The output file is always empty: with a large file, with real-life input sequences, with small examples like the one above, and with or without N characters in addition to A, C, G and T. For the example above, I even changed the window length to 10.

I'm not sure what is happening! Perhaps you could share code that works with this example (the script version using input and output file handles).

Sorry for the bother, but thanks.
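A quick way to see how N is treated is to run the script's two tests on a few sample lines. This sketch assumes the upper-case tests `/^[ACGT]/` and `tr/ACGT//` from the version of the script posted later in this thread; the sample strings are made up. In that version, a line that starts with A/C/G/T but contains N is accepted, yet `tr/ACGT//` does not count the Ns, so its record would come out shorter; a line that starts with N is not treated as sequence at all.

```perl
#!/usr/bin/perl
# Sketch: how /^[ACGT]/ and tr/ACGT// treat N and lower case.
# The sample lines are hypothetical; only the two expressions come
# from the script in this thread.
use strict;
use warnings;

my @lines = ( 'ACGTNNACGT', 'NNNNACGT', 'acgtacgt' );
my %result;

for my $line (@lines) {
    # would the elsif test treat this as a sequence line at all?
    my $accepted = $line =~ /^[ACGT]/ ? 1 : 0;
    # tr/ACGT// with an empty replacement counts without modifying $line
    my $counted = ( $line =~ tr/ACGT// );
    $result{$line} = [ $accepted, $counted ];
    printf "%-12s accepted=%d ACGT=%d of %d\n",
        $line, $accepted, $counted, length $line;
}
```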
Re^9: Reduce RAM required by tybalt89 (Monsignor) on Jan 09, 2019 at 21:52 UTC
It's your fault :) In your original post, you showed the input as LOWER CASE. Your actual data file has input in UPPER CASE. Here's a version that handles upper case:

```perl
#!/usr/bin/perl

# https://perlmonks.org/?node_id=1228191
use strict;
use warnings;

my $window = 1e6;    # shuffle and write out once this many bases are buffered
my $A = my $C = my $G = my $all = 0;    # base counts in the current window
my (@sizes, $tmp, $start);              # line lengths, scratch, first ID number

my $inputfile = shift // 'd.1228191';
my $outputfile = shift // 'd.out.1228191';
open my $in, '<', $inputfile or die "$! opening $inputfile";
open my $out, '>', $outputfile or die "$! opening $outputfile";

# Draw one random base, weighted by the remaining counts, without replacement.
sub letter {
  my $n = int rand $all--;
  $n < $A ? ($A--, return 'A') :
  $n < $A + $C ? ($C--, return 'C') :
  $n < $A + $C + $G ? ($G--, return 'G') :
  return 'T';
}

# Write one shuffled record per buffered line length, then reset the buffer.
sub output {
  for my $count ( @sizes ) {
    print $out ">ID", $start++, "\n", map(letter(), 1 .. $count), "\n";
  }
  @sizes = ();
}

while( <$in> ) {
  if( /^>/ ) {
    $start //= s/\D+//gr;    # first header's number becomes the starting ID
  }
  elsif( /^[ACGT]/ ) {       # upper-case sequence lines only
    $A += tr/A//;
    $C += tr/C//;
    $G += tr/G//;
    $all += $tmp = tr/ACGT//;
    push @sizes, $tmp;
    $all >= $window and output();
  }
}
$all and output();    # flush whatever is left

close $in;
close $out;
```
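If the input might mix lower- and upper-case bases, one option is to upper-case each non-header line before testing it, so that `acgt` and `ACGT` are counted alike. This is a minimal sketch of that idea, not part of the posted script: the sample data is hypothetical, and it collects counts in a `%count` hash rather than the script's separate `$A`/`$C`/`$G`/`$all` variables.

```perl
#!/usr/bin/perl
# Sketch: case-insensitive base counting by upper-casing sequence lines.
# Sample data and %count are hypothetical, for illustration only.
use strict;
use warnings;

my $data = <<'END';
>Chr1
cccTAAACCC
>Chr2
TATGACGTTT
END

open my $in, '<', \$data or die "cannot open in-memory data: $!";

my %count = ( A => 0, C => 0, G => 0, T => 0 );
my @sizes;
while ( my $line = <$in> ) {
    next if $line =~ /^>/;            # header lines carry no bases
    $line = uc $line;                 # treat acgt exactly like ACGT
    next unless $line =~ /^[ACGT]/;
    $count{A} += ( $line =~ tr/A// );
    $count{C} += ( $line =~ tr/C// );
    $count{G} += ( $line =~ tr/G// );
    $count{T} += ( $line =~ tr/T// );
    push @sizes, ( $line =~ tr/ACGT// );
}
close $in;

printf "A=%d C=%d G=%d T=%d\n", @count{qw(A C G T)};
print "line sizes: @sizes\n";
```

In the posted script, the analogous (untested) change would be `$_ = uc;` at the top of the `while` loop; header handling should be unaffected because only the digits of header lines are used.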
Re^10: Reduce RAM required by onlyIDleft (Scribe) on Jan 09, 2019 at 23:02 UTC