in reply to Question about speeding a regexp count
If memory becomes an issue, you could walk the length of the string using substr or unpack. That requires keeping track of your position in the string, but both functions provide a means of starting "inside" the string.

my $dna = join '', map { chomp; $_ } <DATA>;
# In the template, A reads one byte and X steps back one byte, so each
# repetition emits the 1-, 2- and 3-character substrings at the current
# position and then advances by one. The tail handles the last two positions.
my $template = ('AXA2X2A3X2' x (length($dna) - 2)) . 'AXA2XA';
my %count;
$count{$_}++ for unpack $template, $dna;
print "$_\t$count{$_}\n" for keys %count;
__DATA__
AAAAAAAACAAGAATACACAACCACGACTAGAGAAGCAGGAGTATATAATCATGATTCCACAACACCAGCATCCCCACCCCCGCCTCGCGACGCCGGCGT
CTCTACTCCTGCTTGGAGAAGACGAGGATGCGCAGCCGCGGCTGGGGAGGCGGGGGTGTGTAGTCGTGGTTTTATAATACTAGTATTCTCATCCTCGTCT
TGTGATGCTGGTGTTTTTATTCTTGTTTAACACAACCACTAGAGCAGTATATAATCCCACACCAGCCCCCCCTCGCGACGGCGTCTCTACTCCTGGGAGA
CGAGGATGCGCAGCGGCTGGGGAGGGGTGTAGTCTTATACTAGTATTCTCCTCGTCTTGTGATGCTGGACTGGGGTCGATCGTCGAAATCGGCTAGCTAA
AAAAAAACAAGAATACACAACCACGACTAGAGAAGCAGGAGTATATAATCATGATTCCACAACACCAGCATCCCCACCCCCGCCTCGCGACGCCGGCGTC
TCTACTCCTGCTTGGAGAAGACGAGGATGCGCAGCCGCGGCTGGGGAGGCGGGGGTGTGTAGTCGTGGTTTTATAATACTAGTATTCTCATCCTCGTCTT
GTGATGCTGGTGTTTTTATTCTTGTTTAACACAACCACTAGAGCAGTATATAATCCCACACCAGCCCCCCCTCGCGACGGCGTCTCTACTCCTGGGAGAC
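For comparison, here is a rough sketch of the substr route mentioned above. It is untested against the original poster's data. The sub name count_substrings and the $max_len parameter are made up for illustration, and the sample string is just the first six characters of the data. It is meant to give the same tallies as the unpack template (every substring of length 1 to 3 at each position), trading the one large template string for an explicit position loop.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): tally every substring
# of length 1..$max_len by walking the string one position at a time.
sub count_substrings {
    my ($dna, $max_len) = @_;
    my %count;
    my $len = length $dna;
    for my $pos (0 .. $len - 1) {
        for my $k (1 .. $max_len) {
            last if $pos + $k > $len;    # window would run past the end
            $count{ substr $dna, $pos, $k }++;
        }
    }
    return \%count;
}

# Small illustrative input; swap in the real sequence as needed.
my $count = count_substrings('AACAAG', 3);
print "$_\t$count->{$_}\n" for sort keys %$count;
```

Only the hash of counts and the loop variables are held besides the sequence itself, which is why this may suit the case where memory is the concern.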
Cheers - L~R
Update: After testing the memory consumption, I updated the post to reflect that it may be realistic to use this approach.