
AAAAAAA AAAAAAT AAAAAAG AAAAAAC

Are all your patterns the same length? Are they all upper case? Are you looking for exact (including case) matches only?

Is it possible to obtain a copy of the patterns file?

A small extract from a fasta file I have kicking around (HG:chr20):

...
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
GATCCAgaggtggaagaggaaggaagcttggaaccctatagagttgctga
gtgccaggaccagatcctggccctaaacaggtggtaaggaaggagagagt
gaaggaactgccaggtgacacactcccaccatggacctctgggatcctag
ctttaagagatcccatcacccacatgaacgtttgaattgacagggggagc
...

index is usually faster than a regex for matching constant strings, but if you want case-independent matches, you would need to uc the sequences before searching (and studying).
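As a rough sketch of that idea, the snippet below counts occurrences of one constant pattern with index, first case-sensitively and then after upper-casing the sequence with uc. The helper name count_matches and the choice to count overlapping matches are assumptions made for illustration; the sample string is the first non-N line of the extract above.

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of a constant string using index().
# Overlapping matches are counted (an assumption; adjust the step if unwanted).
sub count_matches {
    my ($seq, $pat) = @_;
    my ($count, $pos) = (0, 0);
    while (($pos = index($seq, $pat, $pos)) >= 0) {
        $count++;
        $pos++;    # step forward one character so overlaps are still found
    }
    return $count;
}

# First non-N line of the chr20 extract; note the mixed case.
my $seq = 'GATCCAgaggtggaagaggaaggaagcttggaaccctatagagttgctga';

my $exact  = count_matches($seq,    'GATCCAG');   # case-sensitive search
my $folded = count_matches(uc $seq, 'GATCCAG');   # case-independent via uc

print "exact=$exact folded=$folded\n";
```

Upper-casing the sequence once up front keeps the inner loop a plain index call; for 16384 patterns that matters more than it would for a single search.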


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^4: counting the number of 16384 pattern matches in a large DNA sequence
by anonym (Acolyte) on Jun 14, 2012 at 20:16 UTC
    Thanks. Yes, the patterns are all the same length and the search is case-sensitive.

      How long does your current method take?

