in reply to Re^2: counting the number of 16384 pattern matches in a large DNA sequence
in thread counting the number of 16384 pattern matches in a large DNA sequence
AAAAAAA AAAAAAT AAAAAAG AAAAAAC
Are all your patterns the same length? Are they all upper case? Are you looking for exact (including case) matches only?
Is it possible to obtain a copy of the patterns file?
A small extract from a fasta file I have kicking around (HG:chr20):
...
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
GATCCAgaggtggaagaggaaggaagcttggaaccctatagagttgctga
gtgccaggaccagatcctggccctaaacaggtggtaaggaaggagagagt
gaaggaactgccaggtgacacactcccaccatggacctctgggatcctag
ctttaagagatcccatcacccacatgaacgtttgaattgacagggggagc
...
index is usually faster than a regex for matching constant strings, but if you want case-independent matches, you would need to uc the sequences before searching (and studying).
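A minimal sketch of that idea, assuming fixed upper-case patterns and that overlapping hits should each be counted (the thread does not confirm either). The sub name count_matches and the sample patterns are hypothetical, chosen only for illustration; real input would come from the fasta and patterns files.

```perl
use strict;
use warnings;

# Count occurrences of a constant pattern using index() rather than a regex.
# Assumption: overlapping matches count separately, so we advance by one base.
sub count_matches {
    my ($seq, $pat) = @_;
    my ($count, $pos) = (0, 0);
    while (($pos = index($seq, $pat, $pos)) >= 0) {
        ++$count;
        ++$pos;    # step past the start of this hit only
    }
    return $count;
}

# Hypothetical sample: one line of the extract above, upper-cased once up
# front so that lower-case (soft-masked) bases match upper-case patterns.
my $seq      = uc 'GATCCAgaggtggaagaggaaggaagcttggaaccctatagagttgctga';
my @patterns = qw( GGAAGAG AAGGAAG CCCTATA AAAAAAA );

my %counts = map { $_ => count_matches($seq, $_) } @patterns;
printf "%s %d\n", $_, $counts{$_} for @patterns;
```

Upper-casing the sequence once costs a single pass, after which every lookup is a plain substring search. If overlapping hits should not be counted, advance $pos by length($pat) instead of 1.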
Re^4: counting the number of 16384 pattern matches in a large DNA sequence
by anonym (Acolyte) on Jun 14, 2012 at 20:16 UTC | |
by BrowserUk (Patriarch) on Jun 14, 2012 at 20:31 UTC |