I don't need or want anything proprietary! (But accuracy would help!)
If you have recently run a fuzzy search for short sequences (primers?) of fewer than 32 bases against a publicly available long sequence (~1GB or bigger), and have the information to hand, I would greatly appreciate answers to the following questions.
(And preferably -- though not absolutely necessary -- where I can download a copy.)
A figure like "approx. 200 of around 25 bases" is better than nothing.
"205, average length 19, ranging from 14 to 25" is better.
A list of exact lengths would be better yet.
(Best of all would be a file of the actual sequences used; but I realise that might be verboten.)
I.e. what Hamming distance was acceptable for a match?
If your run used more complex rules (e.g. fewer than 3 insertions or deletions and up to 5 transpositions), those details would help.
Also, if you used one of the BLASTx programs with a minimum "word length", details of that setting would be important.
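To pin down what I mean by a Hamming-distance match, here is a minimal brute-force sketch in Python. It is only a reference definition for comparing results, not the method I am working on and not how any BLAST program works; the function names and the toy sequences are made up for illustration.

```python
def hamming_match_sites(text, pattern, max_mismatches):
    """Return start offsets where pattern matches text with at most
    max_mismatches substitutions (no insertions or deletions)."""
    sites = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(pattern, text[start:start + m]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many substitutions at this offset
        else:
            # Inner loop finished without exceeding the limit.
            sites.append(start)
    return sites


def match_counts(text, patterns, max_mismatches):
    """Match sites per short sequence, as a dict of pattern -> count."""
    return {p: len(hamming_match_sites(text, p, max_mismatches))
            for p in patterns}


if __name__ == "__main__":
    long_seq = "ACGTACGTTACGA"        # toy stand-in for a ~1GB sequence
    print(hamming_match_sites(long_seq, "ACGA", 1))   # [0, 4, 9]
    print(match_counts(long_seq, ["ACGA", "TTAC"], 0))
```

This is far too slow for real data, but it states the matching rule unambiguously: a site counts if the number of differing positions is no greater than the stated distance.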
Here I really need more than just elapsed (wall clock) time.
Perfection would be the number of clock cycles or CPU seconds, which would be further enhanced if details of the CPU(s) used were available.
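To illustrate the distinction between elapsed time and CPU seconds, here is a small Python sketch that records both for a single run. The helper name `time_run` is hypothetical; `time.perf_counter` and `time.process_time` are from the standard library, and `process_time` covers the current process only, so a search run as an external program would need to be measured some other way.

```python
import time


def time_run(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed (wall clock) timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    result = func(*args)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


if __name__ == "__main__":
    result, wall, cpu = time_run(sum, range(1_000_000))
    print(f"result={result} wall={wall:.4f}s cpu={cpu:.4f}s")
```

On a busy or I/O-bound machine the two figures can differ a lot, which is why wall clock time alone is hard to compare.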
Just the overall number of match sites would suffice.
Match sites per short sequence would be ideal, assuming that I can have the input sequences as well.
In some ways this is the most important criterion. CPU type(s), number of cores of each type, and clock speeds would be best.
I think I've found a better (more accurate and much faster) way to do such fuzzy searches; but before expending lots of effort on putting together a proper package for CPAN -- this is a pure, for fun, home project; not work -- I'd really like to make some detailed comparisons with the current state-of-the-art to convince myself that it a) works; b) is sufficiently faster to warrant the effort.
Basically, I want to run my crude prototype code against a few real (or at least realistic) testcases with known results and timings to see how it stands up before taking it any further.
Thanks for any help you can provide.