As one would expect from a C-coded solution, it is faster than GrandFather's pure-Perl bitwise string-differencing approach in Re: simple string comparison for efficiency: about 1000% faster for strings of 24,000 bases and about 350% faster for 240,000 bases (on my laptop; YMMV).
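For reference, the comparison semantics being benchmarked (count mismatching positions, with N matching anything) can be written as a plain C loop. This is a minimal sketch, not the code on my scratchpad: the name `count_diffs` is hypothetical, and it assumes both sequences have the same length `n` and use upper-case letters. It borrows the core idea of bitwise differencing, namely that two equal bytes XOR to zero.

```c
#include <stddef.h>

/* Hypothetical helper: count positions where two sequences of length n
   differ. Equal bytes XOR to zero, so a nonzero XOR marks a mismatch,
   except that 'N' in either sequence is a don't-care and never counts. */
size_t count_diffs(const char *s1, const char *s2, size_t n)
{
    size_t diffs = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char x = (unsigned char)(s1[i] ^ s2[i]);
        if (x != 0 && s1[i] != 'N' && s2[i] != 'N')
            diffs++;
    }
    return diffs;
}
```

For example, comparing "ANGT" with "ACCT" counts only position 2 (G vs C); position 1 is skipped because of the N.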
It uses an approach that, by happy accident, works for the base characters A, C, T and G and the don't-care character N. It can be extended a bit to other 'bases' (I know other characters are used to represent common base sequences), but sooner or later it will break down. I can think of a similar (as yet untested) approach that could reliably handle a much wider range of 'bases', but it would require a separate character-substitution encoding step (fairly fast, I think) to convert input files into a form amenable to differencing.
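One possible shape for such an encoding step, sketched here as an assumption and not as the approach I have in mind, is to map each IUPAC nucleotide code to a 4-bit set of the bases it can stand for (A=1, C=2, G=4, T=8, so N=15). After encoding, two positions are compatible if their sets overlap, so a mismatch is simply a bitwise AND of zero. The names `encode_base`, `encode_seq` and `count_mismatches_encoded` are hypothetical, as are the choices to accept upper-case codes only and to map unrecognized characters to 0 (matches nothing).

```c
#include <stddef.h>

/* Hypothetical encoding: each IUPAC code becomes the set of bases it can
   stand for, one bit per base (A=1, C=2, G=4, T=8). Upper case only;
   unrecognized characters map to 0 and therefore match nothing. */
static unsigned char encode_base(char c)
{
    switch (c) {
    case 'A': return 0x1;
    case 'C': return 0x2;
    case 'G': return 0x4;
    case 'T': return 0x8;
    case 'M': return 0x1 | 0x2;        /* A or C */
    case 'R': return 0x1 | 0x4;        /* A or G */
    case 'W': return 0x1 | 0x8;        /* A or T */
    case 'S': return 0x2 | 0x4;        /* C or G */
    case 'Y': return 0x2 | 0x8;        /* C or T */
    case 'K': return 0x4 | 0x8;        /* G or T */
    case 'V': return 0x1 | 0x2 | 0x4;  /* not T */
    case 'H': return 0x1 | 0x2 | 0x8;  /* not G */
    case 'D': return 0x1 | 0x4 | 0x8;  /* not C */
    case 'B': return 0x2 | 0x4 | 0x8;  /* not A */
    case 'N': return 0xF;              /* any base */
    default:  return 0x0;
    }
}

/* One-off substitution pass over a sequence of length n, in place.
   Encoded bytes may be 0, so the buffer is no longer a C string:
   always carry the length explicitly afterwards. */
void encode_seq(char *seq, size_t n)
{
    for (size_t i = 0; i < n; i++)
        seq[i] = (char)encode_base(seq[i]);
}

/* Differencing on encoded data: a position mismatches when the two
   base sets share no bit, i.e. their AND is zero. */
size_t count_mismatches_encoded(const char *e1, const char *e2, size_t n)
{
    size_t diffs = 0;
    for (size_t i = 0; i < n; i++)
        if ((e1[i] & e2[i]) == 0)
            diffs++;
    return diffs;
}
```

The encoding cost is paid once per input file, after which each comparison is a single AND per position, and adding another code only means adding another case to the table.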
Please take a look at the material on my scratchpad and, if you think it is of interest, I can post it in this thread or in a new thread as you think best.
In reply to Re: simple string comparison for efficiency by AnomalousMonk, in thread simple string comparison for efficiency by CaptainF.