How can I convert, for example, 'AGTCACA' to a more compact string that uses fewer bits?
Build a hash of all 256 4-tuples over (A, G, T, C), mapping each tuple to one byte. Then iterate over the string four characters at a time and use the hash for the lookup.
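A minimal sketch of that lookup-table idea, written in Python rather than Perl (so the "hash" is a dict). The names `ENCODE`, `DECODE`, `pack` and `unpack` are made up for illustration, and padding the last block with 'A' while keeping the original length alongside the bytes is an assumption, since the suggestion above does not say how to handle lengths that are not a multiple of four.

```python
from itertools import product

BASES = "AGTC"

# Lookup table: each of the 4**4 = 256 four-base tuples maps to one byte value.
ENCODE = {"".join(t): i for i, t in enumerate(product(BASES, repeat=4))}
DECODE = {i: s for s, i in ENCODE.items()}


def pack(seq):
    """Return (original_length, packed_bytes), four bases per byte."""
    # Assumption: pad the final block with 'A' and remember the true length.
    padded = seq + "A" * (-len(seq) % 4)
    body = bytes(ENCODE[padded[i:i + 4]] for i in range(0, len(padded), 4))
    return len(seq), body


def unpack(length, body):
    """Invert pack(), dropping the padding again."""
    return "".join(DECODE[b] for b in body)[:length]


length, packed = pack("AGTCACA")
print(length, len(packed))  # 7 bases stored in 2 bytes
```

The packed form is a quarter of the size, but note that the length has to travel with the bytes, otherwise the padding cannot be told apart from real 'A' bases.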
Basically, I want to store these strings in a hash, be able to compare them, and check whether substrings are present.
The compression that you described makes that much harder, unless all substrings are aligned to byte boundaries (i.e. they start at a multiple of four characters in the original string).
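To illustrate the alignment problem, here is a small Python sketch. It assumes a 2-bits-per-base code (A=0, G=1, T=2, C=3) and a hypothetical `pack` helper; neither is prescribed anywhere in this thread. A substring that starts on a block boundary survives packing as a byte sequence, while one that starts one base later does not.

```python
# Assumed 2-bit code per base; any fixed assignment behaves the same way.
CODE = {"A": 0, "G": 1, "T": 2, "C": 3}


def pack(seq):
    """Pack four bases into each byte, padding the last block with 'A'."""
    seq += "A" * (-len(seq) % 4)
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | CODE[base]
        out.append(byte)
    return bytes(out)


text = pack("AGTCACAG")   # two full blocks: "AGTC" and "ACAG"
aligned = pack("ACAG")    # occurs at offset 4, on a block boundary
shifted = pack("GTCA")    # occurs at offset 1, straddling two blocks

print("GTCA" in "AGTCACAG")  # True: it is a substring of the plain string
print(aligned in text)       # True: the aligned block is found in the bytes
print(shifted in text)       # False: the same bases, shifted, look different
```

Finding unaligned substrings in the packed form would mean either unpacking first or searching at all four possible bit offsets, which is the extra difficulty mentioned above.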
In reply to Re: string to more compact format
by moritz
in thread string to more compact format
by Boetsie