in reply to string to more compact format

How can I convert, for example, 'AGTCACA' to a more compact string with fewer bits?

Build a hash of all 256 (4**4) four-character tuples over (A, G, T, C), mapping each tuple to one byte. Then iterate over the string four characters at a time and use the hash for lookup.
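A minimal sketch of that lookup-table idea, written in Python with a dict standing in for the hash. The names ENCODE, DECODE, pack and unpack are made up for illustration, and the padding with 'A' plus the stored original length are assumptions needed because the input length is not always a multiple of four.

```python
from itertools import product

BASES = "AGTC"

# Map every 4-letter combination of bases to one byte value (4**4 == 256).
ENCODE = {"".join(t): i for i, t in enumerate(product(BASES, repeat=4))}
DECODE = {i: s for s, i in ENCODE.items()}


def pack(seq, pad="A"):
    """Pack a DNA string into bytes, four bases per byte.

    Returns the packed bytes and the original length, so that the
    padding (an assumption of this sketch) can be stripped again.
    """
    padded = seq + pad * (-len(seq) % 4)
    data = bytes(ENCODE[padded[i:i + 4]] for i in range(0, len(padded), 4))
    return data, len(seq)


def unpack(data, length):
    """Expand packed bytes back into the original DNA string."""
    return "".join(DECODE[b] for b in data)[:length]


packed, n = pack("AGTCACA")
restored = unpack(packed, n)
```

Here the seven-character input ends up in two bytes, and unpack recovers the original string from them.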

Basically, I want to store these strings in a hash and be able to compare them, and see if substrings are available.

The compression that you described makes that much harder, unless all substrings are aligned to byte boundaries (i.e. blocks of four in the original string).
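To illustrate the alignment problem, here is a small Python sketch under the same four-bases-per-byte assumption. The helper pack_aligned and the sample strings are hypothetical; the point is only that a substring starting on a block boundary is found in the packed data, while one starting in the middle of a block is not.

```python
from itertools import product

# Same table as described above: each 4-letter block maps to one byte.
ENCODE = {"".join(t): i for i, t in enumerate(product("AGTC", repeat=4))}


def pack_aligned(seq):
    """Pack only whole blocks of four; a trailing remainder is dropped."""
    usable = len(seq) - len(seq) % 4
    return bytes(ENCODE[seq[i:i + 4]] for i in range(0, usable, 4))


text = "AGTCACAGTTCA"   # blocks: AGTC | ACAG | TTCA
aligned = "ACAG"        # starts at offset 4, on a block boundary
unaligned = "GTCA"      # starts at offset 1, straddling two blocks

# Both are substrings of the plain text ...
in_plain = aligned in text and unaligned in text

# ... but only the aligned one survives a search in the packed form.
found_aligned = pack_aligned(aligned) in pack_aligned(text)
found_unaligned = pack_aligned(unaligned) in pack_aligned(text)
```

Searching at arbitrary offsets would need something extra, for example packing the query at each of the four possible shifts or unpacking before the search.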


Re^2: string to more compact format
by timray (Initiate) on May 17, 2010 at 17:30 UTC
    Could you explain what you are trying to achieve? For example, do you need to be able to match parts of the DNA sequence? Can these partial sequences be offset (e.g. a sequence 53 bytes into one string matching another 27 bytes into another)?