I have a short question; basically I am looking for some inspiration. I have a set of strings that I intend to compress using lzt, but before I do that I was wondering whether I could improve the compression by pre-encoding them. What comes to mind is Huffman or arithmetic coding, but for those I would need to preprocess the entire set (or at least a section of it), which is not ideal for my application. As an alternative I could go with run-length coding (and smart delta coding, maybe?), and this is where my imagination stops.
About the problem:
The string set usually comes in batches of 10000 strings (|S| = 10000).
The alphabet has 3 symbols (|A| = 3, A = {A, B, C}).
Each string is exactly 63 characters long (|s_1| = |s_2| = ... = |s_|S|| = 63).
Example:
One can assume a random distribution of characters from Alphabet A = {A, B, C}

ABBCBCAAAAABBCBCACCCAAAAACAAAAABBBBBAAAAABBAAAAAAAABBCCCACCAABC
BCCCBCAACAABBBCAAACCAAAAACAAAAABBBBBAAAAABBAAAAAAAABBCCCACCABBC
ABCCBBBAAAABBABCACABCCCCCCAAAAABBCBBCCCCAAAAAAAAAAAAACCCACCACCC
...
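For reference, here is a rough sketch of the sizes any pre-encoding has to compete with. It is in Python purely for illustration, and the names (bytes_needed, bound_bits and so on) are made up for this example. It assumes the characters really are uniformly random and independent, in which case no encoding can do better on average than distinguishing all 3**63 possible strings.

```python
STRING_LEN = 63      # characters per string, from the problem statement
ALPHABET_SIZE = 3    # A, B, C

def bytes_needed(bits):
    """Round a bit count up to whole bytes."""
    return (bits + 7) // 8

# One 8-bit character per symbol, i.e. the unpacked string.
raw_bytes = STRING_LEN

# Fixed-width packing: 2 bits per symbol.
fixed_bits = STRING_LEN * 2

# Smallest number of bits that can tell all 3**63 strings apart
# (assumption: every string is equally likely).
bound_bits = (ALPHABET_SIZE ** STRING_LEN - 1).bit_length()

print(raw_bytes, bytes_needed(fixed_bits), bytes_needed(bound_bits))
# prints: 63 16 13
```

So under that assumption a plain 2-bit packing is already within 3 bytes of the floor; run-length or delta coding can only beat it if the real data has long runs, as the sample strings above seem to.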
Possible solution:
First two bits can be used to store the alphabet information:

00 A
01 B
11 C

The other 6 bits can be used for RL coding:

position in a byte: 1 2 | 3 4 5 6 7 8
                    A A | Delta between mismatching consecutive characters

This will ultimately consume 1 byte for each mismatching character. Does anyone have a suggestion for a more optimal solution than this one?
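To make the byte layout concrete, here is a minimal sketch of how I read it, written in Python only for illustration. The function names rle_pack and rle_unpack are hypothetical, and I am assuming that the "delta" stored in the low 6 bits is simply the length of the run (1 to 63), with the 2-bit symbol code in the top two bits.

```python
# 2-bit symbol codes as given in the post.
CODES = {"A": 0b00, "B": 0b01, "C": 0b11}
SYMBOLS = {code: ch for ch, code in CODES.items()}

MAX_RUN = 63  # largest value that fits in the 6 run-length bits

def rle_pack(s):
    """Pack each run of equal characters into one byte:
    top 2 bits = symbol code, low 6 bits = run length."""
    out = bytearray()
    i = 0
    while i < len(s):
        j = i
        # Extend the run while the character repeats and the count still fits.
        while j < len(s) and s[j] == s[i] and j - i < MAX_RUN:
            j += 1
        out.append((CODES[s[i]] << 6) | (j - i))
        i = j
    return bytes(out)

def rle_unpack(data):
    """Inverse of rle_pack: expand each byte back into its run."""
    return "".join(SYMBOLS[b >> 6] * (b & 0x3F) for b in data)

sample = "ABBCBCAAAAABBCBCACCCAAAAACAAAAABBBBBAAAAABBAAAAAAAABBCCCACCAABC"
packed = rle_pack(sample)
assert rle_unpack(packed) == sample
print(len(sample), "characters ->", len(packed), "bytes")
```

The worst case is a string with no repeats at all (for example ABCABC...), which costs one byte per character and is no better than the unpacked string.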
Thank you :)
baxy
In reply to How to efficently pack a string of 63 characters by baxy77bax