Re^2: Efficient 7bit compression

Some compression algorithms build a dictionary of common substrings and replace each occurrence of a substring with its dictionary index. Take the string "My life if I buy a dog": the substrings "y " and "if" each appear twice. If you replace them with a 1-byte number and hoist the substrings into a dictionary stored somewhere else, you can shorten the string.
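A minimal Python sketch of that substitution, for illustration only. It assumes the input is 7-bit ASCII, so byte values 0x80 and above never occur in the text and can serve as one-byte dictionary references. `DICTIONARY`, `encode` and `decode` are hypothetical names, and the dictionary is hard-coded rather than discovered from the data as a real compressor would do.

```python
# Hypothetical dictionary for the example sentence; a real compressor
# would have to find these repeated substrings itself.
DICTIONARY = ["y ", "if"]

def encode(text: str, dictionary: list[str]) -> bytes:
    """Replace each dictionary substring with the single byte 0x80 + index.

    Assumes 7-bit ASCII input, so bytes >= 0x80 are free to use as tokens.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        for idx, entry in enumerate(dictionary):
            if text.startswith(entry, i):
                out.append(0x80 + idx)  # one-byte dictionary reference
                i += len(entry)
                break
        else:
            out.append(ord(text[i]))    # literal 7-bit character
            i += 1
    return bytes(out)

def decode(data: bytes, dictionary: list[str]) -> str:
    """Expand bytes >= 0x80 back into their dictionary substrings."""
    return "".join(dictionary[b - 0x80] if b >= 0x80 else chr(b) for b in data)

text = "My life if I buy a dog"
packed = encode(text, DICTIONARY)
print(len(text), len(packed))  # prints: 22 18
```

The four replaced substrings save four bytes, but that figure ignores the space needed to store `DICTIONARY` itself.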

The problem is twofold. First, you have to either mark which substrings are dictionary references or have all strings in the dictionary. Second, the dictionary itself costs space. So only large strings are likely to amortize the cost of a dictionary well enough to realize savings; smaller strings will not.
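To put rough numbers on the second point, here is a small Python calculation. The storage format (a 1-byte entry count followed by length-prefixed entries) is an assumption made up for illustration; a real format would have its own overhead, but the shape of the result is the same: a fixed dictionary cost against savings that grow with the input.

```python
# Hypothetical storage format for the dictionary: a 1-byte entry count,
# then each entry as a 1-byte length followed by its characters.
def dictionary_cost(dictionary: list[str]) -> int:
    return 1 + sum(1 + len(entry) for entry in dictionary)

def net_saving(text: str, dictionary: list[str]) -> int:
    """Bytes saved by substitution minus the bytes spent storing the dictionary.

    Each non-overlapping occurrence of an entry of length L becomes one byte,
    saving L - 1 bytes. Assumes entries never overlap one another in the text,
    which holds for the example dictionary below.
    """
    saved = sum(text.count(entry) * (len(entry) - 1) for entry in dictionary)
    return saved - dictionary_cost(dictionary)

DICTIONARY = ["y ", "if"]
short = "My life if I buy a dog"
long_text = " ".join([short] * 50)

print(net_saving(short, DICTIONARY))      # -3: the dictionary costs more than it saves
print(net_saving(long_text, DICTIONARY))  # 193: the same 7-byte dictionary pays off
```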

Re^3: Efficient 7bit compression by perlfan (Parson) on Mar 14, 2005 at 15:59 UTC
Ok, I hear ya. I was not taking into account the string-specific character frequency table that is needed to decode a Huffman-encoded string. I guess one could get around that by using a general table based on some character-usage criteria, but that would probably not meet their needs.
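A rough Python sketch of the "general table" idea: standard Huffman coding, but built once from a fixed frequency table that encoder and decoder would both embed, so nothing has to be stored alongside each string. The sample text `SAMPLE`, the baseline weight of 1 for every printable ASCII character (so characters absent from the sample stay encodable), and all function names are hypothetical choices for illustration. Whether such a table compresses a given string well depends entirely on how closely that string matches the sample.

```python
import heapq
import string
from collections import Counter
from itertools import count

def huffman_code(freqs: dict[str, int]) -> dict[str, str]:
    """Build a Huffman prefix code (symbol -> bit string) from frequencies."""
    tiebreak = count()  # unique counter so equal weights never compare dicts
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a lone symbol still needs a 1-bit code
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # merge the two lightest subtrees, prefixing 0 and 1 to their codes
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical shared table: baseline weight 1 for every printable ASCII
# character, plus its count in a made-up sample text.
SAMPLE = "the quick brown fox jumps over the lazy dog " * 3
STATIC_FREQS = Counter(string.printable)
STATIC_FREQS.update(SAMPLE)
STATIC_CODE = huffman_code(STATIC_FREQS)

def huffman_encode(text: str, code: dict[str, str]) -> str:
    """Encode text as a string of '0'/'1' characters (for readability)."""
    return "".join(code[ch] for ch in text)

def huffman_decode(bits: str, code: dict[str, str]) -> str:
    """Emit a symbol whenever the buffered bits form a complete codeword."""
    inverse = {c: sym for sym, c in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # unambiguous because the code is prefix-free
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

msg = "My life if I buy a dog"
bits = huffman_encode(msg, STATIC_CODE)
print(len(bits), "bits vs", 7 * len(msg), "bits as packed 7-bit ASCII")
```

The catch is the one raised above: a fixed table only pays off if the strings being compressed actually resemble the data it was built from.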