in reply to How to efficently pack a string of 63 characters
The amount of compression you can obtain is directly related to the amount of information in the data. The more constraints the data has, the less information it contains, and so the more you can compress it.
You've mentioned that you have an alphabet of three characters (A, B, C) and a random distribution. If you mean a uniform random distribution (i.e., all letters having a probability of 1/3), then each character in your string takes 1.585 bits to encode[1]. But your sample data shows that of 189 characters, A occurs 93 times, B occurs 44 times and C occurs 52 times. If that represents the actual distribution of characters, then you need only 1.503 bits to encode a character, allowing you to squeeze a bit more out of your 10K strings.
If you're able learn more constraints on the data (i.e., rules that valid data must follow), then you can squeeze the data even more. So if you want people to help you compress the data even further, we need to know more about your data if you want to get better compression. Note that *any* rules about the data can be relevant: so don't hold back! As an example, if each line was "similar" to the previous line by some simple rule, it may give us the information needed to
As an example, a couple years ago, you asked a similar question, and by looking over the code used to generate the data, we were able to determine that we could encode your 90 character strings in 51 bits, giving about 0.567 bits per character.
NOTES:
[1]: I used Shannon's Entropy formula to compute the number of bits per symbol.
EDIT: Fixed the wikipedia link as noted by fellow monks AnomalousMonk and LanX.
...roboticus
When your only tool is a hammer, all problems look like your thumb.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: How to efficently pack a string of 63 characters
by AnomalousMonk (Archbishop) on Sep 09, 2021 at 19:36 UTC | |
by LanX (Saint) on Sep 09, 2021 at 20:41 UTC | |
|
Re^2: How to efficently pack a string of 63 characters
by baxy77bax (Deacon) on Sep 09, 2021 at 21:05 UTC | |
by LanX (Saint) on Sep 10, 2021 at 13:17 UTC |