in reply to Re^5: Dynamically Updating Frequency Analysis
in thread Dynamically Updating Frequency Analysis
Once the data structure has been built, I am going to use it a number of times to determine the best compression I can within some boundaries. The first pass sets the high-water mark: pick the item that gives the maximum compression and remove it from the list, which in turn removes some other items completely while reducing others. Pick the next highest on the list and repeat, stopping once 32 items have been chosen. Now start over (the tree is completely re-initialized to its original form), this time using an alternate picking strategy - perhaps taking the top of the list on odd-numbered turns and the second highest on even-numbered turns. If, after selecting 32 items, you have reached a higher compression than the previous best, that becomes your new high-water mark.
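Here is a minimal, self-contained sketch of that "pick 32, restart, try another strategy" loop. The %table of candidate => bytes-saved and the remove_candidate() side effects are made up for illustration; the real version would rebuild the frequency tree and let each removal delete or shrink overlapping entries.

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub build_table {
    # placeholder: candidate string => bytes saved if chosen (assumption)
    return ( 'th' => 120, 'he' => 95, 'in' => 80, 'er' => 60, 'an' => 55 );
}

sub remove_candidate {
    my ( $table, $pick ) = @_;
    delete $table->{$pick};
    # stand-in for the real side effects: overlapping candidates lose value
    $table->{$_} = int( $table->{$_} * 0.9 ) for keys %$table;
}

my @strategies = (
    sub { 0 },                                    # always take the top of the list
    sub { my $turn = shift; $turn % 2 ? 0 : 1 },  # top on odd turns, second on even
);

my ( $best_savings, $best_picks ) = ( 0, [] );

for my $pick_rank (@strategies) {
    my %table   = build_table();    # tree fully re-initialized each pass
    my @picks;
    my $savings = 0;

    for my $turn ( 1 .. 32 ) {
        my @ranked = sort { $table{$b} <=> $table{$a} } keys %table;
        last unless @ranked;

        my $idx  = $pick_rank->($turn);
        $idx     = $#ranked if $idx > $#ranked;
        my $pick = $ranked[$idx];

        $savings += $table{$pick};
        push @picks, $pick;
        remove_candidate( \%table, $pick );
    }

    # new high-water mark?
    ( $best_savings, $best_picks ) = ( $savings, [@picks] )
        if $savings > $best_savings;
}

print "best savings: $best_savings via @$best_picks\n";
```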
As far as how long to run (assuming a time-based approach rather than a fixed number of iterations), I see that as a command-line option. Just as gzip has -1 through -9 to trade time for compression, this may be a way to let the user choose.
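Something along these lines could expose that knob. The option name and the scale are made up here; it just maps an effort level onto a wall-clock budget for the search loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long;

# hypothetical gzip-style knob: --effort 1..9 trades time for compression
my $effort = 5;
GetOptions( 'effort=i' => \$effort ) or die "usage: $0 [--effort 1-9]\n";
die "effort must be 1-9\n" unless $effort >= 1 && $effort <= 9;

my $budget   = $effort * 2;          # seconds to spend searching (assumption)
my $deadline = time() + $budget;

my $passes = 0;
while ( time() < $deadline ) {
    select( undef, undef, undef, 0.25 );   # placeholder for one full search pass
    $passes++;
}
print "completed $passes passes within a ${budget}s budget\n";
```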
I really like the adaptive version of LZW: if you fill up your dictionary and the portion of the file you are currently compressing isn't benefiting much from it, dump the dictionary and start over.
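A rough sketch of that idea, assuming a 4096-entry dictionary and a reserved clear code (both made-up numbers): when the dictionary fills, emit the clear code and rebuild it from the single-byte entries. For brevity it resets whenever the dictionary is full; a real version would also check whether the current stretch of input is still benefiting before flushing. Codes are returned as a list of integers rather than packed into a bit stream.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant MAX_CODES  => 4096;   # dictionary capacity (assumption)
use constant CLEAR_CODE => 256;    # reserved code signalling a reset (assumption)

sub fresh_dict {
    my %dict = map { chr($_) => $_ } 0 .. 255;
    return ( \%dict, 257 );        # next free code after CLEAR_CODE
}

sub lzw_compress {
    my ($data) = @_;
    my ( $dict, $next ) = fresh_dict();
    my $w = '';
    my @codes;

    for my $c ( split //, $data ) {
        my $wc = $w . $c;
        if ( exists $dict->{$wc} ) {
            $w = $wc;
            next;
        }
        push @codes, $dict->{$w};
        if ( $next < MAX_CODES ) {
            $dict->{$wc} = $next++;
        }
        else {
            # dictionary is full - dump it and start over
            push @codes, CLEAR_CODE;
            ( $dict, $next ) = fresh_dict();
        }
        $w = $c;
    }
    push @codes, $dict->{$w} if length $w;
    return @codes;
}

my @out = lzw_compress( "TOBEORNOTTOBEORTOBEORNOT" x 4 );
print scalar(@out), " codes emitted\n";
```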
Cheers - L~R