Hi,
Without reading through all the replies....To be honest you have had quite a few.
My advice would be that there is always the potential to compress. Even in random sequences you will get repeated patterns. The difficulty is finding those patterns. You want a pattern that is easy to find first.
My advice would be to sort the text of interest and then count for each character type. That might be best done in a database. You might want to split the text up as well and do that bit by bit. With this information you will be able to better find opportunities for compression.
Update__________________
With infinite computing power LanX would be right.
For very large file sizes it would be difficult if not impossible to find the best compression solution. In that situation I would be right.
I know that the question states 50%. But really if you think about it you could compress the stored algorithms that do the transformation. It just goes on and on. Do people understand what I am saying?
Sorting is a good place to start maybe because the algorithm's (code) can be modified from that point in order to preserve information that will allow the recreation of the original file. There are a range of different sorting algorithms that I have in a book here. If anyone want me to post any of those I will.
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.