http://qs1969.pair.com?node_id=168537

in reply to Re: Re: Re: Self-extracting compressed code!

what problem or itch inspired you?

Good question. Basically I wanted to isolate and clean up the quoting code from Data::Dump in that there is code for converting binary blocks into Base64 streams. I figured that if there was already going to be code to do base64 then I might as well compress on the fly as well. This was all for Data::BFDump where I am usually more interested in structure than content, and having ways to represent and quote large chunks of text in an efficient way appealed to me. Although I suppose it could be argued that I got carried away.

since they require a priority queue module

Hmm. I dont know anything like as much about compression as you, but I spent a while playing around with the Huffman algorithm in perl. When exactly does the priority queue (heap i assume) become critical? For instance in the testing I was doing I was building huffman trees for thousands of elements, and even then initially sorting the elements and an algorithm I worked out meant really fast run times without using a priority queue. Why is the queue so critical? In a standard compression my understand was that usually there arent that many elements or symbols. With a huffman algorithm if the initial list of elements to be encoded is inorder the tree can be constructed with almost no "sort" passes at all. (in fact I wouldn't be suprised if the algorithm I came up with outperforms an algorithm using a perl based heap.) If you're interested Ill post the algorithm and my perl code.

Also I have a brain teaser for you: (originally posted as A Golf question maybe? Huffman with a twist and never answered, but that could have been due to a bad description). There exists an algorithm that can take a huffman tree and reorganize it such that the longest paths are all one side of the tree and the shortest are on the other. (in otherwards the longest path is for the least likely element and the shortest for the most and all of the paths are sorted from left to right by their frequency.) This algorithm runs in linear time (O(N)). Can you find it?

:-)

Yves / DeMerphq
---
Writing a good benchmark isnt as easy as it might look.