http://qs1969.pair.com?node_id=168537

in reply to Re: Re: Re: Self-extracting compressed code!

what problem or itch inspired you?

Good question. Basically I wanted to isolate and clean up the quoting code from Data::Dump in that there is code for converting binary blocks into Base64 streams. I figured that if there was already going to be code to do base64 then I might as well compress on the fly as well. This was all for Data::BFDump where I am usually more interested in structure than content, and having ways to represent and quote large chunks of text in an efficient way appealed to me. Although I suppose it could be argued that I got carried away.

since they require a priority queue module

Hmm. I dont know anything like as much about compression as you, but I spent a while playing around with the Huffman algorithm in perl. When exactly does the priority queue (heap i assume) become critical? For instance in the testing I was doing I was building huffman trees for thousands of elements, and even then initially sorting the elements and an algorithm I worked out meant really fast run times without using a priority queue. Why is the queue so critical? In a standard compression my understand was that usually there arent that many elements or symbols. With a huffman algorithm if the initial list of elements to be encoded is inorder the tree can be constructed with almost no "sort" passes at all. (in fact I wouldn't be suprised if the algorithm I came up with outperforms an algorithm using a perl based heap.) If you're interested Ill post the algorithm and my perl code.

Also I have a brain teaser for you: (originally posted as A Golf question maybe? Huffman with a twist and never answered, but that could have been due to a bad description). There exists an algorithm that can take a huffman tree and reorganize it such that the longest paths are all one side of the tree and the shortest are on the other. (in otherwards the longest path is for the least likely element and the shortest for the most and all of the paths are sorted from left to right by their frequency.) This algorithm runs in linear time (O(N)). Can you find it?

:-)

Yves / DeMerphq
---
Writing a good benchmark isnt as easy as it might look.

• Comment on Re: Re: Re: Re: Self-extracting compressed code!

Replies are listed 'Best First'.
Re: Re: Re: Re: Re: Self-extracting compressed code!
by educated_foo (Vicar) on May 22, 2002 at 21:25 UTC
You're probably right, the heap isn't necessary, but it's certainly the "natural" way of implementing the repeated "take the two smallest elements, combine them, and put them back in the pool" operations. I can change the module to not require Heap (which I'm not that fond of anyways). Especially for < 256 elements, it shouldn't make a difference. So, did you sort the elements initially, then splice the results in at the right places in the list or something?

As for the brain-teaser, I imagine it's the same as telling whether two binary trees are isomorphic. I remember doing that on a problem set by putting them both in a canonical form (e.g. longest bits on one side). IIRC you had to assign a "weight" to each node based on the number and depth of its children, then always put the heaviest child on the left. I don't remember the exact details, but is it something like this?

/s
Update: Heap dependency removed, and if anything it's faster.