http://qs1969.pair.com?node_id=181094

I always wanted to write my own Huffman decoder. So, today I did. I don't claim to be fastest or smartest (quite the opposite I guess), but it's mine.
I didn't make any encoder, it's all done with pen and paper.

And now the obligatory: This is my first JAPH, comments are welcome.

#!/usr/bin/perl -w use integer; @t=("12","34","56","7 ","89",":h","er",";k","<t","=a","lc","ju","sn"," +op"); @m=split/ */,"\x00\x50\xA5\xd2\xc5\xbb\x96\xef\x06\xbc\x8e\xe0"; $i=0;$n=0;$p=0;$x=0; do { $n=(($i%8)==0)?ord(shift @m):$n; $x=ord(substr($t[$p],(($n&128)>>7),1)); $p=(($x&240)==48)?($x&15):0; if ($p==0){print chr($x)}; $i++;$n<<=1; } until ($i==91) ;

Replies are listed 'Best First'.
Re: Huffman once again
by fsn (Friar) on Jul 17, 2002 at 19:56 UTC
    Well, as you can see, I put more effort into implementing the algorithm than obscuring it. I mean, I say Huffman in the title already. Sheesh...

    Anyway, jynx was kind enogh to msg me with some much needed hints. Well, a conversation started on our scratchpads. With jynx's permission I'll now repost it, with each turn in a new reply. But we'll start with jynx's initial response.


    Not to harp, but here's a few comments1:
    • Overuse of parentheses breaks up the code into bite-size chunks for others, making it easier for them to "chew"
    • Explicit checks against numbers clue in the reader to an important value
    • Explicit minor changes to variables make sure that everyone knows where you're coming from (e.g. $i++)
    • Extraneous white space further breaks things so the reader can stop and think about a chunk
    • Explicit assignment to variables gives the reader too much notice as to where things come from
    • Blocks of code make obvious chunks that can be parsed seperately to slowly form a larger picture
    • Clearly one can't remove all instances of the above from an obfu (or at least, not always), but reducing instances should help obfuscate the code...
    Making some changes along those lines yields:
    #!/usr/bin/perl @t=("12","34","56","7 ","89",":h","er",";k","<t","=a","lc","ju","sn"," +op"); @m=split/ */,"\x00\x50\xA5\xd2\xc5\xbb\x96\xef\x06\xbc\x8e\xe0"; do{ $n=$i++%8?$n:ord shift@m; $x=ord substr$t[$p],($n&128)>>7,1; $p=($x&240)==48?$x&15:0; !$p&&print chr$x; $n<<=1; }until($i==91);
    Notice also i took out the 'use integer' statement and the initialization of variables since they're unnecessary. Since Perl will complain about uninitialized values i also took off the -w. For testing purposes it's useful to keep such things in, but removing unnecessary things also removes extra clues for the reader :-)

    Next we can take out variables that are only used once and remove instances where conditions overlap. In particular to this obfu, if $p is zero, we print. But on the previous line we just set it to zero. We can combine the two statements and use print's return value effectively here. The following is an iterative process of combining overlapping statements and reducing variables:

    # 1 # collapse printing into the $p assignment @t=("12","34","56","7 ","89",":h","er",";k","<t","=a","lc","ju","sn"," +op"); @m=split/ */,"\x00\x50\xA5\xd2\xc5\xbb\x96\xef\x06\xbc\x8e\xe0"; do{ $n=$i++%8?$n:ord shift@m; $x=ord substr$t[$p],($n&128)>>7,1; $p=($x&240)==48?$x&15:-1+print chr$x; $n<<=1; }until($i==91); # 2 # Collapse the $x assignment into the $p assignment @t=("12","34","56","7 ","89",":h","er",";k","<t","=a","lc","ju","sn"," +op"); @m=split/ */,"\x00\x50\xA5\xd2\xc5\xbb\x96\xef\x06\xbc\x8e\xe0"; do{ $n=$i++%8?$n:ord shift@m; $p=(($x=ord substr$t[$p],($n&128)>>7,1)&240)==48?$x&15:-1+print chr$ +x; $n<<=1; }until($i==91); #3 # Collapse the $n assignment into the $x assignment @t=("12","34","56","7 ","89",":h","er",";k","<t","=a","lc","ju","sn"," +op"); @m=split/ */,"\x00\x50\xA5\xd2\xc5\xbb\x96\xef\x06\xbc\x8e\xe0"; do{ $p=(($x=ord substr$t[$p],(($n=$i++%8?$n:ord shift@m) &128)>>7,1)&240)==48?$x&15:-1+print chr$x;$n<<=1; }until($i==91); #4 # Remove @t and replace it with a constant list @m=split/ */,"\x00\x50\xA5\xd2\xc5\xbb\x96\xef\x06\xbc\x8e\xe0"; do{ $p=(($x=ord substr("12","34","56","7 ","89",":h","er",";k","<t","=a" +,"lc","ju","sn","op") [$p],(($n=$i++%8?$n:ord shift@m) &128)>>7,1)&240)==48?$x&15:-1+print chr$x;$n<<=1; }until($i==91); #5 # condense the previous @t, removing quotes to save space @m=split/ */,"\x00\x50\xA5\xd2\xc5\xbb\x96\xef\x06\xbc\x8e\xe0"; do{ $p=(($x=ord substr+(split/,/,'12,34,56,7 ,89,:h,er,;k,<t,=a,lc,ju,sn +,op') [$p],(($n=$i++%8?$n:ord shift@m)&128)>>7,1)&240)==48? $x&15:-1+print chr$x;$n<<=1; }until($i==91); #6 # Remove @m, replacing it with a constant list # This requires a smaller variable for the 'shift' effect... do{ $p=(($x=ord substr+(split/,/,'12,34,56,7 ,89,:h,er,;k,<t,=a,lc,ju,sn +,op') [$p],(($n=$i++%8?$n:ord ((split/ */,"\x00\x50\xA5\xd2\xc5\xbb\x96\xef\x06\xbc\x8e\xe0")[$.++ +]) )&128)>>7,1)&240)==48?$x&15:-1+print chr$x;$n<<=1; }until($i==91); #7 # Change all the variables to 'strict compliant' ones... # Remove extra semi-colons and parentheses... do{$a=(($_=ord substr+(split/,/,'12,34,56,7 ,89,:h,er,;k,<t,=a,lc,ju,s +n,op')[ $a],(($b=$;++%8?$b:ord ((split/ */,"\x00\x50\xA5\xd2\xc5\xbb\x96\xef\x06\xbc\x8e\xe0")[$.++]) +) &128)>>7,1)&240)==48?$_&15:-1+print chr;$b<<=1}until$;==91 #8 (final version) # Split up the previous @m for allignment issues do{$a=(($_=ord substr+(split/,/,'12,34,56,7 ,89,:h,er,;k,<t,=a,lc,ju,s +n,op')[ $a],(($b=$;++%8?$b:ord((split/ */,"\x00\x50\xA5\xd2\xc5\xbb\x96\xef\x0 +6\xbc". "\x8e\xe0")[$.++]))&128)>>7,1)&240)==48?$_&15:-1+print chr;$b<<=1}unti +l$;==91
    It's now just a block of code with little whitespace that readers must attempt to swallow all at once. It still uses too many parentheses unfortunately, but Perl would have problems parsing correctly without them.

    Anyway, that's the process i went through after running your japh. While i'm not the best at obfu, hopefully this will help a little...

    nuf evah,
    jynx

    1NOTE: i usually format to 60 places for monastery reasons, but sometimes, particularly in cases of obfus, i format to 80 characters, which is standard terminal width. Because of this i must appologize for the random red +'s everywhere (i've set my page preferences to wider than 60 chars to accomodate this 80 char fetish :)

      My reply back:

      Ok, here we go...

      This is my second try at the huffman decompresser. I have changed the implementation a bit, replacing the arrays with strings. I've also applied some of the suggestions jynx made, inlineing the literals, using 'compliant' variable names ($; was a fine twist..).

      #!/usr/bin/perl do{$_=ord substr"1234567 89:her;k<t=alcjusnop",$:*2+ (((ord(substr"\x00\x50\xA5\xd2\xc5\xbb\x96\xef\x06". "\xbc\x8e\xe0",$;/8,1)&2**(7-$;++%8))==0)?0:1),1;$:= ($_&240)==48?$_&15:-1+print chr}until$;==91
        Here is jynx's response back, and I think this will be the final version. It sure looks compact and unreadable to me now. And, yup, it was an enjoyable excersize...

        update: after reading fsn's scratchpad (which betters my attempts a bit :), there are two last things i can think of that may help: (1) reducing one more variable (2) removing the do{} block...
        $_=ord substr"1234567 89:her;k<t=alcjusnop",(($_&240)==48?$_& 15:-1+print chr)*2+((ord(substr"\x00\x50\xA5\xd2\xc5\xbb\x96". "\xef\x06\xbc\x8e\xe0",$;/8,1)&2**(7-$;++%8))?1:0),1while$;<92
        Note how since the variable $: was removed, the $; limit had to be increased to 92 to accomodate and finish printing the string. Also the ==0 check was removed and the ==91 check changed to <92...

        Don't know if it's more or less obfuscated, and as i've said i'm not very good, but it seems to be more difficult to parse. i had fun with this exercise, and i hope fsn did as well :-)

        jynx