Re: Just Another Unpatented Sequence

Part of the Perl Genome Project?
:) Could very well be! tadman is starting with a small set of data, then expanding it, and indexing into it using values computed from $_. De-obfuscators would do well to pay attention to how q{K...}=~/./g behaves and maybe toss in a few print statements throughout the obfu.

People who like to find out just how painstaking this obfu was for me to decode should just

...

0. The original version:

 
$_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJ 
HAAHHJJJHJPPHPHJAJPHAAPHAJJHHJHAHA}};\s+\s++s+ 
y?JAPH?0-3?;s;(.)(.)(.);(map{ord>0130?uc:(),uc 
}q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~ 
/./g)[$1*16+$2*4+$3];eg;s;\w+;\L\u$&;g;{print}
[download]

====

1. Reformat. Here I've tentatively identified the program as being 5 lines long.

 
$_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJ 
HAAHHJJJHJPPHPHJAJPHAAPHAJJHHJHAHA}}; 
 
\s+\s++s+ 
y?JAPH?0-3?; 
 
s;(.)(.)(.);(map{ord>0130?uc:(),uc 
}q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~ 
/./g)[$1*16+$2*4+$3];eg; 
 
s;\w+;\L\u$&;g; 
 
{print}
[download]

When reformatting obfus, I always run the reformatted code; sometimes I'll end up breaking lines in a crucial place, so the code no longer runs. In this case, however, the code still runs.

====

2. Re-reformat. Here I add line numbering and add some spaces to the lines. ymmv

 
   1  $_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJ
+PHAAPHAJJH
HJHAHA}}; 
 
   2  \s+\s++s+y?JAPH?0-3?; 
 
   3  s;(.)(.)(.); 
        (map { ord > 0130 
               ? uc 
               : () , uc } 
         q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~/./g)[$1*16+$2*4
++$3]; 
      eg; 
 
   4  s;\w+;\L\u$&;g; 
 
   5  {print}
[download]

Ack! I've got two terminal windows open; in one, I reformat the code in a text editor; in another, I run perl interactively. (Just type 'perl' at your command line; enter your code, and type __END__ to run the code.) Here's what my interactive session shows; the first line ('$ perl') is me typing 'perl' at my shell prompt. The rest is me pasting the first few lines into the interpreter...

 
 
$ perl 
$_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPH
+AJJH 
HJHAHA}}; 
\s+\s++s+y?JAPH?0-3?; 
s;(.)(.)(.); 
        (map { ord > 0130 
               ? uc 
               : () , uc } 
         q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~/./g)[$1*16+$2*4
++$3]; 
      eg; 
Bareword found where operator expected at - line 9, near "eg" 
  (Might be a runaway multi-line ;; string starting on line 4) 
        (Missing semicolon on previous line?)
[download]

Bah - I broke line 3. I'll have to be a little less zealous in my reformatting. But, uh, this is actually good... yeah, because it identifies an area of the program that I'll have to look at more closely. Hey, if I'm lucky, I might have inadvertently identified the heart of this JAPH! (you can laugh now, tadman).

====

3. Re-reformat, take two... re-re-reformat?

 
   1  $_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJ
+PHAAPHAJJH
 
HJHAHA}}; 
 
   2  \s+\s++s+y?JAPH?0-3?; 
 
   3  s;(.)(.)(.);(map{ord>0130?uc:(),uc}q{KnKttiIMRsROhOppllrr y ssLf
+L cWEdEaau
ujj}=~/./g)[$1*16+$2*4+$3];eg; 
 
   4  s;\w+;\L\u$&;g; 
 
   5  {print}
[download]

Good, this still runs. Looking at line 3, I can see that there's some type of substitution that's going on. The line is essentially this:

s/(.)(.)(.)/some_code/eg; So next I want to break out 'some_code' into a sub routine call.

====

4. Re-re-re-ahh, heck, you get the idea.

 
   1  $_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJ
+PHAAPHAJJH
 
HJHAHA}}; 
 
   2  \s+\s++s+y?JAPH?0-3?; 
 
   3  s;(.)(.)(.);tadman($1,$2,$3);eg; 
 
   4  s;\w+;\L\u$&;g; 
 
   5  {print} 
 
   6. sub tadman 
      { 
        my ($f, $s, $t) = @_; 
        ( map { ord>0130?uc:(),uc } 
          q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~/./g) 
        [$f*16+$s*4+$t] 
      }
[download]

Here I've replaced the ~~mess~~ code in line 3's substitution with a call to tadman(). Now I feel comfortable enough to go through line-by-line and dissect this beastling of an obfu.

==== 5. Line 1: What's this?

 
   $_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHA
+APHAJJH 
HJHAHA}};
[download]

Here tadman starts out by setting the Perl's default variable, $_, to some value. Or is it code? Recently, japhy had an interesting post relating to Perl's quoting operators; s/he pointed out that )]}> can be used not only to close quotations, but to *open* them as well. Compare:

 
use strict; 
use warnings; 
my @friends = qw ( Aaron Donald Roger etc ); 
my @family = qw)Mom Dad Brother Sister etc); 
foreach ( @friends ) { print $_, " is a friend\n" } 
foreach ( @family ) { print $_, " is family\n" }
[download]

But that do{} wrapping the q() is a bit confusing to me. When in doubt, experiment!

 
$ perl -e '$_ = do{q(Hello World)}; print $_,"\n"' 
Hello World
[download]

Okay, I'm convinced that the do{} is just tossed in to confuse the issue a bit, so when I re-write line 1 from now on I'll drop the do.

So for now $_ is a long string of Js, As, Ps, and Hs.

====

6. Line 2: ++confusion++

 
\s+\s++s+y?JAPH?0-3?;
[download]

I'm seeing a lot of s's and ?'s here. Those +'s and \'s aren't helping me read this either. I wonder what Perl is doing with it? And should this really be two lines? It almost looks like a s/// and y/// operation... but there's only one semicolon. Ack!

The only way for me to figure this out is to play with the code a little bit. Let's see here...

 
$ perl 
$_='HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJH
+ 
HJHAHA'; \s+\s++s+y?JAPH?0-3?; 
print; 
__END__ 
3303212100102003110011030131023031322031133000302232301023112310033031
+31
[download]

Hmm, okay. that y?JAPH?0-3? is the same as tr/JAPH/0123/, so that's how I'll write it from now on.

But I still don't grok \s+\s++s+. What happens if I just ignore it?

 
$ perl 
$_='HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJH
+ 
HJHAHA'; 
tr/JAPH/0123/; 
print; 
__END__ 
330321210010200311001103013102303132203113300030223230102311231003 
303131
[download]

Aha! Somehow, \s+\s++s+ is stripping the newline out of $_.

Knowing what that portion of the code does, I can try writing it myself and see if I can figure out how tadman's version does what it does.

Hey, this is promising:

 
$ perl 
$_='HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJH
+ 
HJHAHA'; 
s/\n//g; 
print; 
__END__ 
HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJHHJHA
+HA
[download]

But tadman isn't using a \n, nor is he using a /g modifier. How can he avoid this?

Looking at the $_ again, it's easy to see how /g can go: there's only one newline character in $_. And the \n can be changed into a \s if we put a /s modifier on the end of the substitution. So now I've come up with:

 
$ perl 
$_='HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJH
+ 
HJHAHA'; 
s/\s+//s; 
print; 
__END__ 
HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJHHJHA
+HA
[download]

Now I can compare:

 
s/\s+//s; 
vs 
\s+\s++s+
[download]

Hrmm... I bet that leading \ is there as a red herring... Here's a quick test.

 
$ perl 
$_='hello 
world'; 
\s+\s++s+; 
print $_; 
__END__ 
syntax error at - line 3, near "+;" 
Execution of - aborted due to compilation errors.
[download]

Yikes. What I'd expected is that the newline in $_ would be removed; instead, the code doesn't compile. What a silly test; it's leading me off on a tangent.

The easiest way to deal with this is to put in the tr/// and see whether the test code runs.

 
$ perl 
$_='hello 
world'; 
\s+\s++s+tr/JAPH/0123/; 
print $_; 
__END__ 
helloworld
[download]

The tr/// doesn't do anything because 'hello world' doesn't contain anything in the JAPH character class. But I've figured out what the + does: it acts like a and or &&. And and and && are just different ways of writing a conditional statement.

Here's a rewrite of line 2:

 
s/\s//s and tr/JAPH/0123/;
[download]

Notice I've removed the superflous leading \, and have changed the first three +'s into /'s. The last + acts as an and, so I made that change as well.

Or, if we wanted to, we could be silly and write:

 
if ( s/\s//s ) {  # if we remove a space char, where "space" includes 
+\n... 
  tr/JAPH/0123/;  # ...then do this tr/ansl/ation/ 
}
[download]

But now we know what line 2 does: it strips the \n from $_, and does a tr/JAPH/0123/.

====

7. Lines 3 and 4

Now that I've made the tadman() sub, lines 3 and 4 look similar enough that we can look at them in tandem.<p.

 
   3  s;(.)(.)(.);tadman($1,$2,$3);eg; 
 
   4  s;\w+;\L\u$&;g;
[download]

<p. In both lines, tadman uses ';' as an alternate delimiter in his s/ubsti/tution/s. (You can read more about this at perlre; search for 'delimiter'.)

Line 3 is interesting, because the right-hand side of the s/// contains code. The /e modifier tells Perl to 'e'xecute the right-hand side of the s/// as though it were code; without this flag, Perl would simply replace every 3 characters in $_ with "tadman($1,$2,$3)". (small lie; $1, $2, and $3 would be interpolated to their respective matches.) Regardless, we don't want to insert a whole bunch of tadman() into $_; we want to actually run that sub on the matched letters. Of course, the /g flag here says "do this substitution against the whole string".

You can test out the /e flag with the following one-liners.

 
$ perl -e '$_ = "hello world"; s/(hello)/uc $1/g; print $_,"\n"' 
uc hello world 
 
$ perl -e '$_ = "hello world"; s/(hello)/uc $1/ge; print' 
HELLO world
[download]

Notice that in the first one, the captured 'hello' didn't get upper- cased via the uc function; instead, 'hello' was replaced with 'uc hello'. Perl made the uc active when I said to /'e'xecute the right-hand side of the substitution.

And actually, doing s/pattern/code/eg is really a simple way of writing a while block. Fans of "Total Recall" might re-write line 3 as:

 
while ( /(.)(.)(.)/g ) { 
  tadman ( $1, $2, $3 ); 
}
[download]

Notice that although you can get rid of the /e flag this way (or, in reverse, get rid of a while block!), you still need to keep the /g flag.

 
   4  s;\w+;\L\u$&;g;
[download]

Line 4 sent me back to the documentation to figure out what \L and \u do in a regex. I already recognize $& as being one of perl's built in variables. You can read a discussion of $& in perlvar, perlre, and if you grep for 'slow my program' in perlfaq6.

perlre has this to say about \L and \u :

 
           \u          uppercase next char (think vi) 
           \L          lowercase till \E (think vi)

So line 4 seems to be a way of uppercasing some letters while lowercasing the rest. I'm big on testing code, so let's see if I'm right:

 
$ perl -e '$_ = "hello world"; s;\w+;\L\u$&;gx; print $_,"\n"' 
Hello World 
$ perl -e '$_ = "HELLO WORLD"; s;\w+;\L\u$&;gx; print $_,"\n"' 
Hello World
[download]

Okay, I gotta admit, that's kind of neat. It looks suspiciously like Perl's built-in function ucfirst, but there's two gotchas:

 
$ perl -e '$_ = "hello world"; $_ = ucfirst($_); print $_,"\n"' 
Hello world  
$ perl -e '$_ = "HELLO WORLD"; s;\w+;\L\u$&;gx; print $_,"\n"' 
HELLO WORLD
[download]

That's right - ucfirst works on line boundaries; tadman wants his to work on word boundaries. Additionally, ucfirst doesn't lower-case anything that is upper-cased, so the phrase 'HELLO WORLD' doesn't become the desired 'Hello World'.

If a person really wanted to, they could re-write line 4 with a foreach loop that splits on spaces and pushes ucfirsted data back onto some array, but why? I'm happy knowing what's happening on line 4, and just adding a new trick to my meager arsenal.

====

8. {print}

 
{print}
[download]

This is the end of the program. Remember that print without any arguments will print $_ to STDOUT.

Unfortunately, we still haven't really figured out what's happening between where $_ is assigned, back in line 1, and where it gets printed, in line 5.

Maybe I was right in identifying line 3 as the heart of the program. Let's dig.

====

9. tadman()

 
    sub tadman 
    { 
 1    my ($first, $second, $third) = @_; 
 2    ( map { ord>0130?uc:(),uc } 
        q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~/./g) 
      [$first*16+$second*4+$third] 
    }
[download]

This piece of code looks thorny. I was able to make it a subroutine earlier, but that was pretty easy - it was just a matter of finding a way of formatting the lines without breaking their functionality, then wrapping them up in a sub declaration.

The first line of the subroutine grabs the matched $1, $2, and $3 and assigns them to $first, $second, and $third.

The second line starts at '( map' and ends at '$third]'. At first I thought that the quadratic expression inside [] was a single-element anonymous array, but decided I was just confusing myself. Instead, the [] is an array index; the parens around the map statement force the output into list context, and the [] dig in and pull out a single element. There is an anonymous array here, but it's a bit hidden.

Here's where an example is nice. Let's say I have a pipe-delimited line, and want to split out the Nth entry from that line. When I first started writing Perl, I probably would have done something like this:

 
#!/usr/bin/perl 
use strict; 
use warnings; 
 
my $line = 'Bond, James Bond|007|Secret agent|Walther PPK|etc|etc|etc'
+; 
my @fields = split /\|/, $line; 
my $occupation = $fields[2]; 
print "$occupation man\n"; 
exit;
[download]

Notice that I had to go to the trouble of declaring an array, @fields, and then had to pull my $occupation from $fields[2]. This works, so there's nothing wrong with it - and it's also clear. But there's a slightly cleaner way to write the above split; using this new way, I can avoid creating @fields.

 
#!/usr/bin/perl 
use strict; 
use warnings; 
 
my $line = 'Bond, James|007|Secret agent|Walther PPK|etc|etc|etc'; 
my $occupation = ( split /\|/, $line )[2]; 
print "$occupation man\n"; 
exit;
[download]

Back to the code. We've figured out that the quadratic expression is simply an array index.

 
 2    ( map { ord>0130?uc:(),uc } 
        q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~/./g) 
      [$first*16+$second*4+$third]
[download]

But what the heck is this map statement doing?

tadman's obfu here is a little easier to grok with some different spacing and comments:

 
    (        # open paren to force list context 
      map    # will be followed by BLOCK and LIST 
             # Here's the BLOCK: 
          { ord > 0130 ? uc : (), uc } 
 
             # so this must somehow be acting as a LIST 
          q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj} =~ /./g 
 
             # Finally, close paren and index into the list 
    )[$first*16+$second*4+$third]
[download]

Now that we've figured out what our LIST must be, we can re-write the sub without this obfuscation. I'm going to also to pre-figure the index upfront, so the map line will be a bit shorter to look at:

 sub tadman 
{ 
  my ($first, $second, $third) = @_; 
  my $index = $first * 16 + $second * 4 + $third; 
  my @key = split /|/, 'KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj'; 
  ( map { ord > 0130 ? uc : (), uc } @key )[ $index ]; 
}
[download]

Okay, so map is used to transform a list. In this case, tadman is expanding the original list. Notice that in the ternary expression, the two possible return values are either 'uc' or '(), uc'.

But why would tadman need to expand his list? Look at what the first index would be...

$_ starts out as 'HHJ'.
Then $_ gets hit with tr/JAPH/0123, which makes $_ = 330.
330 is fed into tadman() as 3, 3, 0.
In this case, the index is 60 ( 3*16 + 3*4 + 0 ).
But look at the total length of @key: it's only 40 characters long.
Obviously some magic needs to occur to allow tadman to do $key[60] without getting an undefined value - and that magic is map.

Remember back at line 4; we've already established that tadman has a case-shifting routine which upper-cases the first letter of a word, and lower-cases the rest. So returning 'uc' or '(), uc' in this map's ternary expression is a red herring; tadman didn't want to write $_ so just chose uc. Let's look at that ternary expression a bit more. I'll re-write it with some comments:

 
# 
# { ord > 0130 ? uc : (), uc } 
# 
if ( ord $_ > 0130 ) {  # if [ord] of $_ is > than 130... 
  $_ ;                  #   just return $_ 
} else {                # otherwise 
  (), $_ ;              #   return undef *and* $_ 
}
[download]

map takes a list, and map returns a list. Notice that if the decimal ASCII value of $_ is > than 130, then $_ is returned. But if it's *less* than 130, undef and $_ are returned. So, depending on what ord tells tadman about the current $_ map is working on, either $_ will be returned, or the list will be padded with an empty entry *and* $_ will be returned.

==== At this point we can re-write the whole durned obfuscation without the obfuscations.

 
#!/usr/bin/perl 
 
use strict; 
use warnings; 
 
# starting data 
$_ = 'HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJ 
HAAHHJJJHJPPHPHJAJPHAAPHAJJHHJHAHA'; 
 
# strip newline within $_ 
s,\s,,s; 
 
# turn letters in $_ into numbers 
tr/JAPH/0123/; 
 
# turn every 3 numbers into a single character 
s;(.)(.)(.);tadman($1,$2,$3);eg; 
 
# do some case mapping 
s;\w+;\L\u$&;g; 
 
# show the deciphered message 
{print} 
 
 
sub tadman 
{ 
  # $1, $2, and $3 from the substitution 
  my ($A, $B, $C) = @_; 
 
  # be upfront about math 
  my $index = $A * 16 + $B * 4 + $C; 
 
  # here's the data from whence comes the non-trademarked message 
  my @key = split /|/, 'KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj'; 
 
  # here's the uncompressed data, that we can now index into 
  my @expanded_key = map { ord > 0130 ? $_ : (), $_ } @key; 
 
  # pull out the $index-th letter from @expanded_key 
  my $value = $expanded_key[$index]; 
  return $value; 
}
[download]

====

Thanks for this obfu, tadman - it was really a fun romp through perldoc, perlfaq, and some test code to figure out what you're doing here! At the risk of showing your tricks: how did you compute the indices and the @expanded_key/@key ? If by code, will you outline how it works? I drove myself silly yesterday trying to reverse-engineer such a program...

blyman
setenv EXINIT 'set noai ts=2'

Comment on Re: Just Another Unpatented Sequence Select or Download Code