in reply to Just Another Unpatented Sequence
People who like to find out just how painstaking this obfu was for me to decode should just
0. The original version:
$_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJ HAAHHJJJHJPPHPHJAJPHAAPHAJJHHJHAHA}};\s+\s++s+ y?JAPH?0-3?;s;(.)(.)(.);(map{ord>0130?uc:(),uc }q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~ /./g)[$1*16+$2*4+$3];eg;s;\w+;\L\u$&;g;{print}
====
1. Reformat. Here I've tentatively identified the program as being 5 lines long.
$_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJ HAAHHJJJHJPPHPHJAJPHAAPHAJJHHJHAHA}}; \s+\s++s+ y?JAPH?0-3?; s;(.)(.)(.);(map{ord>0130?uc:(),uc }q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~ /./g)[$1*16+$2*4+$3];eg; s;\w+;\L\u$&;g; {print}
When reformatting obfus, I always run the reformatted code; sometimes I'll end up breaking lines in a crucial place, so the code no longer runs. In this case, however, the code still runs.
====
2. Re-reformat. Here I add line numbering and add some spaces to the lines. ymmv
1 $_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJ +PHAAPHAJJH HJHAHA}}; 2 \s+\s++s+y?JAPH?0-3?; 3 s;(.)(.)(.); (map { ord > 0130 ? uc : () , uc } q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~/./g)[$1*16+$2*4 ++$3]; eg; 4 s;\w+;\L\u$&;g; 5 {print}
Ack! I've got two terminal windows open; in one, I reformat the code in a text editor; in another, I run perl interactively. (Just type 'perl' at your command line; enter your code, and type __END__ to run the code.) Here's what my interactive session shows; the first line ('$ perl') is me typing 'perl' at my shell prompt. The rest is me pasting the first few lines into the interpreter...
Bah - I broke line 3. I'll have to be a little less zealous in my reformatting. But, uh, this is actually good... yeah, because it identifies an area of the program that I'll have to look at more closely. Hey, if I'm lucky, I might have inadvertently identified the heart of this JAPH! (you can laugh now, tadman).$ perl $_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPH +AJJH HJHAHA}}; \s+\s++s+y?JAPH?0-3?; s;(.)(.)(.); (map { ord > 0130 ? uc : () , uc } q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~/./g)[$1*16+$2*4 ++$3]; eg; Bareword found where operator expected at - line 9, near "eg" (Might be a runaway multi-line ;; string starting on line 4) (Missing semicolon on previous line?)
====
3. Re-reformat, take two... re-re-reformat?
Good, this still runs. Looking at line 3, I can see that there's some type of substitution that's going on. The line is essentially this:1 $_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJ +PHAAPHAJJH HJHAHA}}; 2 \s+\s++s+y?JAPH?0-3?; 3 s;(.)(.)(.);(map{ord>0130?uc:(),uc}q{KnKttiIMRsROhOppllrr y ssLf +L cWEdEaau ujj}=~/./g)[$1*16+$2*4+$3];eg; 4 s;\w+;\L\u$&;g; 5 {print}
s/(.)(.)(.)/some_code/eg; So next I want to break out 'some_code' into a sub routine call.
====
4. Re-re-re-ahh, heck, you get the idea.
1 $_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJ +PHAAPHAJJH HJHAHA}}; 2 \s+\s++s+y?JAPH?0-3?; 3 s;(.)(.)(.);tadman($1,$2,$3);eg; 4 s;\w+;\L\u$&;g; 5 {print} 6. sub tadman { my ($f, $s, $t) = @_; ( map { ord>0130?uc:(),uc } q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~/./g) [$f*16+$s*4+$t] }
Here I've replaced the mess code in line 3's
substitution with a call to tadman(). Now I feel comfortable enough to
go through line-by-line and dissect this beastling of an obfu.
==== 5. Line 1: What's this?
Here tadman starts out by setting the Perl's default variable, $_, to some value. Or is it code? Recently, japhy had an interesting post relating to Perl's quoting operators; s/he pointed out that )]}> can be used not only to close quotations, but to *open* them as well. Compare:$_=do{q}HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHA +APHAJJH HJHAHA}};
use strict; use warnings; my @friends = qw ( Aaron Donald Roger etc ); my @family = qw)Mom Dad Brother Sister etc); foreach ( @friends ) { print $_, " is a friend\n" } foreach ( @family ) { print $_, " is family\n" }
But that do{} wrapping the q() is a bit confusing to me. When in doubt, experiment!
$ perl -e '$_ = do{q(Hello World)}; print $_,"\n"' Hello World
Okay, I'm convinced that the do{} is just tossed in to confuse the issue a bit, so when I re-write line 1 from now on I'll drop the do.
So for now $_ is a long string of Js, As, Ps, and Hs.
====
6. Line 2: ++confusion++
\s+\s++s+y?JAPH?0-3?;
I'm seeing a lot of s's and ?'s here. Those +'s and \'s aren't helping me read this either. I wonder what Perl is doing with it? And should this really be two lines? It almost looks like a s/// and y/// operation... but there's only one semicolon. Ack!
The only way for me to figure this out is to play with the code a little bit. Let's see here...
Hmm, okay. that y?JAPH?0-3? is the same as tr/JAPH/0123/, so that's how I'll write it from now on.$ perl $_='HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJH + HJHAHA'; \s+\s++s+y?JAPH?0-3?; print; __END__ 3303212100102003110011030131023031322031133000302232301023112310033031 +31
But I still don't grok \s+\s++s+. What happens if I just ignore it?
$ perl $_='HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJH + HJHAHA'; tr/JAPH/0123/; print; __END__ 330321210010200311001103013102303132203113300030223230102311231003 303131
Aha! Somehow, \s+\s++s+ is stripping the newline out of $_.
Knowing what that portion of the code does, I can try writing it myself and see if I can figure out how tadman's version does what it does.
Hey, this is promising:
$ perl $_='HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJH + HJHAHA'; s/\n//g; print; __END__ HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJHHJHA +HA
But tadman isn't using a \n, nor is he using a /g modifier. How can he avoid this?
Looking at the $_ again, it's easy to see how /g can go: there's only one newline character in $_. And the \n can be changed into a \s if we put a /s modifier on the end of the substitution. So now I've come up with:
$ perl $_='HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJH + HJHAHA'; s/\s+//s; print; __END__ HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJHAAHHJJJHJPPHPHJAJPHAAPHAJJHHJHA +HA
Now I can compare:
s/\s+//s; vs \s+\s++s+
Hrmm... I bet that leading \ is there as a red herring... Here's a quick test.
$ perl $_='hello world'; \s+\s++s+; print $_; __END__ syntax error at - line 3, near "+;" Execution of - aborted due to compilation errors.
Yikes. What I'd expected is that the newline in $_ would be removed; instead, the code doesn't compile. What a silly test; it's leading me off on a tangent.
The easiest way to deal with this is to put in the tr/// and see whether the test code runs.
$ perl $_='hello world'; \s+\s++s+tr/JAPH/0123/; print $_; __END__ helloworld
The tr/// doesn't do anything because 'hello world' doesn't contain anything in the JAPH character class. But I've figured out what the + does: it acts like a and or &&. And and and && are just different ways of writing a conditional statement.
Here's a rewrite of line 2:
s/\s//s and tr/JAPH/0123/;
Notice I've removed the superflous leading \, and have changed the first three +'s into /'s. The last + acts as an and, so I made that change as well.
Or, if we wanted to, we could be silly and write:
if ( s/\s//s ) { # if we remove a space char, where "space" includes +\n... tr/JAPH/0123/; # ...then do this tr/ansl/ation/ }
But now we know what line 2 does: it strips the \n from $_, and does a tr/JAPH/0123/.
====
7. Lines 3 and 4
Now that I've made the tadman() sub, lines 3 and 4 look similar enough that we can look at them in tandem.<p.
<p. In both lines, tadman uses ';' as an alternate delimiter in his s/ubsti/tution/s. (You can read more about this at perlre; search for 'delimiter'.)3 s;(.)(.)(.);tadman($1,$2,$3);eg; 4 s;\w+;\L\u$&;g;
Line 3 is interesting, because the right-hand side of the s/// contains code. The /e modifier tells Perl to 'e'xecute the right-hand side of the s/// as though it were code; without this flag, Perl would simply replace every 3 characters in $_ with "tadman($1,$2,$3)". (small lie; $1, $2, and $3 would be interpolated to their respective matches.) Regardless, we don't want to insert a whole bunch of tadman() into $_; we want to actually run that sub on the matched letters. Of course, the /g flag here says "do this substitution against the whole string".
You can test out the /e flag with the following one-liners.
$ perl -e '$_ = "hello world"; s/(hello)/uc $1/g; print $_,"\n"' uc hello world $ perl -e '$_ = "hello world"; s/(hello)/uc $1/ge; print' HELLO world
Notice that in the first one, the captured 'hello' didn't get upper- cased via the uc function; instead, 'hello' was replaced with 'uc hello'. Perl made the uc active when I said to /'e'xecute the right-hand side of the substitution.
And actually, doing s/pattern/code/eg is really a simple way of writing a while block. Fans of "Total Recall" might re-write line 3 as:
while ( /(.)(.)(.)/g ) { tadman ( $1, $2, $3 ); }
Notice that although you can get rid of the /e flag this way (or, in reverse, get rid of a while block!), you still need to keep the /g flag.
4 s;\w+;\L\u$&;g;
Line 4 sent me back to the documentation to figure out what \L and \u do in a regex. I already recognize $& as being one of perl's built in variables. You can read a discussion of $& in perlvar, perlre, and if you grep for 'slow my program' in perlfaq6.
perlre has this to say about \L and \u :
\u uppercase next char (think vi) \L lowercase till \E (think vi)
So line 4 seems to be a way of uppercasing some letters while lowercasing the rest. I'm big on testing code, so let's see if I'm right:
$ perl -e '$_ = "hello world"; s;\w+;\L\u$&;gx; print $_,"\n"' Hello World $ perl -e '$_ = "HELLO WORLD"; s;\w+;\L\u$&;gx; print $_,"\n"' Hello World
Okay, I gotta admit, that's kind of neat. It looks suspiciously like Perl's built-in function ucfirst, but there's two gotchas:
$ perl -e '$_ = "hello world"; $_ = ucfirst($_); print $_,"\n"' Hello world $ perl -e '$_ = "HELLO WORLD"; s;\w+;\L\u$&;gx; print $_,"\n"' HELLO WORLD
That's right - ucfirst works on line boundaries; tadman wants his to work on word boundaries. Additionally, ucfirst doesn't lower-case anything that is upper-cased, so the phrase 'HELLO WORLD' doesn't become the desired 'Hello World'.
If a person really wanted to, they could re-write line 4 with a foreach loop that splits on spaces and pushes ucfirsted data back onto some array, but why? I'm happy knowing what's happening on line 4, and just adding a new trick to my meager arsenal.
====
8. {print}
{print}
This is the end of the program. Remember that print without any arguments will print $_ to STDOUT.
Unfortunately, we still haven't really figured out what's happening between where $_ is assigned, back in line 1, and where it gets printed, in line 5.
Maybe I was right in identifying line 3 as the heart of the program. Let's dig.
====
9. tadman()
This piece of code looks thorny. I was able to make it a subroutine earlier, but that was pretty easy - it was just a matter of finding a way of formatting the lines without breaking their functionality, then wrapping them up in a sub declaration.sub tadman { 1 my ($first, $second, $third) = @_; 2 ( map { ord>0130?uc:(),uc } q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~/./g) [$first*16+$second*4+$third] }
The first line of the subroutine grabs the matched $1, $2, and $3 and assigns them to $first, $second, and $third.
The second line starts at '( map' and ends at '$third]'. At first I thought that the quadratic expression inside [] was a single-element anonymous array, but decided I was just confusing myself. Instead, the [] is an array index; the parens around the map statement force the output into list context, and the [] dig in and pull out a single element. There is an anonymous array here, but it's a bit hidden.
Here's where an example is nice. Let's say I have a pipe-delimited line, and want to split out the Nth entry from that line. When I first started writing Perl, I probably would have done something like this:
#!/usr/bin/perl use strict; use warnings; my $line = 'Bond, James Bond|007|Secret agent|Walther PPK|etc|etc|etc' +; my @fields = split /\|/, $line; my $occupation = $fields[2]; print "$occupation man\n"; exit;
Notice that I had to go to the trouble of declaring an array, @fields, and then had to pull my $occupation from $fields[2]. This works, so there's nothing wrong with it - and it's also clear. But there's a slightly cleaner way to write the above split; using this new way, I can avoid creating @fields.
#!/usr/bin/perl use strict; use warnings; my $line = 'Bond, James|007|Secret agent|Walther PPK|etc|etc|etc'; my $occupation = ( split /\|/, $line )[2]; print "$occupation man\n"; exit;
Back to the code. We've figured out that the quadratic expression is simply an array index.
But what the heck is this map statement doing?2 ( map { ord>0130?uc:(),uc } q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj}=~/./g) [$first*16+$second*4+$third]
tadman's obfu here is a little easier to grok with some different spacing and comments:
( # open paren to force list context map # will be followed by BLOCK and LIST # Here's the BLOCK: { ord > 0130 ? uc : (), uc } # so this must somehow be acting as a LIST q{KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj} =~ /./g # Finally, close paren and index into the list )[$first*16+$second*4+$third]
Now that we've figured out what our LIST must be, we can re-write the sub without this obfuscation. I'm going to also to pre-figure the index upfront, so the map line will be a bit shorter to look at:
sub tadman { my ($first, $second, $third) = @_; my $index = $first * 16 + $second * 4 + $third; my @key = split /|/, 'KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj'; ( map { ord > 0130 ? uc : (), uc } @key )[ $index ]; }
Okay, so map is used to transform a list. In this case, tadman is expanding the original list. Notice that in the ternary expression, the two possible return values are either 'uc' or '(), uc'.
But why would tadman need to expand his list? Look at what the first index would be...
Remember back at line 4; we've already established that tadman has a case-shifting routine which upper-cases the first letter of a word, and lower-cases the rest. So returning 'uc' or '(), uc' in this map's ternary expression is a red herring; tadman didn't want to write $_ so just chose uc. Let's look at that ternary expression a bit more. I'll re-write it with some comments:
map takes a list, and map returns a list. Notice that if the decimal ASCII value of $_ is > than 130, then $_ is returned. But if it's *less* than 130, undef and $_ are returned. So, depending on what ord tells tadman about the current $_ map is working on, either $_ will be returned, or the list will be padded with an empty entry *and* $_ will be returned.# # { ord > 0130 ? uc : (), uc } # if ( ord $_ > 0130 ) { # if [ord] of $_ is > than 130... $_ ; # just return $_ } else { # otherwise (), $_ ; # return undef *and* $_ }
==== At this point we can re-write the whole durned obfuscation without the obfuscations.
====#!/usr/bin/perl use strict; use warnings; # starting data $_ = 'HHJHPAPAJJAJPJJHAAJJAAJHJAHAJPHJHAHPPJ HAAHHJJJHJPPHPHJAJPHAAPHAJJHHJHAHA'; # strip newline within $_ s,\s,,s; # turn letters in $_ into numbers tr/JAPH/0123/; # turn every 3 numbers into a single character s;(.)(.)(.);tadman($1,$2,$3);eg; # do some case mapping s;\w+;\L\u$&;g; # show the deciphered message {print} sub tadman { # $1, $2, and $3 from the substitution my ($A, $B, $C) = @_; # be upfront about math my $index = $A * 16 + $B * 4 + $C; # here's the data from whence comes the non-trademarked message my @key = split /|/, 'KnKttiIMRsROhOppllrr y ssLfL cWEdEaauujj'; # here's the uncompressed data, that we can now index into my @expanded_key = map { ord > 0130 ? $_ : (), $_ } @key; # pull out the $index-th letter from @expanded_key my $value = $expanded_key[$index]; return $value; }
blyman
setenv EXINIT 'set noai ts=2'
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^2: Just Another Unpatented Sequence
by tadman (Prior) on Aug 06, 2002 at 17:31 UTC |