These two pieces of code do the same thing. Can you see what they do before reading the rest of this?
% perl -E'say for&{sub{"\U\x{fb01}\x{fb03}"=~/.{0,2}.{0,3}.{0,3}.{0,4}+(?{$_[++$#_]=rand})(*FAIL)/||pop;@_}}'
% perl -E'say for(*100=sub{$_[0]?(rand,(*{$_[0]-1}=*{$_[0]})->($_[0]-1)):()})->(100)'
If you want, run them so you can see what they do. Don’t worry: they’re perfectly harmless. Try perltidy on them: go ahead, I dare ya. :) And for the first, I would also suggest adding -Mre=debug. That should make it more obvious. Heck, might as well run them under the debugger, just in case.

Good luck!


Spoilers Below

So, howja do?

Both are fine illustrations of the importance of careful formatting and whitespace — and yes, sometimes even comments — to make the intentions clear.

First program, elaborated

Regular expressions are especially amenable to this bea(u)tification through the /x modifier. Expanding the first of those two programs above, we have this one:
use 5.010;

say for &{
    sub {
        "\U\x{fb01}\x{fb03}" =~ m((?mix-poop)

#include <stdlib.h>
#include <unistd.h>
#include <regex.h>

#include "perl.h"
#include "utf8.h"

#ifndef BROKEN_UNICODE_CHARCLASS_MAPPINGS

            .{0,2}
            .{0,3}
            .{0,3}
            .{0,4}

#define rand() (random()<<UTF_ACCUMULATION_SHIFT^random()&UTF_CONTINUATION_MASK)

          +(?{ $_ [++$#_] = rand()
                         || rand()
                         || UTF8_TWO_BYTE_LO (*PERL_UNICODE)
#else
                            (*PRUNE)
#define FAIL                (*ACCEPT)
             })
             (*FAIL)
#endif
             (*COMMIT)

        )poop

            ||

        pop @{ (*_{ARRAY}) }

    ;#;

        @{ (*SKIP:REGEX) }
        @{ (*_{ARRAY})   }

    }
}
Clearer? Well, maybe not.

Second program, elaborated

Now let’s try to embellish the second one, the one without the regex. In fact, I’ll use it in a program with comments telling you what it’s doing:
#!/usr/bin/perl -l

@vi            = 6->(6);                    # six random numbers

$dozen         = 12;
@dozen         = $dozen->($dozen);          # 12 random numbers
@baker's_dozen = &$dozen(++$dozen);         # 13 random numbers

print for 100->(100);                       # prints 100 random numbers!

@_ = 100;
print for &0;                               # ... and so does this!

BEGIN {
    # sure'd be a lot harder to understand w/o whitespace :)
    (
      *100 = sub {
          $_[0]
              ? ( rand, ( *{ $_[0] - 1 } = *{ $_[0] } )->( $_[0] - 1 ) )
              : ( )
      }
    )->( 100 );
}
There, now you know how it works, right?


Summary

Which one do you like better, and why?

Re: Unearthed Arcana
by BrowserUk (Patriarch) on May 13, 2011 at 08:32 UTC

    The only really arcane bit is that glob syntax allows you to bypass the sub naming rules.
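    For anyone who hasn't bumped into that before, here is a tiny sketch of what that bypass looks like (the sub body and the string are mine, purely illustrative):

        use strict;
        use warnings;
        use feature 'say';

        # "sub 42 { ... }" is a syntax error, but assigning a code ref
        # through a glob installs &main::42 just fine.
        *42 = sub { "the answer" };

        say &42;        # calls &main::42
        say 42->();     # so does this, as examples further down the thread show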

Re: Unearthed Arcana
by JavaFan (Canon) on May 13, 2011 at 08:13 UTC
    The second one creates a sub *100; calling it with argument N returns N random numbers and, along the way, aliases each of *(N-1) down to *0 to that same sub. You end up with subs *0 .. *100 that each return as many random numbers as their argument, with 0 <= N <= 100.
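    If it helps, here is roughly the same machinery spelled out without the glob games: a sketch using an ordinary hash of code refs instead of the numeric globs (the names are mine).

        use strict;
        use warnings;
        use feature 'say';

        my %sub;
        $sub{0} = sub { () };                              # zero args: empty list
        for my $n ( 1 .. 100 ) {
            my $prev = $n - 1;
            $sub{$n} = sub { ( rand, $sub{$prev}->() ) };  # one rand, then recurse
        }
        say for $sub{100}->();                             # 100 random numbers

    The one-liner does the aliasing lazily instead, installing *(N-1) only when *N is actually called.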
Re: Unearthed Arcana
by JavaFan (Canon) on May 13, 2011 at 09:34 UTC
    The first program generates a random number for each way it tries to match /.{0,2}.{0,3}.{0,3}.{0,4}+/ against "\U\x{fb01}\x{fb03}" (the uppercased fi and ffi ligatures, i.e. "FIFFI"), before giving up on the backtracking.

    It tries 101 times, populating @_ in each round. With the final pop, that results in 100 random numbers.
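    Here is a stripped-down sketch of that counting trick for anyone who wants to poke at it in isolation (the string and pattern are mine, not the ones from the node): the (?{...}) block runs each time the engine reaches that point in the pattern, and (*FAIL) forces it to backtrack through every possible path.

        use strict;
        use warnings;
        use feature 'say';

        my $attempts = 0;
        "abc" =~ / .{0,3} (?{ $attempts++ }) (*FAIL) /x;
        say $attempts;    # 10: 4+3+2+1 tries across the four starting positions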

    So, do you have any challenging puzzles? ;-)

      Completely off-topic, your post demonstrates the profound stupidity of Unicode ligatures. Ligatures are a typographic trick to make certain sequences of letters like "fi" and "ffi" look pretty when displayed in some media. Comically, the Unicode ligatures not only make life a royal pain for regular expression matching, but they're also ugly as sin (compare the actual "fi" to the "ﬁ"-ligature here). They're even less useful than pages of emoji.
        The reason Unicode has those particular ligatures is to preserve the originals when doing round‐trip conversions with legacy encodings that allowed such things to be specified with distinct, individual codes. In modern typesetting, such matters should be — and are — taken care of automatically.

        ¡Fontalicious!

        On the matter of being ugly as sin, here is my emoji example where I actually use fi ligatures three times, just because that was a posting where I was being extreme in the font games. If you look closely at that example, they do look marginally better there than the unkerned alternatives, although not so much that you would normally even notice them. Which is just as it should be.

        It certainly isn’t “ugly as sin”; it looks fine. Of course, if you’re using some brutish sans serif font as your default display and that font hasn’t made allowances for these legacy ligatures, so that you have to resort to some fallback font‐substitution glyph, then well that’s the price you pay for brutishness.   😜

        On the other hand, in this sample in Adobe Caslon Pro, I use no ligatures at all; all that is figured out for me by the font itself. For a somewhat subtler effect, here’s that sample again, this time in Adobe Garamond Pro. But for real sophistication, there’s just nothing like that same sample rendered in Zapfino.

        All three of those samples are fine examples of good kerning rules that don’t make the user say how and what and where things are tied together — that is, ligated. (Hey, did you know that ligar con alguien is Spanish slang for “to hook up”, as in “to get laid”?) It all magically falls out of the OpenType rules built into each respective font.

        NFKD($s) =~ /⋯/i

        Now, regarding the regex matter. The legacy ligatures are actually doing people a service here, because they make it obvious that you cannot just do blind searches on unnormalized Unicode text. Regexes make no allowances for things like default ignorables, diacritic‐insensitive comparisons, decompositions, or collation‐strength equivalences. And you need all those things.

        Now, it just so happens that Unicode does have case folds for the legacy ligatures, although these are the one‐to‐many full case folds that next to nobody but Perl even tries to handle. That means this works:

         % perl -E 'say "E\x{FB03}ciency" =~ /^effi/i || 0'
        1
        
        However, because we don’t allow incomplete matches stranding part of a code point, this doesn’t:
        
        % perl -E 'say "E\x{FB03}ciency"'
        Efficiency
         % perl -E 'say "E\x{FB03}ciency" =~ /^eff/i || 0'
        0
        
        That shows why you really want a compatibility decomposition for text searching:
        
         % perl -MUnicode::Normalize -E 'say NFKD("E\x{FB03}ciency") =~ /^effi/i || 0'
        1
        
         % perl -E 'say "3:15 \x{33D8}"'
        3:15 ㏘
         % perl -MUnicode::Normalize -E 'say NFKD("3:15 \x{33D8}") =~ /\bP\.?M\b/i || 0'
        1
        I’ll address collation‐strength equivalence, including but not limited to diacritic‐insensitive matching, some other day.
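        In the meantime, a crude approximation of just the diacritic-insensitive piece is to decompose and strip the combining marks. A sketch only (the string is made up, and this is nothing close to real collation-strength matching):

         % perl -MUnicode::Normalize -E '(my $s = NFD("Ren\x{E9}e")) =~ s/\p{Mn}//g; say $s =~ /^renee$/i || 0'
        1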
Re: Unearthed Arcana (intentions)
by tye (Sage) on May 13, 2011 at 19:40 UTC
    Both are fine illustrations of the importance of careful formatting and whitespace — and yes, sometimes even comments — to make the intentions clear.

    I don't see anything even close to "clear intentions". Sure, the first chunks of code were rather hard to discern the intentions of. But a lack of whitespace made the parsing of the code only slightly more difficult. The blown-up examples of code actually did very little to make any intentions clear to me. There seems to be a bunch of added "text" especially in one case that I find mostly contributes confusion.

    I think you've managed to instead demonstrate that whitespace, formatting, and comments are often not worth spit in the face of bizarre code. You've just reinforced my belief that writing clear code is much more important than any of whitespace, formatting, or comments... probably quite counter to your intentions for the above node.

    - tye        

      Both are fine illustrations of the importance of careful formatting and whitespace — and yes, sometimes even comments — to make the intentions clear.
      I don't see anything even close to "clear intentions".
      Ya think? :) Watch their hands, not their lips.
      I think you've managed to instead demonstrate that whitespace, formatting, and comments are often not worth spit in the face of bizarre code. You've just reinforced my belief that writing clear code is much more important than any of whitespace, formatting, or comments... probably quite counter to your intentions for the above node.
      No, you were right the first time. It’s the old Rob Pike thing about how comments don’t do one bit to turn confusing code into clear code. In fact, they can even make it worse. Not a single comment was in any way explanatory. In the first program, the comments are of course there only to daze and confuse. In the second, the comment is there for ironic effect. As you discovered with my first supercited line I opened this missive with, I don’t always lace my words with smirking emojic guideposts: that doesn’t mean they don’t apply. If you can’t laugh without a laugh track, how funny is it, really?

      And the comments in the first program are not as far from reality as you might think. I’d just written a program in a state of mild pique that very well could have done something like that. See, I was torqued off at perl -P being robbed from us with nothing but a big fat gaping hole left in the documentation in its stead.

      You can take my cpp when you pry it out of my cold, dead fingers

      So I wrote a program that did this:
      #define exec(arg)
      BEGIN { exec("cpp $0 | $^X") }                  # nyah nyah nyah-NYAH nyah!!
      #undef exec

      #define CPP(FN, ARG)  printf(" %6s %s => %s\n", main::short("FN"), q(ARG), FN(ARG))

      #define QS(ARG)       CPP(main::qual_string, ARG)
      #define QG(ARG)       CPP(main::qual_glob,   ARG)

      #define NL            say ""
      Which worked just fine. Here’s that whole program: Go ahead, just try writing that one without cpp or any fancy source filters: ENOFUN!
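      In case those first three lines read as black magic, here is my reading of the trick, boiled down to a sketch (assuming a cpp on your PATH that tolerates being fed Perl; the full-line comments are written as ;# so cpp won't mistake them for directives):

      #define exec(arg)
      BEGIN { exec("cpp $0 | $^X") }
      #undef exec
      ;# First pass: perl treats the #define and #undef lines as comments, so the
      ;# BEGIN above really runs and pipes this file through cpp back into perl ($^X).
      ;# Second pass: cpp has expanded exec(...) away, leaving an empty BEGIN block,
      ;# so control just falls through to the preprocessed code below.
      print "now running the cpp-expanded source\n";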

      Simple things should be simple, dang nabbit!


      The Undiscovered Namespace

      I was also having fun calling numerically named functions, and in other versions of the code I had numerically named arrays with things like:
      @12 = 12->(12);
      This was all prompted by a mistake in chromatic’s Modern Perl. It erroneously claims that my @3; is an invalid Perl identifier. That’s of course not true.

      First of all, my is not an identifier; @3 is. And it is a perfectly valid Perl identifier, as evidenced by:

      
      % perl -Mstrict -E '@4 = (4) x 4;  say "@4"'
      4 4 4 4
      
      As you see, you can strict it till you choke, but there it remains, perfectly pleased with itself.

      What my @3 is, is an invalid declaration of a perfectly healthy Perl identifier. Other sorts of declarations with it work just fine. Here’s a lexically scoped alias:

      
      % perl -Mstrict -E 'our @4 = (4) x 4;  say "@4"'
      4 4 4 4
      
      And here’s a dynamically scoped value:
      
      % perl -Mstrict -E 'local @4 = (4) x 4;  say "@4"'
      4 4 4 4
      
      Whereas here’s a — um, something else:
      
      % perl -Mstrict -E 'local our @4 = (4) x 4;  say "@4"'
      4 4 4 4
      
      But don’t expect a package to protect you. @4 is an über‐global:
      
      % perl -Mstrict -E 'say @4 = __PACKAGE__; { package Innumerable; @4 = __PACKAGE__ }  say "@4"'
      main
      Innumerable
      Without even resorting to hyperbole, Perl has billions and billions of these exquisite über‐globals. You could write all your programs just using them, and no strictures will ever whine at you.
      
      % perl -Mstrict -E 'say %3 = (1..4); say $3{3}'
      1234
      4
      With functions, all you have to do is name them in a somewhat circuitous fashion:
      
      % perl -Mstrict -E '*4 = sub { say "\Ufor@_" }; &4'
      FOR
      % perl -Mstrict -E '*4 = sub { say "\Ufor@_" }; &4(get=>)'
      FORGET
      % perl -Mstrict -E '*4 = sub { say "\Ufor@_" }; 4->(ever::)'
      FOREVER
      You will notice that I even get to call the function using a symbolic dereference, despite no strict "refs" being in force — if you can call that “force”.

      If you’re wondering why this exists, it’s of course an artifact of the way the numbered variables, $1 &c &c, work. But it also leaves the door open so that we can someday make this work:

      
      "800-555-1212" =~ /(\d+-?)+/;
      say "numbers were: ", join " and ", @1;
      numbers were: 800- and 555- and 1212
      And yeah, this will be hard on the people who write programs using only numbered variables and subroutines, but tough noogies.

      It’s one thing to present a simplified version of reality, but you can only bend the truth so far before it breaks. Not only is @3 a perfectly legal Perl identifier, there are a whole lot more where that came from.

      To say otherwise is — well, let’s just say it’s too chary of the truth for my conscience.

        It’s one thing to present a simplified version of reality, but you can only bend the truth so far before it breaks.

        I go so far as to say that that line in my book is a deliberate fib. By the time readers know enough Perl 5 to know why what I wrote isn't true in that specific case, they should know enough to know why it isn't true—and, hopefully, why I fibbed without a footnote.

        See. True to type.

        Mastery of the useful obscure, is...well...useful. Only occasionally, but still useful.

        Mastery of obscurity for its own sake--how is sub 1{ ... } any more useful than sub one{ ... }?--is naught more than a puerile attempt at one-upmanship. No attempt to teach or inform, nor even to sportingly challenge.

        Simply to say: I know; you don't. Pure egotism.

        Ah. So your post had intentions nearly as unclear as your code. I guess that's... something.

        - tye        

Re: Unearthed Arcana
by LanX (Saint) on May 13, 2011 at 08:30 UTC
    you might wanna use <spoiler></spoiler> tags. :)

    Cheers Rolf

Re: Unearthed Arcana
by lancer (Scribe) on May 17, 2011 at 11:26 UTC
    Sometimes I really can't see the point of obfuscating code...

    Ok, regexes could be a valid exception. There's only one language for describing regular expressions, as far as I know, and it's quite a rigid and dense one. Maybe a more readable notation could be created for regexes too, but right now it doesn't exist. So I excuse regexes from being unreadable.
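    That said, Perl's /x modifier already goes a fair way toward a more readable notation: it lets you lay a pattern out with whitespace and comments. A small sketch (the pattern and the number are invented for illustration):

        use strict;
        use warnings;
        use feature 'say';

        my $phone = qr{
            ^ (\d{3})       # area code
            - (\d{3})       # exchange
            - (\d{4}) $     # line number
        }x;

        say "parts: $1 $2 $3" if "800-555-1212" =~ $phone;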

    But otherwise, I think the best writing style for code is when I can glance at a page of it and see what it does, and move on to the next page.