One of the things that made me into a Perl enthusiast was the language's ability to express a complex thought on a single line. I've found myself golfing down some algorithms to fewer lines until I end up with odd and arcane looking lines such as:

else { last if !defined (($_ = <>) =~ s!\n|$/!!g); }

...which reads a chunk of ARGV into $_, removes newlines, chomps, and should exit the loop it is a part of at the end of the last ARGV file, or on any chunk not containing a newline or input record separator. Or better yet:

@{$temp_hash{files}} = grep { /@(\d+)$/; $1 > $old } @{$temp_hash{ +files}};

...which is a filter to keep only those elements of an array (that is part of a hash) where the digits following an @ sign next to the end of the line are greater than the scalar $old.

Sometimes lines like that look odd or clumsy. Other times they look elegant and I'm proud of them. They are always difficult to squeeze through a code review, and most times I'll leave their more explicit and easier-to-read cousins in place in an initial release, and show the new code and some tests showing that it works the same as the original at a later time.

It doesn't always seem worth the effort, so I'm hesitant to try to improve my co-workers' "code eyes" (for lack of a better term), and leave the easier to read versions in place, since, after all, the two versions do the same thing. Also, the comments that I litter my production code with tend to express what is going on at a lower level, inflating the size of a simple project until it looks like it was more of a challenge than it really was.

My new challenge, then, is to cultivate the middle ground. Coding for clarity, seeking to stay below the critical mass that clouds over the eyes of my fellows at code review time, while still maintaining as much of my arbitrary elegance as possible. After a few attempts in that direction, I started ending up with code that has an interesting flavor, such as this home-grown CSV splitter:

sub csv_split { local $_ = shift || return undef; my @array = (); my $count = my $quoted = 0; while ( s/(.)// ) { if ($1 eq ',' && !$quoted) { $count++; next; } if ($1 eq q/"/) { unless ( $quoted && s/^\"// ) { $quoted = 1 - $quoted; nex +t; } } $array[$count] .= $1; } @array; }

It's not hard to squeeze something like that through a review. The "while" line, for example, is easy to explain, and so is the fact that that the s/^\"// won't adversely effect $1 since a match means we don't care about $1 any more. The code keeps a nice feel without losing much conciseness or readability, and to boot I'm able to come back to it at a later date and see what my intentions were. In my earlier examples, revisiting the scripts a couple weeks later invokes an initial head scratching reaction, and sometimes a shock of thinking I miscoded the whole thing until I've got it all sorted out again.

If I were to start publishing some modules on CPAN for public consumption, this is the style I think I would stick with. It doesn't look like I'm trying to belittle the intended audience with my aristocratic airs, and it gets the point across pretty cleanly.

Update: Added readmore tag.

Replies are listed 'Best First'.
Re: Code for elegance, code for clarity
by tilly (Archbishop) on Jan 12, 2004 at 16:12 UTC
    Golf has no value in production code. Your "elegant" line is one that I would fail in a code review, and I think that most reasonable people would likewise.

    On the home-grown csv_split. I detect the following bugs and various issues:

    1. You are way too fond of putting multiple lines on one. Hitting return is cheap, and makes code easier to scan.
    2. Walking through a string with s/(.)// is inefficient. (Walking the string is O(n*n).) Use m/(.)/g instead - that is what it is for. You can rewrite your check for "" with: /\G"/cg (see perlop). This makes for a O(n) algorithm. (Unless $` or $' were used anywhere in the program. In which case split might turn out faster.)
    3. A toggle is better written as $quoted = !$quoted; That obviously and unambiguously toggles. To verify what $quoted = 1 - $quoted; did I had to to scan the code to see if $quoted was set anywhere else. Why make the maintainer do that work when a clearer alternative exists?
    4. Your code will not handle returns embedded in a quoted field. If having the bug is OK in your application, document that limitation with a comment. Else fix.
    5. Speaking of comments, anyone who doesn't understand the CSV format reasonably well is likely to wonder whether you made some serious mistakes. A brief English description of the format would make sense.
    And one overall comment. Coding is about getting the job done efficiently, while leaving yourself able to do so again in the future. It isn't about proving that you are l33t. It isn't about using as few lines as possible. It is about being effective.

    Therefore don't think of style as something that you have to squeeze through an external code review. Think of it as something that you enforce with an internal code review because good style lessens how much work you have to do before you can productively work on your old scripts.

      To me, elegant code shows both clarity and inspiration. It can't be a little unorthodox, but the logic must always be transparent.

      I would likewise complain in a code review. Fortunately, I am not subjected to them, as I find the concept fairly insulting. I think sometimes it's better to just say "Here is how we want things to be coded" and then talk to the person if you think something can be done better. At work, I hate having to play defense -- "can't we all just get along?". I hate getting assaulted by groups of people at once, like when three people walk into your tiny cube and start looking over your shoulder. Yep, readability and standards are important -- but they can be taken too far.

      But yes, one or two short line comments would do wonders in maintainablity. Proof of wizardry does nothing for me. I don't think the "elegant" example was elegant at all. I thought it was quite un-elegant. It did the job, it worked, and it wasn't horribly ugly -- but it was not elegant.

      What I want to see is code that is instantly recognizable -- something that doesn't require me to stare at it for one or two minutes to understand. It's ok to require Perl knowledge to understand your code, but I'd rather see more baby-perl rather than golf-perl at work AND on CPAN.

      Though, when reading all of this, don't lose the art -- the soul of programming can be lost through coding standards, "readability", and such. Look at how un-fun it can be to write Java when everyone is complaining about how you must always follow convention X-Y-Z or pattern A-B-C. There is some merit, but there is also too-much-of-a-good-thing. In this case, the near-military enforcement of some questionable policies has made the users of this language into a near parody of themselves. A fine example that "too-readable" and "too-orderly" means "way too much code to read and too much work to do". ex: Thingy.getThingy().getOtherThingy().GetDocument(EndOfRant(),EndOfRantChecker(false,true,true);

      I guess all things can be overdone or underdone. Still though, back to the point, I don't think elegance has been achieved quite yet.

        Hey, code reviews are your friend! They are not out to get to you! Code reviews are needed because sometimes people don't know that there is a nice common idiom for doing something, and because we all have a tendency to cut some corners if no one is looking. Code reviews keep us honest.

        To be a great programmer, one has to learn to put the overall success of the team and the project ahead of one's personal attachment to his own code. I have a great deal of respect for people who can walk into a code review and look at their own code objectively as community property.

        Your comments on code reviews reflect one of the biggest problems. How to give people feedback without getting egos involved.

        What you refuse to see, is your worst trap has more on the psychological dynamics which makes this so tricky. Badly given (or badly received) code reviews are counterproductive. Good ones offer value that can't be matched in any other way.

      'delirium' ne 'l33t'

      Tilly, I apologize if you were tempted into more than a cursory glance at my sample code. As I told Podmaster, I wasn't intending to debate the merits of this code. If I wanted a thorough attack for CSV files, I'd go download a CPAN module.

      First, ++ to you for /\G"/cg. One of the reasons I love posting things here is to increase my understanding of the language and learn more tricks such as this one that I had not stumbled across in the documentation before.

      Let me respond to your individual points:

      • Golf has no value in production code.
        One of my main ideas in posting this meditation is that golfing things down to obfuscation is a bad idea in a peer-review environment, but that it is possible that moderately golfing down code into idea nuggets (arbitrary made-up term) may actually aid you in explaining code quickly.
      • Your "elegant" line is one that I would fail in a code review, and I think that most reasonable people would likewise.
        I would hope so.
      • You are way too fond of putting multiple lines on one.
        Idea nuggets. If I can't squeeze it into something like:
        last if !$bananna;
        then maybe I could still fit the whole thought on one line:
         if (!$bananna) { cleanup; last; }
      • Walking through a string with s/(.)// is inefficient. (Walking the string is O(n*n).)
        I've seen that bandied about on Perlmonks a lot. I assume that's a logarithm reference or something.

        Yes, this code will cost plenty of extra CPU cycles. It is based on an inherently flawed approach of having two statements chew down a string for different reasons. I'd never use it in production if resources were scarce or execution speed was important.

      • A toggle is better written as $quoted = !$quoted;
        My Applesoft BASIC days coming back to haunt me again. It took me a long time to get that out of my system. However, if it weren't for that beloved GOTO-laden beast of a language, I wouldn't be a coder today.
      • Your code will not handle returns embedded in a quoted field.
        Thanks, forgot the  /s
        I've seen that bandied about on Perlmonks a lot. I assume that's a logarithm reference or something.

        It ("big O" notation) is used to characterize algorithms, and how they perform in time and space. See An informal introduction to O(N) notation.

        I am telling you that your "idea nuggets" would be easier for me to grasp and understand if you broke them out on separate lines. It is further my considered belief that if you tried it consistently for a period, you would find that your own comprehension was improved by that change. Which is why I suggested it.

        The fact is that our brains find it easier to scan down than at moving left to right. This is why newspaper columns are 40 characters wide. Scanning left to right we have to parse the code to figure out where the divisions are. Scanning down we don't. Furthermore, scanning down and relying on indentation we can skip surprisingly large chunks without reading to find what we are looking for.

        On the other changes, the big-O notation has to do with algorithmic scalability. When you s/(.)// you recopy the whole string. Copying a long string many times gets slow. m/(.)/g doesn't copy the string. This can matter.

        On embedded returns, it is not obvious to me how to fix it. If you are handed one line at a time then you can't without having some way of requesting the next line. Also don't forget that the terminating newline of your row is not supposed to be part of the row of data that you are returning. What you should do with \n depends on whether you are in a quoted field.

        And finally, m/\G"/cg is one that I found out about when I was sent a patch for Text::xSV to get around a regular expression bug. Yet another reason to expose your code for review from others. :-)

        My Applesoft BASIC days coming back to haunt me again.

        I remember it fondly. Basic had a NOT command though, so I would have coded that as $quoted = not $quoted back then.

Re: Code for elegance, code for clarity
by xenchu (Friar) on Jan 12, 2004 at 14:27 UTC

    Just an opinion, but I would use the simplist, most straight-forward code I could for Production Code. The more understandable the code, the easier it is to maintain.

    Production Code should have two major attributes:

    1. It works.

    2. It is easy to maintain.

    Nobody cares about elegance at three in the morning when trying to fix someone's else's code. Clarity is always good. Make sure your documentation has lots of it.

    xenchu

    update: I really need to learn to type.


    The Needs of the World and my Talents run parallel to infinity.
Re: Code for elegance, code for clarity
by dragonchild (Archbishop) on Jan 12, 2004 at 14:51 UTC
    To me, elegance is where you have encoded as much information per N characters as I can easily read, plus having used features that strike me as appropriate, especially if I wouldn't have done it that way. A little thinking to comprehend the algorithm is acceptable, so long as I don't have to break out pen and paper to rewrite it. (That is obfu, not elegant. Obfus have their own beauty, but I wouldn't call them elegant.)

    Your title implies that elegance and clarity are orthogonal qualities. I would say that clarity is an essential part of elegance. Another integral part of elegance is simplicity. Think about what you would consider elegant in a person. Some items that have classically defined an elegant woman, for example, could be:

    • The line and angle of her neck
    • The shape of her nose
    • The way she moves

    Elegance is about understated simplicity. So, to me, your first snippet isn't elegant. It's short, but I have to break out pen and paper to understand it. However, I would consider your second snippet elegant. It is also short, but it discusses one concept and does it in a simple fashion.

    Your csv_split() ... It has elegant parts. Let me explain ...

    sub csv_split { # This is very elegant. It handles one concept and does so in a si +mple and # clear fashion. # Two bugs: # 1) use "return;", not "return undef;". The latter creates a o +ne element # list in list context. # 2) I would use "shift // return;" (if you have the patch). Ot +herwise, # your function won't handle "0.0" correctly. If you don't h +ave the # patch, I would do a defined-check. local $_ = shift || return undef; my @array = (); # This is not elegant - it's cute and clever. You're using 0 in bo +th its # numeric and boolean aspects. Now, you may not have a choice in t +he matter, # as Perl may not give you the tools to be elegant. (It might ...) my $count = my $quoted = 0; # The while definition is elegant - simple, understated, but clear +. But, the # use of $1 throughout the loop is definitely not elegant. It's ki +nda ugly. # Again, I'm not sure Perl gives you the tools to do the elegant t +hing here. while ( s/(.)// ) { if ($1 eq ',' && !$quoted) { $count++; next; } # The use of q// here is more elegante than '"', but it might +have been # better to provide a variable that hides the ugliness. if ($1 eq q/"/) { # The "$x = 1 - $x" idiom is very elegant. I've always lik +ed it. unless ( $quoted && s/^\"// ) { $quoted = 1 - $quoted; nex +t; } } $array[$count] .= $1; } # Elegance might demand a wantarray solution here. It's unclear. @array; }

    My version (with a bit more of what I consider elegant) would be (untested):

    sub csv_split { local $_ = shift // return; my $is_quoted; my $count = my @array; while ( s/(.)// ) { if ($1 eq ',' and not $is_quoted) { $count++; } elsif ($1 eq q/"/ and not ( $is_quoted and s/^\"// )) { $is_quoted = !$is_quoted; } else { $array[$count] .= $1; } } return if $is_quoted; wantarray ? @array : \@array; }

    I believe my version also better delineates the flow control.

    Update: Added tilly's toggle. I was trying to think of it, but couldn't, so left the x=1-x flip. The toggle is definitely more elegant.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Code for elegance, code for clarity
by PodMaster (Abbot) on Jan 12, 2004 at 13:50 UTC
    If I were you I wouldn't adopt that style. Don't use next for nothing :) I'd write something like this
    sub csv_split { local $_ = shift || return undef; my @array; my $count = my $quoted = 0; while ( s/(.)// ) { if ( $1 eq ',' and !$quoted ) { $count++; } elsif( $1 eq q/"/ ) { $quoted = 1 - $quoted unless $quoted and s/^\"//; } else { $array[$count] .= $1; } } return @array; }
    Yes, I know I like whitespace (look at that cascading unless).

    I also wouldn't bother to write && except when needed (nothing wrong with and)

    update: also, when all you want is to match, use m//.

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

      Your code is not quite equivalent. Note that if the unless fails, his code will fall through to set $array[$count] while yours won't. I think that's a bug in the OP's code.

      And for the OP, I think that reversing a Boolean is better expressed as $quoted = !$quoted rather than $quoted = 1 - $quoted.

      There's no need to be destructive with s///; all you need is to walk the pattern with //g, and there's no need to special-case the empty quotes. More efficiency can be gained by reading strings of non-quote, non-comma characters. The code becomes very clear, though the regex is a little complicated.

      sub csv_split { local $_ = shift || return undef; my @array = (); my $count = my $quoted = 0; while ( /([^,"]+|[,"])/g ) { if ($1 eq ',' and !$quoted) { $count++; } elsif ($1 eq '"') { $quoted = !$quoted; } else { $array[$count] .= $1 } } @array; }

      The PerlMonk tr/// Advocate
      "next" makes a nice conceptual pocket (yes, I've long since been reduced to making up arbitrary terms). It trims down on the amount of nested conditionals, takes up less space, and makes (if not easier reading) for easier explanation to a third party.

      I'm not intending to debate the merits of the above snippets. Sure, they could be better optimized, use more community-approved idioms, etc. I'm proposing instead that it's OK to code to your audience, and to code with the intention of explaining your algorithms to a group of peers when time is at a premium.

      When left to my own devices without a timetable, my code becomes the nested and over-golfed hoo-hah posted at the top...which I enjoy tremendously.

      By the way, where are you proposing I use m// as opposed to what is already in place?

      It's always interesting to see how different people write code. Except for the weird use of s/// I would have written that routine in much the same way as delirium. It probably has to do with how I "grew up" as a programmer. For instance, being used to C/C++, it's more natural for me to write && than and so I tend to prefer the former over the latter (except, of course, for instances where && has the wrong precedence).

Re: Code for elegance, code for clarity
by ysth (Canon) on Jan 12, 2004 at 17:05 UTC
    This squeeze through a review stuff makes me think you may be coding inappropriately for the ability level of your co-workers.

    Can't resist a little nit-picking:

    else { last if !defined (($_ = <>) =~ s!\n|$/!!g); } ...which reads a chunk of ARGV into $_, removes newlines, chomps, and should exit the loop it is a part of at the end of the last ARGV file, or on any chunk not containing a newline or input record separator.
    That isn't warnings friendly (you'll get "Use of uninitialized value in substitution" at the end of ARGV), and you should \Q the $/.
    @{$temp_hash{files}} = grep { /@(\d+)$/; $1 > $old } @{$temp_hash{files}};
    Also warnings unfriendly if the data doesn't have @ and digits at the end. Better to say: /@(\d+)\z/ && $1 > $old (but use $ instead of \z if you really mean it that way.)
Re: Code for elegance, code for clarity
by mcogan1966 (Monk) on Jan 12, 2004 at 18:17 UTC
    And as I read these posts, I see above "Code for elegance, code for clarity".

    To me, elegant code works correctly, and works efficiently. This is something that I've had to work hard at with my recent project. Taking code that works, and making it code that works faster is how I found my 'elegant' code. But now, I'm leaving it alone since there is little more that can be done without making some of the code unreadable. Someone else may have to look at this down the line, it shouldn't be impossible (just very difficult).

      I think "elegant" is often a term for code that's clear to good enough programmers, and often rather unclear to less good programmers. However, really elegant code is good code to study to improve your own abilities.

      Then again, my degree is in mathematics, where "elegance" is very often not the clearest way to express things.

        On the mathematics line, some physicist... (I forget the name Kinku? First name was Michael I believe), had an interview/show on TechTV about his perceptions of finding a Grand Unified Theory.

        He said routinely rejected theories and equations because they were too complex, every time saying God's universe would be governed by beautiful (&simple*) equations. Though I think String Theory is off in unprovable never-never-land, there was a lot that I learned from that statement.

        * = By simple, I mean short an elegant, without a lot of fudge constants and operators, I'm not saving the Creator would be afraid of a flux integral or something like it :)

Re: Code for elegance, code for clarity
by delirium (Chaplain) on Sep 21, 2019 at 14:47 UTC
    For the record, I have no idea what I was going on about 15 years ago. This whole meditation seems pretty silly.
      Silly? It's fantastic! Look at all the replies and well mentioned advice. Some come off as harsh, but that's because we are imperfect beings struggling to express ourself in a language that is very similar, but not quite Perl (and by that, I mean the English language). I'm writing this while sleepy. Couldn't read your code (well, it is not UNreadable, more like riding in a car on a 15km maximum zone). I guess that is the part of the language what drove people to Python. Please humor me, and answer me this question: How was your growth process over the last 15 years. Did you loose your perl-golf in lieu of readable code (have you become you a well oiled cog in the machine instead of a perl-fu rebel?).

      Personally I've lowered the perl-fu in my code, except for oneliners. There I pull open all registers to keep it short. And sometimes... sometimes I open NYTProfiler and discover that that other construct I don't use nor like is faster... and then well, it takes a while, but I start using it, begrudgingly, until I forget it was not "mine".

      edit: Oh, and perltidy. I code like a madman then use perltidy to get something readable. Madman you say? Yes I mix spaces and tabs... and other things I'm not proud of.