in reply to Code for elegance, code for clarity

Golf has no value in production code. Your "elegant" line is one that I would fail in a code review, and I think that most reasonable people would likewise.

On the home-grown csv_split. I detect the following bugs and various issues:

  1. You are way too fond of putting multiple lines on one. Hitting return is cheap, and makes code easier to scan.
  2. Walking through a string with s/(.)// is inefficient. (Walking the string is O(n*n).) Use m/(.)/g instead - that is what it is for. You can rewrite your check for "" with: /\G"/cg (see perlop). This makes for a O(n) algorithm. (Unless $` or $' were used anywhere in the program. In which case split might turn out faster.)
  3. A toggle is better written as $quoted = !$quoted; That obviously and unambiguously toggles. To verify what $quoted = 1 - $quoted; did I had to to scan the code to see if $quoted was set anywhere else. Why make the maintainer do that work when a clearer alternative exists?
  4. Your code will not handle returns embedded in a quoted field. If having the bug is OK in your application, document that limitation with a comment. Else fix.
  5. Speaking of comments, anyone who doesn't understand the CSV format reasonably well is likely to wonder whether you made some serious mistakes. A brief English description of the format would make sense.
And one overall comment. Coding is about getting the job done efficiently, while leaving yourself able to do so again in the future. It isn't about proving that you are l33t. It isn't about using as few lines as possible. It is about being effective.

Therefore don't think of style as something that you have to squeeze through an external code review. Think of it as something that you enforce with an internal code review because good style lessens how much work you have to do before you can productively work on your old scripts.

Replies are listed 'Best First'.
Re: Re: Code for elegance, code for clarity
by flyingmoose (Priest) on Jan 12, 2004 at 17:13 UTC
    To me, elegant code shows both clarity and inspiration. It can't be a little unorthodox, but the logic must always be transparent.

    I would likewise complain in a code review. Fortunately, I am not subjected to them, as I find the concept fairly insulting. I think sometimes it's better to just say "Here is how we want things to be coded" and then talk to the person if you think something can be done better. At work, I hate having to play defense -- "can't we all just get along?". I hate getting assaulted by groups of people at once, like when three people walk into your tiny cube and start looking over your shoulder. Yep, readability and standards are important -- but they can be taken too far.

    But yes, one or two short line comments would do wonders in maintainablity. Proof of wizardry does nothing for me. I don't think the "elegant" example was elegant at all. I thought it was quite un-elegant. It did the job, it worked, and it wasn't horribly ugly -- but it was not elegant.

    What I want to see is code that is instantly recognizable -- something that doesn't require me to stare at it for one or two minutes to understand. It's ok to require Perl knowledge to understand your code, but I'd rather see more baby-perl rather than golf-perl at work AND on CPAN.

    Though, when reading all of this, don't lose the art -- the soul of programming can be lost through coding standards, "readability", and such. Look at how un-fun it can be to write Java when everyone is complaining about how you must always follow convention X-Y-Z or pattern A-B-C. There is some merit, but there is also too-much-of-a-good-thing. In this case, the near-military enforcement of some questionable policies has made the users of this language into a near parody of themselves. A fine example that "too-readable" and "too-orderly" means "way too much code to read and too much work to do". ex: Thingy.getThingy().getOtherThingy().GetDocument(EndOfRant(),EndOfRantChecker(false,true,true);

    I guess all things can be overdone or underdone. Still though, back to the point, I don't think elegance has been achieved quite yet.

      Hey, code reviews are your friend! They are not out to get to you! Code reviews are needed because sometimes people don't know that there is a nice common idiom for doing something, and because we all have a tendency to cut some corners if no one is looking. Code reviews keep us honest.

      To be a great programmer, one has to learn to put the overall success of the team and the project ahead of one's personal attachment to his own code. I have a great deal of respect for people who can walk into a code review and look at their own code objectively as community property.

        I have a great deal of respect for people who can walk into a code review and look at their own code objectively as community property.
        As do I. Really, I do. Unfortunately, people tend to take it personally sometimes, especially in some languages where code style borders on religion (which is why I speak of Java). I would love to see fair and unbiased code reviews, but you can't do that when the group is not a meritocracy, but rather ruled on seniority and posturing. Having meetings to question code quality can get very ugly unless everyone is on good terms and is in it for the good of the group not the individual. I want the former, but I have been unable to find it (sad, actually).
      Your comments on code reviews reflect one of the biggest problems. How to give people feedback without getting egos involved.

      What you refuse to see, is your worst trap has more on the psychological dynamics which makes this so tricky. Badly given (or badly received) code reviews are counterproductive. Good ones offer value that can't be matched in any other way.

Re: Re: Code for elegance, code for clarity
by delirium (Chaplain) on Jan 12, 2004 at 17:29 UTC
    'delirium' ne 'l33t'

    Tilly, I apologize if you were tempted into more than a cursory glance at my sample code. As I told Podmaster, I wasn't intending to debate the merits of this code. If I wanted a thorough attack for CSV files, I'd go download a CPAN module.

    First, ++ to you for /\G"/cg. One of the reasons I love posting things here is to increase my understanding of the language and learn more tricks such as this one that I had not stumbled across in the documentation before.

    Let me respond to your individual points:

    • Golf has no value in production code.
      One of my main ideas in posting this meditation is that golfing things down to obfuscation is a bad idea in a peer-review environment, but that it is possible that moderately golfing down code into idea nuggets (arbitrary made-up term) may actually aid you in explaining code quickly.
    • Your "elegant" line is one that I would fail in a code review, and I think that most reasonable people would likewise.
      I would hope so.
    • You are way too fond of putting multiple lines on one.
      Idea nuggets. If I can't squeeze it into something like:
      last if !$bananna;
      then maybe I could still fit the whole thought on one line:
       if (!$bananna) { cleanup; last; }
    • Walking through a string with s/(.)// is inefficient. (Walking the string is O(n*n).)
      I've seen that bandied about on Perlmonks a lot. I assume that's a logarithm reference or something.

      Yes, this code will cost plenty of extra CPU cycles. It is based on an inherently flawed approach of having two statements chew down a string for different reasons. I'd never use it in production if resources were scarce or execution speed was important.

    • A toggle is better written as $quoted = !$quoted;
      My Applesoft BASIC days coming back to haunt me again. It took me a long time to get that out of my system. However, if it weren't for that beloved GOTO-laden beast of a language, I wouldn't be a coder today.
    • Your code will not handle returns embedded in a quoted field.
      Thanks, forgot the  /s
      I've seen that bandied about on Perlmonks a lot. I assume that's a logarithm reference or something.

      It ("big O" notation) is used to characterize algorithms, and how they perform in time and space. See An informal introduction to O(N) notation.

      I am telling you that your "idea nuggets" would be easier for me to grasp and understand if you broke them out on separate lines. It is further my considered belief that if you tried it consistently for a period, you would find that your own comprehension was improved by that change. Which is why I suggested it.

      The fact is that our brains find it easier to scan down than at moving left to right. This is why newspaper columns are 40 characters wide. Scanning left to right we have to parse the code to figure out where the divisions are. Scanning down we don't. Furthermore, scanning down and relying on indentation we can skip surprisingly large chunks without reading to find what we are looking for.

      On the other changes, the big-O notation has to do with algorithmic scalability. When you s/(.)// you recopy the whole string. Copying a long string many times gets slow. m/(.)/g doesn't copy the string. This can matter.

      On embedded returns, it is not obvious to me how to fix it. If you are handed one line at a time then you can't without having some way of requesting the next line. Also don't forget that the terminating newline of your row is not supposed to be part of the row of data that you are returning. What you should do with \n depends on whether you are in a quoted field.

      And finally, m/\G"/cg is one that I found out about when I was sent a patch for Text::xSV to get around a regular expression bug. Yet another reason to expose your code for review from others. :-)

      My Applesoft BASIC days coming back to haunt me again.

      I remember it fondly. Basic had a NOT command though, so I would have coded that as $quoted = not $quoted back then.

        So does perl (lower precedence than !, but with only one term on the right that doesn't matter).