Code for elegance, code for clarity

Replies are listed 'Best First'.

Re: Code for elegance, code for clarity
by tilly (Archbishop) on Jan 12, 2004 at 16:12 UTC

On the home-grown csv_split. I detect the following bugs and various issues:

You are way too fond of putting multiple lines on one. Hitting return is cheap, and makes code easier to scan.
Walking through a string with s/(.)// is inefficient. (Walking the string is O(n*n).) Use m/(.)/g instead - that is what it is for. You can rewrite your check for "" with: /\G"/cg (see perlop). This makes for a O(n) algorithm. (Unless $` or $' were used anywhere in the program. In which case split might turn out faster.)
A toggle is better written as $quoted = !$quoted; That obviously and unambiguously toggles. To verify what $quoted = 1 - $quoted; did I had to to scan the code to see if $quoted was set anywhere else. Why make the maintainer do that work when a clearer alternative exists?
Your code will not handle returns embedded in a quoted field. If having the bug is OK in your application, document that limitation with a comment. Else fix.
Speaking of comments, anyone who doesn't understand the CSV format reasonably well is likely to wonder whether you made some serious mistakes. A brief English description of the format would make sense.

Therefore don't think of style as something that you have to squeeze through an external code review. Think of it as something that you enforce with an internal code review because good style lessens how much work you have to do before you can productively work on your old scripts.

[reply]
[d/l]
[select]

Re: Re: Code for elegance, code for clarity

by flyingmoose (Priest) on Jan 12, 2004 at 17:13 UTC

I would likewise complain in a code review. Fortunately, I am not subjected to them, as I find the concept fairly insulting. I think sometimes it's better to just say "Here is how we want things to be coded" and then talk to the person if you think something can be done better. At work, I hate having to play defense -- "can't we all just get along?". I hate getting assaulted by groups of people at once, like when three people walk into your tiny cube and start looking over your shoulder. Yep, readability and standards are important -- but they can be taken too far.

But yes, one or two short line comments would do wonders in maintainablity. Proof of wizardry does nothing for me. I don't think the "elegant" example was elegant at all. I thought it was quite un-elegant. It did the job, it worked, and it wasn't horribly ugly -- but it was not elegant.

What I want to see is code that is instantly recognizable -- something that doesn't require me to stare at it for one or two minutes to understand. It's ok to require Perl knowledge to understand your code, but I'd rather see more baby-perl rather than golf-perl at work AND on CPAN.

Though, when reading all of this, don't lose the art -- the soul of programming can be lost through coding standards, "readability", and such. Look at how un-fun it can be to write Java when everyone is complaining about how you must always follow convention X-Y-Z or pattern A-B-C. There is some merit, but there is also too-much-of-a-good-thing. In this case, the near-military enforcement of some questionable policies has made the users of this language into a near parody of themselves. A fine example that "too-readable" and "too-orderly" means "way too much code to read and too much work to do". ex: Thingy.getThingy().getOtherThingy().GetDocument(EndOfRant(),EndOfRantChecker(false,true,true);

I guess all things can be overdone or underdone. Still though, back to the point, I don't think elegance has been achieved quite yet.

[reply]

Re: Re: Re: Code for elegance, code for clarity

by perrin (Chancellor) on Jan 12, 2004 at 18:04 UTC

To be a great programmer, one has to learn to put the overall success of the team and the project ahead of one's personal attachment to his own code. I have a great deal of respect for people who can walk into a code review and look at their own code objectively as community property.

[reply]

Re: Re: Re: Re: Code for elegance, code for clarity

by flyingmoose (Priest) on Jan 12, 2004 at 18:25 UTC

Re: ^4 Code for elegance, code for clarity

by mcogan1966 (Monk) on Jan 14, 2004 at 20:46 UTC

Re: Re: Re: Code for elegance, code for clarity

by tilly (Archbishop) on Jan 13, 2004 at 16:40 UTC

What you refuse to see, is your worst trap has more on the psychological dynamics which makes this so tricky. Badly given (or badly received) code reviews are counterproductive. Good ones offer value that can't be matched in any other way.

[reply]

Re: Re: Code for elegance, code for clarity

by delirium (Chaplain) on Jan 12, 2004 at 17:29 UTC

'delirium' ne 'l33t'

Tilly, I apologize if you were tempted into more than a cursory glance at my sample code. As I told Podmaster, I wasn't intending to debate the merits of this code. If I wanted a thorough attack for CSV files, I'd go download a CPAN module.

First, ++ to you for /\G"/cg. One of the reasons I love posting things here is to increase my understanding of the language and learn more tricks such as this one that I had not stumbled across in the documentation before.

Let me respond to your individual points:

Golf has no value in production code.
One of my main ideas in posting this meditation is that golfing things down to obfuscation is a bad idea in a peer-review environment, but that it is possible that moderately golfing down code into idea nuggets (arbitrary made-up term) may actually aid you in explaining code quickly.
Your "elegant" line is one that I would fail in a code review, and I think that most reasonable people would likewise.
I would hope so.
You are way too fond of putting multiple lines on one.
Idea nuggets. If I can't squeeze it into something like:
last if !$bananna;
then maybe I could still fit the whole thought on one line:
if (!$bananna) { cleanup; last; }
Walking through a string with s/(.)// is inefficient. (Walking the string is O(n*n).)
I've seen that bandied about on Perlmonks a lot. I assume that's a logarithm reference or something.
Yes, this code will cost plenty of extra CPU cycles. It is based on an inherently flawed approach of having two statements chew down a string for different reasons. I'd never use it in production if resources were scarce or execution speed was important.
A toggle is better written as $quoted = !$quoted;
My Applesoft BASIC days coming back to haunt me again. It took me a long time to get that out of my system. However, if it weren't for that beloved GOTO-laden beast of a language, I wouldn't be a coder today.
Your code will not handle returns embedded in a quoted field.
Thanks, forgot the /s

[reply]
[d/l]
[select]

Re: Re: Re: Code for elegance, code for clarity

by dws (Chancellor) on Jan 12, 2004 at 18:41 UTC

I've seen that bandied about on Perlmonks a lot. I assume that's a logarithm reference or something.

It ("big O" notation) is used to characterize algorithms, and how they perform in time and space. See An informal introduction to O(N) notation.

[reply]

Re: Re: Re: Code for elegance, code for clarity

by tilly (Archbishop) on Jan 13, 2004 at 16:36 UTC

The fact is that our brains find it easier to scan down than at moving left to right. This is why newspaper columns are 40 characters wide. Scanning left to right we have to parse the code to figure out where the divisions are. Scanning down we don't. Furthermore, scanning down and relying on indentation we can skip surprisingly large chunks without reading to find what we are looking for.

On the other changes, the big-O notation has to do with algorithmic scalability. When you s/(.)// you recopy the whole string. Copying a long string many times gets slow. m/(.)/g doesn't copy the string. This can matter.

On embedded returns, it is not obvious to me how to fix it. If you are handed one line at a time then you can't without having some way of requesting the next line. Also don't forget that the terminating newline of your row is not supposed to be part of the row of data that you are returning. What you should do with \n depends on whether you are in a quoted field.

And finally, m/\G"/cg is one that I found out about when I was sent a patch for Text::xSV to get around a regular expression bug. Yet another reason to expose your code for review from others. :-)

[reply]
[d/l]

Re: Re: Re: Code for elegance, code for clarity

by perrin (Chancellor) on Jan 12, 2004 at 18:34 UTC

My Applesoft BASIC days coming back to haunt me again.

I remember it fondly. Basic had a NOT command though, so I would have coded that as $quoted = not $quoted back then.

[reply]

Re: Re: Re: Re: Code for elegance, code for clarity

by ysth (Canon) on Jan 12, 2004 at 19:52 UTC

Re: Code for elegance, code for clarity
by xenchu (Friar) on Jan 12, 2004 at 14:27 UTC

Just an opinion, but I would use the simplist, most straight-forward code I could for Production Code. The more understandable the code, the easier it is to maintain.

Production Code should have two major attributes:

1. It works.

2. It is easy to maintain.

Nobody cares about elegance at three in the morning when trying to fix someone's else's code. Clarity is always good. Make sure your documentation has lots of it.

xenchu

update: I really need to learn to type.

The Needs of the World and my Talents run parallel to infinity.

[reply]
[d/l]

Re: Code for elegance, code for clarity
by dragonchild (Archbishop) on Jan 12, 2004 at 14:51 UTC

Your title implies that elegance and clarity are orthogonal qualities. I would say that clarity is an essential part of elegance. Another integral part of elegance is simplicity. Think about what you would consider elegant in a person. Some items that have classically defined an elegant woman, for example, could be:

The line and angle of her neck
The shape of her nose
The way she moves

Elegance is about understated simplicity. So, to me, your first snippet isn't elegant. It's short, but I have to break out pen and paper to understand it. However, I would consider your second snippet elegant. It is also short, but it discusses one concept and does it in a simple fashion.

Your csv_split() ... It has elegant parts. Let me explain ...

sub csv_split    {

    # This is very elegant. It handles one concept and does so in a si
+mple and
    # clear fashion.
    # Two bugs:
    #    1) use "return;", not "return undef;". The latter creates a o
+ne element
    #       list in list context.
    #    2) I would use "shift // return;" (if you have the patch). Ot
+herwise,
    #       your function won't handle "0.0" correctly. If you don't h
+ave the
    #       patch, I would do a defined-check.
    local $_ = shift || return undef;

    my @array = ();

    # This is not elegant - it's cute and clever. You're using 0 in bo
+th its
    # numeric and boolean aspects. Now, you may not have a choice in t
+he matter,
    # as Perl may not give you the tools to be elegant. (It might ...)
    my $count = my $quoted = 0;

    # The while definition is elegant - simple, understated, but clear
+. But, the
    # use of $1 throughout the loop is definitely not elegant. It's ki
+nda ugly.
    # Again, I'm not sure Perl gives you the tools to do the elegant t
+hing here.
    while ( s/(.)// )    {
        if ($1 eq ',' && !$quoted) { $count++; next; }

        # The use of q// here is more elegante than '"', but it might 
+have been
        # better to provide a variable that hides the ugliness.
        if ($1 eq q/"/)    {

            # The "$x = 1 - $x" idiom is very elegant. I've always lik
+ed it.
            unless ( $quoted && s/^\"// ) { $quoted = 1 - $quoted; nex
+t; }
        }
        $array[$count] .= $1;
    }

    # Elegance might demand a wantarray solution here. It's unclear.
    @array;
}
[download]

My version (with a bit more of what I consider elegant) would be (untested):

sub csv_split {
    local $_ = shift // return;

    my $is_quoted;
    my $count = my @array;
    while ( s/(.)// ) {
        if ($1 eq ',' and not $is_quoted) {
            $count++;
        } elsif ($1 eq q/"/ and not ( $is_quoted and s/^\"// )) {
            $is_quoted = !$is_quoted;
        } else {
            $array[$count] .= $1;
        }
    }
    return if $is_quoted;

    wantarray ? @array : \@array;
}
[download]

I believe my version also better delineates the flow control.

Update: Added tilly's toggle. I was trying to think of it, but couldn't, so left the x=1-x flip. The toggle is definitely more elegant.

------
We are the carpenters and bricklayers of the Information Age.

Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

[reply]
[d/l]
[select]

Re: Code for elegance, code for clarity
by PodMaster (Abbot) on Jan 12, 2004 at 13:50 UTC

sub csv_split    {
    local $_ = shift || return undef;

    my @array;
    my $count = my $quoted = 0;

    while ( s/(.)// )    {
        if ( $1 eq ',' and !$quoted ) {
             $count++;
        } elsif( $1 eq q/"/ ) {
            $quoted = 1 - $quoted
                unless 
                    $quoted and s/^\"//;
        } else {
            $array[$count] .= $1;
        }
    }

    return @array;
}
[download]

I also wouldn't bother to write && except when needed (nothing wrong with and)

update: also, when all you want is to match, use m//.

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.

[reply]
[d/l]

Re: Re: Code for elegance, code for clarity

by Roy Johnson (Monsignor) on Jan 12, 2004 at 16:22 UTC

unless

$array[$count]

And for the OP, I think that reversing a Boolean is better expressed as $quoted = !$quoted rather than $quoted = 1 - $quoted.

There's no need to be destructive with s///; all you need is to walk the pattern with //g, and there's no need to special-case the empty quotes. More efficiency can be gained by reading strings of non-quote, non-comma characters. The code becomes very clear, though the regex is a little complicated.

sub csv_split    {
   local $_ = shift || return undef;
   my @array = ();
   my $count = my $quoted = 0;
   while ( /([^,"]+|[,"])/g )    {
       if ($1 eq ',' and !$quoted) { $count++; }
       elsif ($1 eq '"')           { $quoted = !$quoted; }
       else                        { $array[$count] .= $1 }
   }
   @array;
}
[download]

The PerlMonk tr/// Advocate

[reply]
[d/l]
[select]

Re: Re: Code for elegance, code for clarity

by delirium (Chaplain) on Jan 12, 2004 at 14:45 UTC

I'm not intending to debate the merits of the above snippets. Sure, they could be better optimized, use more community-approved idioms, etc. I'm proposing instead that it's OK to code to your audience, and to code with the intention of explaining your algorithms to a group of peers when time is at a premium.

When left to my own devices without a timetable, my code becomes the nested and over-golfed hoo-hah posted at the top...which I enjoy tremendously.

By the way, where are you proposing I use m// as opposed to what is already in place?

[reply]

Re: Re: Code for elegance, code for clarity

by duff (Parson) on Jan 12, 2004 at 14:51 UTC

It's always interesting to see how different people write code. Except for the weird use of s/// I would have written that routine in much the same way as delirium. It probably has to do with how I "grew up" as a programmer. For instance, being used to C/C++, it's more natural for me to write && than and so I tend to prefer the former over the latter (except, of course, for instances where && has the wrong precedence).

duff

[reply]
[d/l]
[select]

Re: Code for elegance, code for clarity
by ysth (Canon) on Jan 12, 2004 at 17:05 UTC

squeeze through a review

Can't resist a little nit-picking:

else { last if !defined (($_ = <>) =~ s!\n|$/!!g); } ...which reads a chunk of ARGV into $_, removes newlines, chomps, and should exit the loop it is a part of at the end of the last ARGV file, or on any chunk not containing a newline or input record separator.

@{$temp_hash{files}} = grep { /@(\d+)$/; $1 > $old } @{$temp_hash{files}};

/@(\d+)\z/ && $1 > $old

[reply]
[d/l]
[select]

Re: Code for elegance, code for clarity
by mcogan1966 (Monk) on Jan 12, 2004 at 18:17 UTC

To me, elegant code works correctly, and works efficiently. This is something that I've had to work hard at with my recent project. Taking code that works, and making it code that works faster is how I found my 'elegant' code. But now, I'm leaving it alone since there is little more that can be done without making some of the code unreadable. Someone else may have to look at this down the line, it shouldn't be impossible (just very difficult).

[reply]

Re: Code for elegance, code for clarity

by dd-b (Pilgrim) on Jan 13, 2004 at 03:30 UTC

I think "elegant" is often a term for code that's clear to good enough programmers, and often rather unclear to less good programmers. However, really elegant code is good code to study to improve your own abilities.

Then again, my degree is in mathematics, where "elegance" is very often not the clearest way to express things.

[reply]

Re: Re: Code for elegance, code for clarity

by flyingmoose (Priest) on Jan 14, 2004 at 00:04 UTC

He said routinely rejected theories and equations because they were too complex, every time saying God's universe would be governed by beautiful (&simple*) equations. Though I think String Theory is off in unprovable never-never-land, there was a lot that I learned from that statement.

* = By simple, I mean short an elegant, without a lot of fudge constants and operators, I'm not saving the Creator would be afraid of a flux integral or something like it :)

[reply]

Re: Re: Re: Code for elegance, code for clarity

by dd-b (Pilgrim) on Jan 14, 2004 at 04:10 UTC

Re: Code for elegance, code for clarity
by delirium (Chaplain) on Sep 21, 2019 at 14:47 UTC

For the record, I have no idea what I was going on about 15 years ago. This whole meditation seems pretty silly.

[reply]

Re^2: Code for elegance, code for clarity

by FreeBeerReekingMonk (Deacon) on Sep 23, 2019 at 23:36 UTC

Personally I've lowered the perl-fu in my code, except for oneliners. There I pull open all registers to keep it short. And sometimes... sometimes I open NYTProfiler and discover that that other construct I don't use nor like is faster... and then well, it takes a while, but I start using it, begrudgingly, until I forget it was not "mine".

edit: Oh, and perltidy. I code like a madman then use perltidy to get something readable. Madman you say? Yes I mix spaces and tabs... and other things I'm not proud of.

[reply]