Maintainable code is the best code Part II

In Part I, I discussed the meta-coding aspects of maintainable code: how to lay out and structure your code so that you get the most out of it. In this part, I want to discuss some of the Perlisms that can help you improve your code.

There are hundreds of situations where the exact same logic can be written in a number of ways. This is TMTOWTDI, one of the foundations of Perl. However, there are usually worse ways and better ways, and this part discusses some of the better ones when it comes to maintainability. Some of them may be worse performance-wise. However, for a normal application, "fast enough" is a very important concept. If you need more speed, add RAM. If you need more speed after that, add more CPU. Only if you need even more speed should you optimize your code. Your time coding is the most expensive part of an application. Remember that. Now, on to some examples.

Let's say that you have a $typeNum, which is one of 1, 2, or 3. There's also a $typeName for each $typeNum, and you need to be able to convert between the two. We have a function that does it. But, what should the function do?

sub getTypeName {
    my $typeNum = shift;

    if ($typeNum == 1)
    {
        return "A NAME";
    }
    elsif ($typeNum == 2)
    {
        return "B NAME";
    }
    elsif ($typeNum == 3)
    {
        return "C NAME";
    }
    else
    {
        die "$typeNum not valid\n";
    }
}

That's the most obvious and brute force method. And, I must say, it works perfectly fine in this instance. However, what if there are 26 choices? The canonical answer is to create a hash or array, with keys of $typeNum and values of $typeName. And, that works perfectly fine. But, what if all your values map perfectly well like this?

sub getTypeName {
    my $typeNum = shift;

    die "$typeNum not valid\n"
        unless 1 <= $typeNum && $typeNum <= 3;

    my $typeName = ('A' .. 'C')[$typeNum - 1] . " NAME";

    return $typeName;
}
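For comparison, the hash-based lookup mentioned above might look like the following. This is only a sketch: the table name %TYPE_NAME and the reverse table %TYPE_NUM are names made up for illustration, and the three entries simply mirror the if/elsif version.

```perl
use strict;
use warnings;

# Lookup table: keys are type numbers, values are type names.
# (%TYPE_NAME is a hypothetical name; the entries mirror the if/elsif example.)
my %TYPE_NAME = (
    1 => 'A NAME',
    2 => 'B NAME',
    3 => 'C NAME',
);

# The reverse mapping (name to number) falls out of the same table.
my %TYPE_NUM = reverse %TYPE_NAME;

sub getTypeName {
    my $typeNum = shift;

    die "$typeNum not valid\n"
        unless exists $TYPE_NAME{$typeNum};

    return $TYPE_NAME{$typeNum};
}

print getTypeName(2), "\n";
print $TYPE_NUM{'C NAME'}, "\n";
```

Adding a 26th choice then means adding one line to the table, and the validity check never needs to change.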

Now, what if you have a string you want to break up? If it's nicely delimited, you can use split. But what if it's positional? The obvious answer is to use substr a number of times, sort of like this:

my $first = substr $line, 0, 2;
my $second = substr $line, 2, 4;
my $third = substr $line, 10, 4;

Again, this doesn't scale very well beyond, say, three items. Even then, it's ugly. So, the first thing most people do is turn to unpack. That would look something like

my ($first, $second, $junk, $third) = unpack "A2A4A4A4", $line;

That $junk in there isn't very aesthetically pleasing. So, how about we do something like

my ($first, $second, $third) = (unpack "A2A4A4A4", $line)[0,1,3];

# Or, you could do ...

my ($first, $second, $third) = ($line =~ /(.{2})(.{4})(.{4})(.{4})/)[0,1,3];

This is using the concepts of lists and slices. unpack returns a list. Instead of assigning that list immediately, you can index into that list, either as a straight access or a slice.
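The same trick works on anything that returns a list, not just unpack. Here is a small sketch using split; the colon-delimited record is made-up sample data.

```perl
use strict;
use warnings;

# Made-up sample record, colon-delimited.
my $record = 'alice:x:1001:100:Alice Example:/home/alice:/bin/sh';

# Straight access: index directly into the list that split returns.
my $uid = (split /:/, $record)[2];

# Slice: pick several elements, in any order, in one go.
# Negative indices count from the end of the list.
my ($shell, $name) = (split /:/, $record)[-1, 0];

print "$name has uid $uid and shell $shell\n";
```

The parentheses around the call are what make it a list that can be subscripted.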

Now, I'm still very unhappy about the fact that we have three variables. We could easily assign the result to an array and call it @data, but I don't like having to use numeric indices. I'd much rather use a hash. Maybe something like

my @colNames = qw(first second third);
my %hash;
@hash{@colNames} = (unpack "A2A4A4A4", $line)[0,1,3];

Now, that's more like it!

Easy (and readable!) parsing of a fixed-length record
Easily modifiable code (that can even be gotten from a configuration file)
Taking only what I want to take, thus not cluttering up the assignment
Assigning it to an easy-to-read-and-use hash
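To make the "configuration file" point concrete, here is one way the template, the slice indices, and the column names could all be derived from a single layout description. Treat it as a sketch: @layout, @keep, and parseLine are hypothetical names, and the layout is hard-coded here where a real program might read it from a file.

```perl
use strict;
use warnings;

# Hypothetical field layout: [ name, width ] pairs in record order.
# A name of undef marks a filler column that should be thrown away.
my @layout = (
    [ first  => 2 ],
    [ second => 4 ],
    [ undef,    4 ],
    [ third  => 4 ],
);

# Derive the unpack template ("A2A4A4A4"), the indices of the fields
# to keep (0, 1, 3), and the matching column names from the layout.
my $template = join '', map { 'A' . $_->[1] } @layout;
my @keep     = grep { defined $layout[$_][0] } 0 .. $#layout;
my @colNames = map  { $layout[$_][0] } @keep;

sub parseLine {
    my $line = shift;
    my %hash;
    # Hash slice on the left, list slice on the right.
    @hash{@colNames} = (unpack $template, $line)[@keep];
    return \%hash;
}

my $rec = parseLine('AB1234junkWXYZ');   # made-up sample record
print "$rec->{first} $rec->{second} $rec->{third}\n";
```

Changing a field width or adding a column is then a one-line edit to @layout, and nothing else in the code has to move.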

And, there are a number of other applications for these techniques. Have fun!

Update: davorg's comment about the template definitions in unpack is right on the money. I'm not very familiar with it, primarily because I never use pack or unpack for parsing fixed-length records. I will usually do a split //, $line and work with the resultant array, either with splice, shift, or foreach. The idea was to demonstrate that slicing is a useful direction to go in many cases.
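For what it's worth, the split-and-splice approach I mention might look roughly like this for the same record layout. It is a sketch only, and the sample line is made up.

```perl
use strict;
use warnings;

my $line  = 'AB1234junkWXYZ';   # made-up sample record
my @chars = split //, $line;    # one element per character

# splice removes and returns the leading characters each time,
# so the array shrinks as fields are consumed.
my $first  = join '', splice @chars, 0, 2;
my $second = join '', splice @chars, 0, 4;
splice @chars, 0, 4;            # discard the filler field
my $third  = join '', splice @chars, 0, 4;

print "$first $second $third\n";
```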

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: Maintainable code is the best code Part II by runrig (Abbot) on Oct 02, 2001 at 23:55 UTC
"Now, what if you have a string you want to break up. If it's nicely delimited, you can use split very nicely. But, what if it's positional?"

If you have more than ten or twenty fields (whose format I usually cut 'n paste from some document), I find the explicit use of unpack unmanageable. Adding or deleting a field, or changing the length of a field in the middle of the format, is a pain, e.g. when there's a typo in the doc or the doc includes 'sub-fields' which are not obvious. I like to use Parse::FixedLength, which for your example would be:

my $parser = Parse::FixedLength->new([qw(
    first:2
    second:4
    filler:4
    third:4
)]);
my $href = $parser->parse($line);
print "$href->{first}\n";

etc.
$skin{cat}++ by boo_radley (Parson) on Oct 03, 2001 at 00:35 UTC
I acknowledge this was not the point of your node, but although you can make an array slice out of unpack's results, you can also use a LHS undef for padding in these situations:

my $line = "partnum   partdesc  color     flavor    ";
my ($part, $desc, undef, $flava) = unpack "A10A10A10A10", $line;
print "--\n$part\n$desc\n$flava\n--";

which I find easy to parse.
Re: Maintainable code is the best code Part II by davorg (Chancellor) on Oct 03, 2001 at 12:41 UTC
Personally I wouldn't use the array slice on the return from `unpack`, I'd use `x` in the template definition to ignore the data completely.

@hash{@colNames} = unpack 'A2A4x4A4', $line;

-- <http://www.dave.org.uk> "The first rule of Perl club is you don't talk about Perl club."
Re: Maintainable code is the best code Part II by AidanLee (Chaplain) on Oct 03, 2001 at 16:16 UTC
Something to note is that `unpack()` is not always appropriate for handling character strings. `pack()` and `unpack()` deal with bytes. With the advent of Unicode (no, not everyone uses it yet) characters != bytes. Note that in the `pack()` documentation, 'A' indicates an ASCII string, which is guaranteed to be 1 byte per character.
Re (tilly) 1: Maintainable code is the best code Part II by tilly (Archbishop) on Oct 04, 2001 at 16:43 UTC
A note. Often some of the biggest maintainability questions can be solved by asking a question you hadn't thought of. For instance you are asking about the best code for converting between a type number and a corresponding name in Perl. I look at that and immediately ask, "Why are you using a numerical code in Perl?" My answer to that question would be, "Unless forced by external circumstances, I wouldn't." Numerical codes are inherently less maintainable to program with than string descriptions. For a type code, perhaps string descriptions which go through a function that catches any typos. And, of course, if I don't have numerical codes floating around, I no longer need to think about the most maintainable way to write the conversion code...
Re: Re (tilly) 1: Maintainable code is the best code Part II by dragonchild (Archbishop) on Oct 04, 2001 at 18:43 UTC
And, frankly, I wouldn't use numerical codes to describe stuff. However, say I'm given a set of things in an array and I need to name them according to their position in the array, plus some fixed string. This type of conversion between number and name is very useful. And, this is something I and my coworkers are doing a lot in our current project. These two meditations were prompted by my helping a coworker of mine learn good programming practices, both in general and in Perl. So, I used the examples I used with her, hoping to get at some generic ideas.