
Of variable {mice} and its name {man}.

by vladb (Vicar)
on Jun 01, 2002 at 14:17 UTC ( [id://170911] : perlmeditation )

In recent days many of our meditations around the monastery concerned some deeply involved issues such as this, or this, or that other thing over there. These and many others are certainly quite healthy discussions and deserve the credit they in fact already got. However, little attention has been given to the subtle intricacies of daily Perl hacking routines. In this thread, I would like to draw your attention to, and of course pursue a constructive consultation on, the sometimes overlooked and at first glance trivial subject of variable naming.

Certainly, many of you would shout and even point me in the direction of one of these exceptional threads:

And that’s about all we have that is at least remotely related to this subject.

. . . vladb shuffles his old ‘Code Complete’ book a couple pages . . .

Ahh, here I found it, quote:

“You can’t give a variable a name the way you give a dog a name – because it’s cute or it has a good sound. Unlike the dog and its name, which are different entities, a variable and a variable’s name are essentially the same thing.”

When it comes to variable naming conventions, a couple of obvious rules spring to mind:
  • Fear short variables, unless used once and close to where they are declared (such as in a for loop).

    This is way beyond evil:
        my (@a, @b);
        # . . .
        # initialize variables
        # . . .
        my @m = map { my $x = $_; grep { /^$x$/ } @b; } @a;
        print @m;

    What are these variables for? Yes, yes, the @a and @b variables. And what exactly, I would like to know, is @m for? These questions are answered by simply renaming the variables:

        my (@registered_users, @personalized_users);
        # . . .
        # initialize variables
        # . . .
        my @advanced_users = map {
            my $registered_user = $_;
            grep { /^$registered_user$/ } @personalized_users;
        } @registered_users;

    By simply naming the variables right, we have also implicitly added some good piece of in-line documentation. Observing the code, it becomes crystal clear that its purpose is to gather all users that are both personalized and registered (say, with a portal). These users are classified as being ‘advanced’ in that they have not only registered with the portal, but also personalized certain aspects of it.

  • A variable must exist to serve only one purpose.

    I had a couple of scripts where I would declare a multi-purpose variable such as $temp or $buffer and use it for various different purposes throughout the code. This only led to a number of bugs that were hard to locate and fix. Therefore, it’s best to avoid ‘generic’ variables and to be specific in variable naming. A variable name should stand for the variable’s purpose in the code. Of course, it’s hard to come up with a suitable name when a variable’s purpose in life hasn’t been clearly defined ;-).

  • Stick to a single naming convention agreed upon by your fellow developers.

    This is very much self-explanatory. Being on the same page in terms of naming conventions serves you well when the time comes to take over, debug, or review someone else’s code (say, another developer’s in your company).
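
The single-purpose rule above can be illustrated with a minimal sketch. The input string and the names ($raw_line, $trimmed_line, $user_name, $login_count) are made up for illustration; the only point is that each value gets its own variable instead of passing through one reused $temp.

```perl
use strict;
use warnings;

# Hypothetical input: one line of "name:count" text.
my $raw_line = "alice:42\n";

# Reusing one generic variable (what the rule warns against):
#   my $temp = $raw_line;
#   chomp $temp;
#   ($temp) = split /:/, $temp;   # now $temp is a name, not a line

# One variable per purpose instead:
my $trimmed_line = $raw_line;
chomp $trimmed_line;
my ($user_name, $login_count) = split /:/, $trimmed_line;

print "$user_name logged in $login_count times\n";
```

With separate names, a later reader never has to work out which of several meanings a variable has at a given line.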

There’s certainly more to this list, but let me direct your attention to at least a few naming conventions specific to Perl. In addition to the ones mentioned above, throughout my short career as a Perl hacker, I have observed other developers follow these guidelines:
  • Use meaningful prefixes/suffixes to signify the data type of a variable.

    Unlike in other languages, every variable in Perl carries one of these prefixes: @, $, %. For anything other than a scalar holding a reference, the prefix alone tells you the data type. For example,

        my @employees;  # declare an array.
        my %records;    # declare a hash variable.
        my $name;       # declare a scalar

    However, things become a little more confusing when it comes to scalar references in Perl. Much as is done in other languages such as C and Pascal, a number of Perl hackers choose to use suffixes that signify the type of structure a scalar reference points to. Here’s a sample:

        # reference to an array
        my @employees;
        my $employees_aref = \@employees;

        # reference to a hash
        my %records;
        my $records_href = \%records;

        # reference to a scalar
        my $huge_text;
        my $huge_text_sref = \$huge_text;

        # This naming is popular in subroutine parameters.
        sub calc_employee_salary {
            my ($employee_aref) = @_;
            # $employee_aref is a reference to an array
            # holding employee IDs (or names, whichever)

            # since this is an array reference, use ‘@’ to get to the
            # actual array.
            foreach (@$employee_aref) {
                # . . .
            }
        }

  • Private class subroutines should be prefixed with ‘_’ (underscore).

    A quote from perltoot explains it all:

    Perl doesn't impose restrictions on who gets to use which methods. The public-versus-private distinction is by convention, not syntax. (Well, unless you use the Alias module described below in Data Members as Variables.) Occasionally you'll see method names beginning or ending with an underscore or two. This marking is a convention indicating that the methods are private to that class alone and sometimes to its closest acquaintances, its immediate subclasses. But this distinction is not enforced by Perl itself. It's up to the programmer to behave.
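
As a rough sketch of the two conventions above (the _aref suffix and the leading underscore), consider the following. The Payroll class, its methods, and the sample data are hypothetical and exist only for illustration; nothing here is enforced by Perl.

```perl
use strict;
use warnings;

package Payroll;    # hypothetical class, for illustration only

sub new {
    my ($class, $employees_aref) = @_;
    # _aref suffix: the scalar holds a reference to an array
    return bless { employees_aref => $employees_aref }, $class;
}

# Public method: part of the class's interface.
sub employee_count {
    my ($self) = @_;
    return scalar @{ $self->_employees };
}

# Leading underscore: private by convention only; Perl does not enforce it.
sub _employees {
    my ($self) = @_;
    return $self->{employees_aref};
}

package main;

my @employees = ('alice', 'bob');
my $payroll   = Payroll->new(\@employees);
my $count     = $payroll->employee_count;
print "$count\n";
```

Code outside the class can still call $payroll->_employees; the underscore only signals that it should not.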

There are certainly hundreds more examples of indecent variable naming. I hereby challenge you to step forward and share your wisdom! ;-)

$"=q;grep;;$,=q"grep";for(`find . -name ".saves*~"`){s;$/;;;/(.*-(\d+) +-.*)$/; $_=["ps -e -o pid | "," $2 | "," -v "," "];`@$_`?{print"+ $1"}:{print"- $1"}&&`rm $1`; print$\;}

Replies are listed 'Best First'.
(jeffa) Re: Of variable {mice} and its name {man}.
by jeffa (Bishop) on Jun 01, 2002 at 14:40 UTC
    I will agree, variable naming is very important, but it really depends upon scope. If i am writing a one-liner, i'll use the fewest characters possible, such as $f, @f, or %f. If i am writing a throw away script, i don't feel guilty for using foo, bar, baz, and qux here and there. If it is a script that i plan for others to use, i sit and think very carefully about the names. If i am working with a group, i will do my best to stick to whatever standards they have agreed upon.

    Coming from Java, i used to name my variables like employeeRecord and thingWithNoName, but now i stick to the convention of using all lower case and separating words with the underscore - thing_with_no_name. This also keeps me from choosing long names since i hate reaching for the underscore key. ;)

    When confronted with a mental block on finding a good name, if i can't think of one in 5 minutes or less, i will pick something and move on. Leaving a comment that promises to rename the variable reminds me of why i did so. I really feel that names are important, but not so important as to get stuck in a rut over them. This is similar to listening to someone's conversation when they pause for far too long trying to find the right word, completely losing their train of thought.

    Final comment - plurals. tye finally convinced me to avoid using plurals for arrays and hashes, because they already indicate a collection: @file, @line, %record. This keeps me from guessing between $file and @file, $record and %record. This is also a new practice for me, so i sometimes forget.
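
A small sketch of the singular-name habit described above. The file names and the name-to-age data are made up for illustration.

```perl
use strict;
use warnings;

# Singular names: the sigil already says "collection".
my @file   = ('a.txt', 'b.txt');          # hypothetical file names
my %record = (alice => 30, bob => 25);    # hypothetical name => age data

# Element access keeps the same base name, so there is nothing to guess:
my $first_file = $file[0];         # element of @file
my $alice_age  = $record{alice};   # value from %record

print "$first_file $alice_age\n";
```

With plural names the same accesses would read $files[0] and $records{alice}, which is where the guessing starts.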


    Consistency is golden
      jeffa, thanks for your reply ;-).

      I'm not strongly positive on whether to use plurals or singulars for arrays or hashes. I had to struggle with that one a couple times and never seem to be able to find a common ground. In the back of my mind, I too feel that avoiding using plurals in similar instances is the sane thing to do.

      "When confronted with mental block on finding a good name..."

      Of course, variable naming shouldn't be taken to extremes (as also pointed out by Zaxo in the CB conversation ;-). More often than not, I find myself in a similar position where I am not able to come up with a suitable name in a couple of minutes. Often, I simply take my best educated guess and proceed, optionally leaving a comment just the way you do.

      $"=q;grep;;$,=q"grep";for(`find . -name ".saves*~"`){s;$/;;;/(.*-(\d+) +-.*)$/; $_=["ps -e -o pid | "," $2 | "," -v "," "];`@$_`?{print"+ $1"}:{print"- $1"}&&`rm $1`; print$\;}
Re: Of variable {mice} and its name {man}.
by hsmyers (Canon) on Jun 01, 2002 at 22:13 UTC

    It is interesting to note that the discussion over 'how to name variables' predates even the 'structured programming revolution' (hmmm--come to think of it so do I<sigh>) And if there has been a single point of agreement over all of this time then it would be:

    No matter what they are used for, their names should be meaningful!
    Now there is a lofty goal if I ever saw one. 'Goal' because for all of our good intentions, there are many reasons why we miss this particular ideal. My own personal pet exception is an entire set; [i,j,k,m,n,s,t]. I use them as follows:
    • i,j,k are all reserved for indexes, typically nested.
    • m,n similar, but usually in some numeric context (too many days at the blackboard doing proofs I'd guess).
    • s and t reserved for string parameters
    Anything else tends to be meaningful--I swear!

    Now if you want a real debate, consider this:

    It is right and just to separate words in a name with an underbar.
    Defend or attack--you have five minutes...


    "Never try to teach a pig to sing…it wastes your time and it annoys the pig."
      hsmyers, do you have a FORTRAN background by any chance?

      For the benefit of others on this forum, FORTRAN is like Basic and no strict, in that you do not have to declare variables. However, variables beginning [I-N] are automatically declared integer, whereas ones beginning with the other letters of the alphabet are automatically real.

      This I believe is where your i,j,k and m,n come from. As for s and t, strings were a latecomer to FORTRAN.

      Back to the subject of perl, my belief is that it should be obvious from looking at the code what your variable is. I don't have an issue with single char loop variables (apart from $a and $b of course), but generally, variables should be given meaningful names. Also, I like using underscores, as I have done much programming in languages and on operating systems without case sensitivity. I have been bitten many times writing code with variables $fooBar, $FooBar and $foobar, whereas $foo_bar wins every time for me.

        My first experiences with i,j,and k were in Vector Calculus, not in programming. In a three dimensional space, i relates to the X, j relates to Y, and k relates to Z. I don't know if this was just my textbook's standard, or if others followed this as well.
Re: Of variable {mice} and its name {man}.
by FoxtrotUniform (Prior) on Jun 02, 2002 at 20:07 UTC

    Instead of an admonition ("Short variable names are bad! Unless...") I'd suggest a more generous rule:

    The verbosity of a name should be proportional to its scope

    If my program's a one-liner, then $i is fine for pretty much any purpose. (Although using $i as a real number would probably weird me out.) In a longer program, $i is probably too short for anything but a loop variable, or maybe an increment in a short function.

    In general, I try to use the shortest name possible that doesn't need an explanatory comment. So, if I'm working with vertices:

        for my $v (@verts) { ... }
        ...
        sub centroid {
            my @verts = @_;
            ...
        }
        ...
        package Graphics::Model;
        ...
        @Graphics::Model::vertices = ();

    I guess my point is that it's not the length of the name that matters (gnarf gnarf), but whether you need to comment it.

    The hell with paco, vote for Erudil!
    /msg me if you downvote this node, please.

Re: Of variable {mice} and its name {man}.
by Dog and Pony (Priest) on Jun 02, 2002 at 21:41 UTC
    When I read this, I remembered something I think is from Fowler's Refactoring (I don't own the book, so I can't look it up):

    One piece of advice on what and when to refactor was that any time you feel the need to comment a block of code, to explain what it does, it is time to extract that piece of code into a subroutine and give it a good name. That way it would be obvious what happened at that place by reading the name of the sub, and the need for the comment disappears.

    That also would apply here - as soon - or as long - as you (would) need a comment to explain what a variable is for, you should rename it to describe what it holds. This is essentially what you say above, of course - just thought it would be a (perhaps) good "rule of thumb" to keep in mind.

    This still means that using for instance $i as an iterator variable in a for loop probably doesn't need any comment, and thus it doesn't need to be renamed either.
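
The refactoring advice above might look roughly like this in Perl. The sub name active_users, the logins field, and the sample data are all hypothetical, chosen only to show a comment being replaced by a name.

```perl
use strict;
use warnings;

# Before: a comment explains what the block does.
#   # keep only users who have logged in at least once
#   my @u = grep { $_->{logins} > 0 } @all;

# After: the sub name carries the explanation.
sub active_users {
    my @users = @_;
    return grep { $_->{logins} > 0 } @users;
}

my @all_users = (
    { name => 'alice', logins => 3 },
    { name => 'bob',   logins => 0 },
);
my @active = active_users(@all_users);
print scalar(@active), "\n";
```

The call site now reads as a statement of intent, and the result variable can be named for what it holds rather than explained in a comment.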

    As many others have said, apart from one-liners and the like, I do try to name my variables in a readable fashion - as a matter of fact, my variable names are often quite verbose even in test scripts, because the extra typing does save time and pain even during the short time the test script lives. Otherwise I tend to resort to $_ and friends for Q&D coding. :) Notable exceptions include, for instance, my habit of naming the CGI object $q, but that is kind of a convention, so I don't feel too bad about it. I do name the different Parser objects (like HTML::TokeParser) $parser etc, instead of simply $p.

    You have moved into a dark place.
    It is pitch black. You are likely to be eaten by a grue.
      I am fully in agreement with jeffa's and Foxtrot's handy rule of thumb, that the importance of a variable's name (hence its informative detail), should relate directly to its scope in the code, and its exposure to programmers other than the code author. With that in mind, I would warn against what seems to be a logical inference suggested by these statements:

          "... would be obvious what happened at that place by reading the name of the sub, and the need for the comment disappears.
          That also would apply here - as soon - or as long - as you (would) need a comment to explain what a variable is for, you should rename it to describe what it holds."

      One should not interpret this to mean that a detailed variable name is sufficient to document the role of an important variable -- especially when the important variables tend to be heavy data structures (HoH, AoH, etc).

      When an important variable is declared, of course its name should be meaningful, but there should also be some commentary to explain things that the name alone cannot convey: how it's structured, how it gets values assigned to it (is it filled from input? computed?), and/or what its values are used for.
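
A declaration comment of the kind described above might look like the following sketch. The variable names, department names, and salary figures are invented for illustration.

```perl
use strict;
use warnings;

# %salary_by_dept: hash of hashes (HoH), filled below from literal data.
#   outer key   = department name
#   inner key   = employee name
#   inner value = yearly salary, used for the per-department totals
my %salary_by_dept = (
    sales => { alice => 50_000, bob => 45_000 },
    it    => { carol => 60_000 },
);

my %total_by_dept;
for my $dept (keys %salary_by_dept) {
    for my $employee (keys %{ $salary_by_dept{$dept} }) {
        $total_by_dept{$dept} += $salary_by_dept{$dept}{$employee};
    }
}
print "sales: $total_by_dept{sales}\n";
```

The name says what the data is; the comment covers the shape and origin that no name of reasonable length could carry.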

      For that matter, given the choice of "long variable name" vs. "short name with a descriptive comment on the initial declaration", I'll go for the latter; effective laziness in programming means, in this instance, doing something once (documenting a variable's role) and doing it well that one time, rather than doing it repeatedly (encapsulating "documentation" in the variable's name), but not doing it properly at any point.

        I almost have to assume that you misinterpret this on purpose, possibly to point out that I wasn't very exact in my post. In any case:

        You are confusing "how comments" and "what comments". In the example taken from Refactoring, you don't seriously believe that the name of the new sub should describe how the code works? No, it describes what it does, and what it is for. That should be pretty obvious.

        Then for most code, especially in small blocks with good names, it should be quite clear what is happening from just reading the code - some people even advocate that if the code is well-written, it should not need any comments whatsoever. That I can't really agree with; there are always hairy parts that need extra explaining. And as someone once said: "Good code has lots of comments, bad code needs lots of comments." - meaning that you should always comment your code, no matter which of them you write... especially since who are you to tell? :)

        You think that it is somehow mutually exclusive with comments and verbose names. Why? Where did I say that you can't explain your data structure because you have a good name? The name is there so you can see what data is being worked on in the code, not so you can see how. That either is easy enough to read from the surrounding code, or is documented somewhere where it is easy to find.

        Again, how contra what.

        Documenting the variable's name once, like this:

            # Holds all employees' names
            my @em;

        instead of naming it something like:

            my @employee_name;

        is your choice then? Well, it is not mine. If I have a semisized piece of code, then I don't want to have to go to the top of the program (potentially) to find out what each variable holds, each time I encounter them. If I know what data is in it, from the name, then I can work on. If I don't know the data structure, and it isn't possible to (easily) deduce from the surrounding code, then I'll go look for some docs on the structure. What or how.

        Perl programmers often pride themselves on "Laziness", since it is one of the programmer's virtues according to Larry Wall. Well, there is false laziness too. Trying to "save time" by typing a few less keys is definitely false laziness - at least if you are trying to hold that as a virtue. As the XP and refactoring people say about writing tests - you have the extra time for it, because you didn't really code all, or even most of the time that you sat at your computer anyways. As with tests, you can save a lot of maintenance and bug fixing by being clear about what happens. Even if apt naming isn't going to make a big difference on its own, together the small things count. Beware of false laziness.

        You have moved into a dark place.
        It is pitch black. You are likely to be eaten by a grue.
Re: Of variable {mice} and its name {man}.
by tonyday (Scribe) on Jun 03, 2002 at 04:12 UTC
    One common exception to the no short name rule is for the bleeding obvious - especially for an object. For example,
    my $ua = LWP::UserAgent->new();
    seems more perlish than
    my $user_agent_for_general_usage = LWP::UserAgent->new();
    A convention that I picked up solely through liberal cut-n-paste of pod examples.