http://qs1969.pair.com?node_id=1179933

Hello Monks!

I've been learning Perl for some years now. At the same time, moving from writing awk scripts to writing Perl scripts, I have found Perl to be an amazing resource for getting things done.

Still, I have some minor issues with the language design that I have not yet been able to understand/resolve. This is what I want to discuss here.

Background

It sometimes bugs me that it is so difficult to write Perl code that is readable (easy to follow) when working with references. For example, if I see a variable $var in the middle of some code, it can be a scalar variable, a scalar reference, an array reference, a hash reference, and so on. Hence, I often end up guessing or having to scan source code nearby in order to determine the type of the variable. I find this workflow less than optimal. Would it not be better if the variable could (optionally) be made self-documenting with respect to reference type?

In the book Perl Best Practices, the problem is mentioned in another setting, and the solution suggested is to add the suffix _ref to the variable name. So one could write,

$var_href = { a => 1 };
to create a hash ref, and
$var_aref = [ 1, 2, 3];
to create an array reference.

However, a problem with this convention is that the suffix is not optional. You should not be forced to use the more verbose form of the variable name. I think the programmer should have a choice to decide whether he finds it advantageous to include the suffix at a given place or not. For example, when declaring the variable as

$var = [ 1, 2, 3 ];
it is rather obvious that it is an array reference, and there is no need to write:
$var_aref = [ 1, 2, 3 ];
The latter is in my opinion too verbose. However, if the reference is just defined as
my $var;
it would often be better to include the suffix. If there is no indication on the next lines or so whether $var will be used as an array reference or not, it would be more readable to define it as
my $var_aref;

A new idea for reference variable naming syntax

So this led me to an idea: Could the postfix dereferencing syntax be extended for this use case?

The Postfix Dereferencing Syntax (PDS) was introduced as experimental in 5.20. Starting from 5.24, it is included in the Perl language by default.
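
For readers who have not met it yet, here is a minimal sketch of postfix dereferencing next to the older circumfix forms. It assumes Perl 5.24 or later, and the variable names are made up for illustration.

```perl
use v5.24;    # strict and say; postfix dereferencing needs no feature flag from 5.24 on
use warnings;

my $aref = [ 1, 2, 3 ];
my $href = { a => 1, b => 2 };

# Each pair below does the same thing; only the spelling differs.
my @from_postfix   = $aref->@*;
my @from_circumfix = @{$aref};

my @postfix_keys   = sort keys $href->%*;
my @circumfix_keys = sort keys %{$href};

my $last_index = $aref->$#*;    # same as $#{$aref}

say "@from_postfix";
say "@postfix_keys";
say $last_index;
```

In every case the variable itself is still plain $aref or $href; the part after the arrow is an operation applied to it.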

Currently PDS is used for dereferencing:

my @array = $var->@*;
Notice that the PDS includes a star after the sigil. It is a syntax error not to include the star. But let's say for the moment that if the star was omitted, the dereferencing was to be simply ignored instead. So
my $var->@;
would mean the same as
my $var;
and produce no syntax error.

Let's denote this new syntax by Optional Postfix Reference Declaration Syntax (OPRDS). So when using OPRDS, should it be entirely up to the user to ensure that he used the correct sigil? For example, if I write

$var->@ = 12;
when I really meant
$var->@ = [ 12 ];
should it produce a compile time error? I think it would be very helpful if the compiler could use OPRDS to check for consistency. But it might be difficult to implement? I do not know. If it is difficult to implement, some alternatives might be used instead? I don't know much of Perl internals, so this is a point where I need help.
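
Whether a compile-time check is feasible is left open here. As a point of comparison only, the closest thing available today is a runtime check with the built-in ref function. The following is a rough sketch under that assumption; check_aref is a hypothetical helper name, not an existing Perl facility, and the check happens when the line runs, not when the file is compiled.

```perl
use strict;
use warnings;

# Hypothetical helper: die unless the argument is an (unblessed) array reference.
sub check_aref {
    my ( $value, $name ) = @_;
    die "$name is not an array reference\n" unless ref($value) eq 'ARRAY';
    return $value;
}

my $good = check_aref( [12], '$good' );    # passes and returns the reference

# The mistake from the example above: a plain 12 where [ 12 ] was meant.
my $bad_ok    = eval { check_aref( 12, '$bad' ); 1 };
my $bad_error = $@;
print $bad_error unless $bad_ok;
```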

When I started out with this idea, compile time type-checking was not on my mind at all. But I see now that OPRDS would offer the opportunity for stricter type checking.

But type checking was not the main issue I wanted to discuss. What I would like to discuss is how to deal with reference variable names. Reading and understanding written Perl code can be difficult since the $ sigil can be used for many data types. How could this situation be improved?

Re: Improve readability of Perl code. Naming reference variables.
by stevieb (Canon) on Jan 19, 2017 at 20:24 UTC

    This is one of the many benefits of keeping your code blocks (scope) as small as possible (one screen if possible), as it allows you to visually see what you're dealing with in regards to a variable.

    Furthermore, variable naming is important, as is writing your subroutines so that they do only one thing, as opposed to a whole bunch of things. This:

    my $var->@;

    ... is far too much typing just for visual purposes (imho), and it adds unnecessary noise. It's just as easy to scope properly and name things appropriately:

    my $cat;

    That's singular, so I'd put money on the fact it's a scalar (unless it's an object, but if it's an object, it'll be being used much differently than a simple scalar so I digress).

    my $cats; # or my $cat_list;

    ...or:

    my $cat_map;

    Both of those are extremely easy to identify to even a low-intermediate Perl hacker as an array reference in the former case, and a hash ref in the latter (perhaps the author has a hash of cat names to cat colours ;).

    What's more, if there is confusion, in decently laid out code, one may have to scroll up only a tiny bit (if at all) to see where the variable is being declared/defined. If you have to search all over the code for where variables are defined, the scope for that variable is not small enough.

    Even if you get the type wrong, use strict; and use warnings; will always let you know the what/where of the problem.
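
A small sketch of what that can look like in practice. The variable names are illustrative. Note that both failures below surface when the offending line runs, not when the file is compiled, and that the first one is reported by perl even without strict.

```perl
use strict;
use warnings;

my $cat_map = { Tiddles => 'black' };    # a hash reference
my $count   = 12;                        # not a reference at all

# Treating a hash reference as an array reference dies at runtime.
my $hash_ok    = eval { my @cats = @$cat_map; 1 };
my $hash_error = $@;

# Treating a plain scalar as an array reference is caught by strict refs.
my $scalar_ok    = eval { my @cats = @$count; 1 };
my $scalar_error = $@;

print "hash ref used as array:     $hash_error";
print "plain scalar used as array: $scalar_error";
```

Each message includes the file name and line number, which is the "what/where" referred to above.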

    So, in essence, I understand what you're after here, but good coding practices relieve us (for the most part) of the need for such visual cues.

      Yeah. I completely agree with you with the idea of keeping subroutines or code blocks small. It is also my experience that this coding style (that you propose) has potential for eliminating most readability issues. So the naming issue can usually be circumvented using proper naming of variables and keeping the scope small. But it's also my experience that in some cases it would still be beneficial to have the option to further document the type of a reference variable.

        What you're referring to in your last sentence is what some call "edge cases". These edge cases, where there may be ambiguity for the reader of the code, are where your extremely brief comments should go. Code should document itself, but if you feel the reader may scratch their head:

        ...
        my $x = thing_list(); # href
        ...

        Of course, that's a pretty trivial example, but you get the point.

Re: Improve readability of Perl code. Naming reference variables.
by ww (Archbishop) on Jan 19, 2017 at 20:32 UTC

    You wrote: "You should not be forced to use the more verbose form of the variable name."

    Just in case there's a misunderstanding here, NOTHING in PBP is mandatory. Some of the recommendations do, in fact, reflect a consensus among some Perl programmers. Others are held up to criticism as 'one author's preferences.'

    And here's another 'one person's opinion' about your observation that it "should it be entirely up to the user to ensure that he used the correct sigil."
    Frankly, that idea is anathema to me; IMO, it's just another way to write code that you will have trouble deciphering sometime down the road, and that some future maintainer will almost certainly find problematic.

      Yeah, I agree that the suggested reference syntax also could introduce new issues. For example consider function calls:
      func( $var->@ )
      Here it is of course possible that the programmer introduces a typo. First, assume he wrote @ when $var is a scalar (i.e. $var is not a reference). This typo will of course confuse a human reader. But the compiler would probably be quite happy. It would just ignore the optional postfix syntax (OPRDS). Hence, there will be no runtime issues with this typo either. Then consider a different typo. The user types @* when he rather meant to type @:
      func( $var->@* )
      Now, this is a more serious mistake. The compiler will assume that the array reference should be dereferenced. Hence, the function will receive $var->[0] instead of the reference $var, likely to cause some sort of runtime malfunction that may be difficult to debug.
Re: Improve readability of Perl code. Naming reference variables.
by hippo (Bishop) on Jan 19, 2017 at 22:30 UTC
    "I think the programmer should have a choice to decide whether he finds it advantageous to include the suffix at a given place or not."

    Since we are dealing with references here then the programmer clearly does have that choice. viz:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my $var = [1, 3, 5];
    my $var_aref = $var;
    print "var has values: @$var\n";
    print "var_aref has values: @$var_aref\n";

    So you, the programmer, can pick and choose which name to use at any point (if you so desire).

      So you would prefer to introduce two reference variables? Sorry, I do not like this idea. I would prefer to keep the number of variables to a minimum. Introducing two variables just for the sake of solving a readability issue seems like a bad idea to me. What if one reference was changed later in the code? Then you must remember to update the other reference at the same place also. This clearly becomes a maintenance problem. And if you forget to update the other variable, there is potential for more confusion.
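
The concern can be made concrete with a short sketch that reuses the variable names from the example above. Changes made through either name are shared, because both hold the same reference; assigning a new reference to one name, however, leaves the other pointing at the old array.

```perl
use strict;
use warnings;

my $var      = [ 1, 3, 5 ];
my $var_aref = $var;              # second name for the same array

push @$var, 7;                    # visible through both names
my $shared = "@$var_aref";

$var = [ 2, 4 ];                  # $var now points at a new array
my $after_var  = "@$var";
my $after_aref = "@$var_aref";    # still the old array

print "$shared | $after_var | $after_aref\n";
```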
Re: Improve readability of Perl code. Naming reference variables.
by kcott (Archbishop) on Jan 20, 2017 at 11:36 UTC

    G'day hakonhagland,

    "It sometimes bugs me that it is so difficult to write Perl code that is readable (easy to follow) when working with references."

    About a dozen or so years ago, I supervised a number of junior programmers who also seemed to have this problem. I won't go into details beyond saying this caused no end of problems and time spent on debugging exceeded time spent coding.

    I introduced a local coding standard, that required a prefix on all variable names whose values were references. There may have been some special cases, but these were the main ones:

    • $rs_name : scalarref
    • $ra_name : arrayref
    • $rh_name : hashref
    • $rc_name : coderef
    • $rg_name : globref
    • $ro_name : object reference

    The concept was simple and mostly fixed the problem. Using the wrong operation on a variable was usually easy to identify (e.g. $ra_name->{...}, $rs_name->method(...), $rh_name->(...), and so on). Subsequent reading of the code, for maintenance or debugging, was made easier.

    I should also point out another policy that the 'name' part had to be meaningful and, as far as possible, self-documenting. This typically meant that, if a wrong letter (identifying the reference type) was used, it would be picked up by strict (whose use was also mandatory).

    While this was fine for that situation and environment, it doesn't really suit my personal style and I no longer use it: I much prefer to use the smallest possible lexical scopes, where these sorts of problems generally don't occur. However, if you think this, or something like it, will help to improve your coding, perhaps give it a try and see if it works for you.

    "Let's denote this new syntax by Optional Postfix Reference Declaration Syntax (OPRDS)."

    I didn't like this idea at all. With postfix dereferencing, $var remains the variable and ->@* is an operation on that variable. With your OPRDS, $var->@ seems to be a separate variable and operation (in your OP); subsequently, in one of your responses, you use func( $var->@ ), where $var->@ now apparently represents the entire variable. You also seemed to get confused with "func( $var->@* ) ... the function will receive $var->[0] ...": in fact, the function will receive @$var.

    You may have had typo(s) in that response, but I found myself scrolling back and forth to understand what was going on: the very problem you're attempting to avoid: "... scan source code nearby in order to determine ...".

    — Ken

      >
      • $rs_name : scalarref
      • $ra_name : arrayref
      • $rh_name : hashref
      • $rc_name : coderef
      • $rg_name : globref
      • $ro_name : object reference

      I'm using something very similar but without the redundant r in front, e.g. $c_block.

      Now I'm wondering why you use them... :)

      Cheers Rolf
      (addicted to the Perl Programming Language and ☆☆☆☆ :)
      Je suis Charlie!

      Hello kcott!

      Interesting to hear about your experience teaching students. I am sure the style you introduced might indeed help improve the situation you described. But once you have declared a variable with a prefix, the prefix is no longer optional: it has to be typed every time the variable is used. This is why I don't like the idea of a prefix that is part of the variable name. A prefix as a part of the sigil would seem like a better idea. Then it could be made optional.

      For example, consider a function called with three references: a scalar (string) reference, a hash reference, and an array reference:

      sub func {
          my ( $rs_str, $hr_desktop_info, $ha_files ) = @_;
          $$rs_str = update_string_ref();
          for ( keys %$hr_desktop_info ) {
              ...
              push @$ha_files, $file;
          }
          ...
      }
      It seems to me that the prefixes introduce too much noise in the source code. In this case, it might be better if only the first line in the function documented the type of the reference, and then subsequent lines could omit the variable name prefix:
      sub func {
          my ( $rs_str, $hr_desktop_info, $ha_files ) = @_;
          $$str = update_string_ref();
          for ( keys %$desktop_info ) {
              ...
              push @$files, $file;
          }
          ...
      }
      Of course, the above code is not yet possible. And further it could not easily be made part of Perl in the future. But maybe a new type of prefix could be used, for example $>$, $>%, and $>@ ?
      sub func {
          my ( $>$str, $>%desktop_info, $>@files ) = @_;
          ...
      }
      On the other hand, I can see the clash here with the Perl special variable $> (The effective uid of this process). So this syntax might be difficult to implement.

      Regarding the last point of your reply. Yes, I agree that if I call func( $var->@* ), the function will indeed receive @$var. But I assumed a function definition of the form

      sub func {
          my ( $var ) = @_;
          ...
      }
      Now, the function would "receive" $var->[0] (in the sense that $var in the function will be equal to $var->[0] of the caller). But I think this (minor) issue of whether the function receives the whole array or only its first item is just a distraction from the main topic of the discussion. So I will not go further into the issue.
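
Both descriptions can be checked with a short sketch, assuming Perl 5.24 or later for the ->@* syntax. The function body here is made up for illustration: it reports how many arguments arrived in @_ and what ended up in the first lexical.

```perl
use v5.24;
use warnings;

# Returns the number of arguments and the value of the first lexical.
sub func {
    my ($var) = @_;
    return ( scalar(@_), $var );
}

my $var = [ 10, 20, 30 ];

my ( $n_ref,  $first_ref )  = func($var);          # passes the reference
my ( $n_list, $first_list ) = func( $var->@* );    # passes the elements

say "reference: $n_ref argument, first is an ", ref($first_ref), " ref";
say "flattened: $n_list arguments, first is $first_list";
```

With the flattened call, @_ holds all of @$var, while a lexical filled by my ( $var ) = @_ only picks up the first element.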

        sub func {
            my ( $rs_str, $hr_desktop_info, $ha_files ) = @_;
            $$str = update_string_ref();
            for ( keys %$desktop_info ) {
                ...
                push @$files, $file;
            }
            ...
        }

        I rather feel that $$str, %$desktop_info and @$files make it pretty clear, not only that you're dealing with references, but also what type of references they are.

        If you're having problems reading that, I suggest you do what ++stevieb has already alluded to and put the prefixes in a comment. Something like:

        my ($str, $desktop_info, $files) = @_; # rs, rh, ra

        Update (minor typo fix): s{deck}{desk} in ..., %$decktop_info and ....

        — Ken

Re: Improve readability of Perl code. Naming reference variables. [New Perl Feature]
by kcott (Archbishop) on Jan 22, 2017 at 22:33 UTC

    There's a new feature that appeared in "developer release 5.25.3". It's mentioned in "(5.25.3) perldelta: Declaring a reference to a variable". Details are in "(5.25.3) perlref: Declaring a Reference to a Variable". The text of that last link starts with "Beginning in v5.26.0, ...": so, if you don't want to install a developer version (the latest is 5.25.9), you probably won't have long to wait for a stable one.

    I think this is the sort of thing you were after:

    #!/usr/bin/env perl

    use 5.025003;
    use strict;
    use warnings;

    no warnings 'experimental::refaliasing';
    use experimental qw{refaliasing declared_refs};
    use feature 'declared_refs';

    {
        my $str = \'string';
        my $list = [qw{a b c}];
        my $map = {x => 24, y => 25, z => 26};

        func($str, $list, $map);
    }

    sub func {
        my (\$string, \@array, \%hash) = @_;

        say $string;
        say "@array";
        say "$_ => $hash{$_}" for sort keys %hash;

        return;
    }

    Sample run:

    $ perl -v | head -2 | tail -1
    This is perl 5, version 25, subversion 9 (v5.25.9) built for darwin-thread-multi-2level
    $ pm_1179933_test_exp_declared_refs.pl
    string
    a b c
    x => 24
    y => 25
    z => 26

    Important: Do note that this feature is experimental; subject to change; and, as such, not suitable for production code.
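
One consequence worth seeing in code: a variable declared this way is an alias for the caller's data, not a copy. The sketch below assumes Perl 5.26 or later with both experimental features enabled; add_felix is a hypothetical function name used only for illustration.

```perl
use v5.26;
use warnings;
use feature qw(refaliasing declared_refs);
no warnings qw(experimental::refaliasing experimental::declared_refs);

# Hypothetical helper: @cats aliases the array behind the reference passed in.
sub add_felix {
    my (\@cats) = @_;
    push @cats, 'Felix';    # modifies the caller's array
    return;
}

my $cat_list = [ 'Tiddles', 'Desmo' ];
add_felix($cat_list);

say "@$cat_list";
```

This differs from my @cats = @{ $_[0] }, which would copy the elements and leave the caller's array untouched.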

    — Ken

Re: Improve readability of Perl code. Naming reference variables.
by johngg (Canon) on Jan 20, 2017 at 10:45 UTC

    As someone who prefers camelCase I use a more succinct convention.

    my $rsSomeValue = \ do { my $val = 42 };          # scalar ref
    my $raCats      = [ qw{ Tiddles Desmo Felix } ];  # array ref
    my $rhAges      = { John => 23, Bill => 35 };     # hash ref
    my $rcDoIt      = sub { return $_[ 0 ] * 3 };     # code ref
    my $rxPat       = qr{abc};                        # regexp ref
    my $roObj       = Some::Pkg->new();               # object ref

    If I see something matching m{\$r[sahcxo][A-Z]} I know I'm dealing with a reference.
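
As a rough illustration of that pattern at work, the sketch below runs it over a few variable names. The names are written in single quotes so that the $ stays literal; the last two are made-up examples of names that do not follow the convention.

```perl
use strict;
use warnings;

my $ref_name = qr{\$r[sahcxo][A-Z]};    # the pattern quoted above

my @names = ( '$raCats', '$rhAges', '$rcDoIt', '$cats', '$result' );

# Keep only the names that follow the reference-prefix convention.
my @flagged = grep { $_ =~ $ref_name } @names;
print "looks like a reference: @flagged\n";
```

The required uppercase letter after the type letter is what keeps an ordinary name such as $result from matching.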

    Cheers,

    JohnGG

      Hello johngg.

      Yes, it is shorter, but is it more readable? This also extends to the discussion of whether to use snake_case or camelCase. In my opinion camelCase is more succinct (easier to type), whereas snake_case is more readable (but more difficult to type).

      I also once used camelCase, so I can understand your choice. My main objection though is that the prefix syntax (that you propose) is not optional. See also my comment to kcott for more information.

      Camel case makes my skin crawl. Hate it. Give me underscores or give me death.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate";
      $nysus = $PM . ' ' . $MCF;

        I guess it's all a bit subjective. I don't dislike underscores and can happily use them if mandated but camelCase keeps long descriptive variable names shorter which is why I prefer it.

        Cheers,

        JohnGG

Re: Improve readability of Perl code. Naming reference variables.
by stevieb (Canon) on Jan 22, 2017 at 00:43 UTC

    With all that has been said in this thread, I want to add that while you are learning, knowing what is what is definitely important. Since this thread is still so relevant, I thought I'd share an example from something I'm learning right now (converting Perl variables to C), where it is indeed useful to be able to identify your variables so that you understand what's happening. See this thread. That naming convention won't last more than a day, but it can be useful in code while you are still trying to grasp what's going on.