RobertJ has asked for the wisdom of the Perl Monks concerning the following question:

I'm using a subroutine I found as part of a program to convert a file with thousands of IP ranges to the minimum number of CIDR blocks. I have tested it and can't find a situation where it doesn't work.

The problems is I would like to understand HOW IT WORKS. Here is the declaration, calling statement and the subroutine:

my (@chunks, @fbits, @lbits); ..... ..... do_chunk(\@chunks, \@fbits, \@lbits); .... .... sub do_chunk { my ($chunks, $fbits, $lbits) = @_; my (@prefix, $idx1, $idx2, $size); # Find common prefix. After that, next bit is 0 for $fbits and 1 for # $lbits is 1. A split a this point guarantees the longest suffix. $idx1 = 0; $idx1++ while ($idx1 <= $#$fbits and $$fbits[$idx1] eq $$lbits[$idx1]); @prefix = @$fbits[0 .. $idx1 - 1]; $idx2 = $#$fbits; $idx2-- while ($idx2 >= $idx1 and $$fbits[$idx2] eq '0' and $$lbits[$idx2] eq +'1'); # Split if $fbits and $lbits disagree on the length of the chunk. if ($idx2 >= $idx1) { $size = $#$fbits - $idx1; do_chunk ($chunks, $fbits, [ @prefix, (split //, '0' . '1' x $size) ]) +; do_chunk ($chunks, [ @prefix, (split //, '1' . '0' x $size) ], $lbits) +; } else { $size = $#$fbits - $idx2; push @$chunks, [ (bits2num( [ @prefix, (split //, '0' x $size) ])), @$ +fbits - $size ]; } }

Now for my questions:

  1. What does the "\" do in "\@" in the calling arguments?
  2. In the subroutine I'm confused by the use of $#$, $$ and @$
  3. Last (for now) since this subroutine recursively calls itself, how will the second call shown below ever get executed?
if ($idx2 >= $idx1) { $size = $#$fbits - $idx1; do_chunk ($chunks, $fbits, [ @prefix, (split //, '0' . '1' x $size) ]) +; do_chunk ($chunks, [ @prefix, (split //, '1' . '0' x $size) ], $lbits) +; }

Thanks in advance

Replies are listed 'Best First'.
Re: Total Syntax Confusion Plus
by ikegami (Patriarch) on Apr 20, 2009 at 17:51 UTC

      "1. It returns a reference to the following variable". I read the reference and understand; now I need to think of what it means in the program.

      "2. They operate on the array referenced by the scalar". Again read your reference and have concluded I need to read more about the use of references.

      "3. After the first call returns". In thinking about this I believe what is happening is like "nesting". I have been going through this using a single IP range and printing to try to understand. After reading more on the use of references will continue that effort.

      Thanks, will likely be back

        now I need to think of what it means in the program.

        You can't pass arrays to a sub. The only thing that can be passed to a sub is a list of scalars. The solution is to pass references to arrays, references being scalars.

        In thinking about this I believe what is happening is like "nesting".

        I didn't look at how recursion is used here, but it's usually used to divide and conquer. Let's take merge sort, for example.

        Sorting

        +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | | | | | | | | | | | | | | | | | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
        is done by sorting
        +---+---+---+---+---+---+---+---+-------------------------------+ | | | | | | | | | | +---+---+---+---+---+---+---+---+-------------------------------+
        and
        +-------------------------------+---+---+---+---+---+---+---+---+ | | | | | | | | | | +-------------------------------+---+---+---+---+---+---+---+---+

        independently then merging the results. How does it sort the two halves? Well by breaking each down into two halves and sorting those. Etc.

Re: Total Syntax Confusion Plus
by pemungkah (Priest) on Apr 21, 2009 at 04:24 UTC
    Hi Robert! Welcome to references in Perl. References are, to put it simply, sort of like typed pointers. You have to dereference them (that is, ask for what the reference is pointing to) using the same type - that is, a hash reference must be dereferenced as a hash, an array reference as an array, and so on.

    The '\' operator is Perl's way of saying "please give me a reference to the thing following". So

    \@foo
    is an expression whose value is a reference pointing to the array @foo.

    References are scalars, so an expression like this

    my_sub(\@alpha, \@beta, \@gamma);
    passes three different arrays to my_sub. If you left off the "\", you'd get one big array of all the elements, which is not what the algorithm wants. Notice how @_ is broken up into 3 scalars? Those are the three references to the arrays coming in.

    Second, the funny-looking sigils are actually dereferences of the references we passed in. Remember, what we got was references, which point at something; they can't be used themselves. You dereference a scalar by putting the appropriate sigil in front of it. So if we have the following:

    my $hashref = \%myhash; my $arrayref = \@myarray; my $scalarref = \$myscalar;
    we could dereference each of these by putting the appropriate sigil in front:
    my %newhash = %$hashref; my @newarray = @$hashref; my $newscalar = $$scalarref;
    The usage $#$foo says "Please treat whatever is in $foo as an array reference, and do a $# on that array." (This particular usage is really weird; I've actually never seen it used before!) You've seen $$ (please give me the scalar this reference points to) and @$ (please give me the array this reference points to).

    It's important to remember that a reference to an array, when dereferenced, actually means "remember @bar back in the main program? Well, I want you to do these operations on that particular array." So @$chunks really means "I want @chunks back in the main program". (The names don't have to be the same; the programmer who wrote this just decided to write it that way. More confusing, in my book.)

    Last, remember that when you call a subroutine, you have to return from it, so each of those calls will (eventually) return: the bottoming-out test is

    if ($idx2 >= $idx1)
    When this happens, the code does this
    $size = $#$fbits - $idx2; push @$chunks, [ (bits2num( [ @prefix, (split //, '0' x $size) ])), @$ +fbits - $size ];
    and then returns by falling out of the bottom of the subroutine.

    Since we have a bottoming-out criterion, then the two calls to do_chunk mean "please carry out the do_chunk on this set of data, and when that's eventually done, run it a second time on this other data, then fall out the bottom to return (again)".

    A couple of other notes:

    This is not the world's best clearest code. Because the code insists on doing all the work in the original arrays, it's a lot more complicated and uses a lot more reference expressions. If it were me, I'd try to find a way to do it by capturing the arrays in local copies, manipulating those, and then returning values (probably references to arrays inside the code we just returned from - yes, you can do that; if you call a sub, take a reference to a my variable inside it, and then return that reference, Perl figures that variable was important to you and keeps it around until either the variable holding the reference in the caller gets set to some other value, or the variable holding the reference goes out of scope).

    You might find it interesting to run this code from a little test program under the debugger, breaking at the beginning of do_chunk, then doing x @_ to see how the arrays change on each call.

      pemungkah,

      Thanks for your thoughts. I needed to do a mind transformation to think in terms of references; and, once I did that it was much more clear. My mind tends to think in terms of "static typed variables" if that is a valid way to put it. By that I mean that the variable name "dumm" is always what you initially declare it as.

      In addition I don't use Perl very much and remembering all the little syntax variations requires me to thumb through O'Reily's Programming Perl; all 1,000 plus pages. One that always bites me is using ";" in for statements to separate expressions rather than ",".

      I agree on the "This is not the world's best clearest code." comment. Before I found this subroutine I was going to write my own. It was written by a mathematics professor at the Institute of Mathematics of the Romanian Academy about seven years ago; it worked and being a follow the path of least resistance kind of person I used it.

      With regard to running some test data in a debugger I did that, sort of. However, rather than a debugger I used a number of print statements formatted in a way that put everything on one page of paper so I could pick up a pencil and trace what was happening.

      Again thanks for your comments. I know it took some effort to respond in the detail manner you did. I follow and provide help on the "MozillaZine" forum and understand the effort.

      Cheers