deprecated has asked for the wisdom of the Perl Monks concerning the following question:

Assume for a moment we have a large directory full of files. These files are roughly 2000 lines long. We want to gather data about the contents of these files. It is not necessary to associate data with filenames, just data with line numbers. So, we construct a hash with line numbers as keys. It looks like this:
    my %statistics = (
        1 => [ "foo", "bar" ],
        2 => [ "baz", "bletch" ],
        3 => [ "quux" ],
    );
But how do you decide when to add an arrayref or a scalar? I was thinking there should be an add-in-array-context operator for this, and I even vaguely recall that this was RFC'd, using :=. Now, granted, I hate Pascal, so I'm not keen on using that operator. I also recall that one of Larry's rules in Apocalypse 1 was "Larry Gets The Colon." So the add-in-array-context idea was shot down (I think). So instead of having my nifty operator, I have this less-than-slick code:
    # if we haven't hit this line yet, add an array
    # reference to %statistics at this line number.
    if (not defined $statistics{$line_number} ) {
        $statistics{$line_number} = [ $thisline ];
    }
    # if we have hit this line, then we have an
    # array reference already and all we have to do
    # is push it.
    if (defined $statistics{$line_number} ) {
        push @{ $statistics{$line_number} }, $thisline;
    }
This seems to be a) semi-fragile and b) inelegant. Note: sometimes we will hit a piece of the file that says "we don't need to know any more about this," so we stop parsing at that point. We also don't know how long (or short) these files are going to be, so we can't very well just pre-populate the hash with array refs (which would be an ugly, kludgeful way to do this, imho). Is there a smarter way to do this?

brother dep

--
Laziness, Impatience, Hubris, and Generosity.

Replies are listed 'Best First'.
Re: Larry Gets The Colon. (selective addition of an arrayref to a hash) (code)
by chipmunk (Parson) on Jun 01, 2001 at 18:26 UTC
    References auto-vivify in Perl; if you have a variable with an undefined value, and you treat it as a reference, Perl creates the reference for you. Thus, the push should be sufficient for all cases:
        push @{ $statistics{$line_number} }, $thisline;
    Although, if you want to be more explicit, you can create the array ref yourself:
        $statistics{$line_number} ||= [];
        push @{ $statistics{$line_number} }, $thisline;
    There's also an example of this in perlreftut, the tutorial on references.
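To see the auto-vivification described above in action, here is a minimal sketch. The `@samples` data is made up for illustration; only `%statistics`, `$line_number` and `$thisline` come from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample input: (line number, line text) pairs.
my @samples = (
    [ 1, "foo" ],
    [ 2, "baz" ],
    [ 1, "bar" ],
);

my %statistics;

for my $sample (@samples) {
    my ( $line_number, $thisline ) = @$sample;

    # The first time a line number is seen, its hash slot is undef.
    # Dereferencing it as an array inside push makes Perl create
    # the array ref for us, so no explicit test is needed.
    push @{ $statistics{$line_number} }, $thisline;
}

# Print each line number with everything collected for it.
for my $line_number ( sort { $a <=> $b } keys %statistics ) {
    print "$line_number: @{ $statistics{$line_number} }\n";
}
```

This works under `use strict` because push uses the dereferenced slot as something to modify, which is the situation in which Perl auto-vivifies.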
Re: Larry Gets The Colon. (selective addition of an arrayref to a hash) (code)
by bwana147 (Pilgrim) on Jun 01, 2001 at 18:27 UTC

    I'd do this:

    $statistics{$line_number} ||= [];
    push @{ $statistics{$line_number} }, $thisline;

    If $statistics{$line_number} is already an array ref, it is considered true, so the ||= operator won't affect it. Otherwise, it becomes an empty array ref.

    Whatever happens on the first line, you know on the second line that $statistics{$line_number} is an array ref, so you can push without further pondering.

    HTH

    --bwana147

Re: Larry Gets The Colon. (selective addition of an arrayref to a hash) (code)
by bikeNomad (Priest) on Jun 01, 2001 at 18:30 UTC
    One minor improvement would be to not repeat the test, by returning from the sub or continuing the loop if you just added it:
        # if we haven't hit this line yet, add an array
        # reference to %statistics at this line number.
        if (not defined $statistics{$line_number} ) {
            $statistics{$line_number} = [ $thisline ];
            next; # or return or whatever to avoid next push
        }
        # now we know that we have an array ref
        # and all we have to do is push it.
        push @{ $statistics{$line_number} }, $thisline;
      or, heaven help us, an if ... else construction.
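For comparison, the if ... else variant mentioned above might look like the sketch below. The sub name `record_line` and the sample calls are hypothetical, added only to make the snippet self-contained.

```perl
use strict;
use warnings;

my %statistics;

# record_line is a hypothetical helper name, not from the thread.
sub record_line {
    my ( $line_number, $thisline ) = @_;

    if ( not defined $statistics{$line_number} ) {
        # First time at this line number: start a new array ref.
        $statistics{$line_number} = [ $thisline ];
    }
    else {
        # The array ref already exists: just append to it.
        push @{ $statistics{$line_number} }, $thisline;
    }
    return;
}

record_line( 1, "foo" );
record_line( 1, "bar" );
record_line( 3, "quux" );
```

Either branch runs exactly once per call, so each value is stored a single time and the test is not repeated.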
Re: Larry Gets The Colon. (selective addition of an arrayref to a hash) (code)
by ChemBoy (Priest) on Jun 01, 2001 at 19:13 UTC
    It is not necessary to associate data with filenames, just data with line numbers. So, we construct a hash with line numbers as keys.

    I'm sure there's a reason, but it's not apparent to me... why use a hash with numbers as keys instead of an array of array references?



    If God had meant us to fly, he would *never* have given us the railroads.
        --Michael Flanders

      I guess it's because not every line is interesting. Suppose you want to find the line numbers where a token is found in a collection of files, but don't care about comments. Why would you want to keep array elements which are uninteresting? You'd need to filter them out later anyway, with something like "grep { @$_ } @list". Using "keys %list" is altogether easier.

      Without the railroads, how would we get to the airport ;-)
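The hash-versus-array trade-off discussed above can be sketched as follows. The data and the names `%by_hash`, `@by_array`, `@from_hash` and `@from_array` are made up for illustration, and the array filter uses a `defined` test rather than the grep quoted above, which is an assumption on my part about how one would skip empty slots safely under `use strict`.

```perl
use strict;
use warnings;

# Hypothetical data: only lines 2 and 5 were interesting.
my %by_hash;
my @by_array;
for my $hit ( [ 2, "baz" ], [ 5, "quux" ] ) {
    my ( $line_number, $thisline ) = @$hit;
    push @{ $by_hash{$line_number} },  $thisline;
    push @{ $by_array[$line_number] }, $thisline;
}

# Hash: the keys are exactly the interesting line numbers.
my @from_hash = sort { $a <=> $b } keys %by_hash;

# Array: slots 0, 1, 3 and 4 are undef, so they must be filtered out.
my @from_array = grep { defined $by_array[$_] } 0 .. $#by_array;

print "hash:  @from_hash\n";
print "array: @from_array\n";
```

Both approaches recover the same line numbers; the hash simply never stores the uninteresting ones, at the cost of needing a numeric sort when order matters.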