Code clarification - use of map and $$_

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Code clarification - use of map and $$_ by Corion (Patriarch) on Aug 09, 2016 at 13:47 UTC
In your case, it's not `$$_`, but `$$_[...]`. The leading `$$` dereferences a reference: `$$_[...]` reads an element of the array that `$_` refers to. `$$_[...]` can be rewritten as `$_->[ ... ]`, which might make the array indexing more obvious to you. See also References Quick Reference.
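To make the equivalence concrete, here is a minimal sketch. The `@pairs` data and the three result arrays are made up for illustration and are not from the original script. It reads the same element through `$$_[...]`, `$_->[...]`, and the fully bracketed `${$_}[...]` form.

```perl
use strict;
use warnings;

# Hypothetical sample data: a list of references to two-element arrays.
my @pairs = ( [ 1, 6 ], [ 1, 9 ] );

my @old_style;      # values read with the $$_[...] form
my @arrow_style;    # values read with the $_->[...] form
my @block_style;    # values read with the ${$_}[...] form

for (@pairs) {
    push @old_style,   $$_[1];
    push @arrow_style, $_->[1];
    push @block_style, ${$_}[1];
}

# All three notations reach the same element.
print "@old_style | @arrow_style | @block_style\n";    # prints "6 9 | 6 9 | 6 9"
```

The arrow form is usually the easiest to read, because it keeps the variable (`$_`) visually separate from the subscript.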
Re^2: Code clarification - use of map and $$_ by Anonymous Monk on Aug 09, 2016 at 17:25 UTC
Collective huge thank you to everybody who contributed to this thread! Much to learn and read. Again: thanks!
Re^2: Code clarification - use of map and $$_ by Anonymous Monk on Aug 09, 2016 at 14:05 UTC
Thanks for that. Why exactly would dereferencing be used here? Why not direct access to the variable?
Re^3: Code clarification - use of map and $$_ by davido (Cardinal) on Aug 09, 2016 at 14:56 UTC
Because the list returned by `@{$r{$k}}` is a list of array references. On each iteration of the `map` loop, one element of the array `@{$r{$k}}` is aliased to `$_`. That element is a reference to an array. Thus, to act upon its contents, you dereference it.

Dave
Re^3: Code clarification - use of map and $$_ by LanX (Saint) on Aug 09, 2016 at 14:55 UTC
> Why exactly would dereferencing be used here? Why not direct access to the variable?

Without digging too deep into this code: `map` iterates over a flat list of scalars, for example a list of array references, which is what you need if you want to address several different arrays.

> Why not direct access to the variable?

If you mean something like `@array` as the "direct" variable: you CAN'T do something like `map { $_[0]++ } (@a,@b,@c)` to increment the first element of each array.

The real problem with that code is that its author was too lazy to use a clear style.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :) Je suis Charlie!)
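A small sketch of that point, using made-up arrays `@a`, `@b`, and `@c`: the parenthesized list flattens into individual scalars, and inside the block `$_[0]` would refer to an element of `@_` rather than to anything reached through `$_`. Passing references keeps each array addressable.

```perl
use strict;
use warnings;

# Hypothetical sample arrays, not taken from the original script.
my @a = ( 1,   2 );
my @b = ( 10,  20 );
my @c = ( 100, 200 );

# (@a, @b, @c) flattens into one list of six scalars, so the
# array boundaries are gone by the time a loop body would run.
my @flat = ( @a, @b, @c );

# A list of references keeps each array addressable on its own,
# so the first element of every array can be incremented.
$_->[0]++ for ( \@a, \@b, \@c );

print scalar(@flat), " flattened elements\n";    # prints "6 flattened elements"
print "firsts: $a[0] $b[0] $c[0]\n";             # prints "firsts: 2 11 101"
```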
Re^3: Code clarification - use of map and $$_ by Corion (Patriarch) on Aug 09, 2016 at 14:06 UTC
I don't know. Maybe ask the original author of the script.
Re: Code clarification - use of map and $$_ by AnomalousMonk (Archbishop) on Aug 09, 2016 at 15:01 UTC
    while (<IN>) {
        our(@F) = split(/\s+/, $_, 0);
        push @{$r{join ' ' x 8, @F[0..3]};}, [@F[4, 6]];
        sub END {
            foreach $k (keys %r) {
                my($x, $y);
                map {$x += $$_[0]; $y += $$_[1];} @{$r{$k};};
                my @g = split(/\s+/,$k);
                print OUT "$g[0]\t@g[1]\t@g[2]\t@g[3]\t", $x / scalar(@{$r{$k};}), "\t$y\n";
            }
        }
    }

Another odd thing to note about this code is the `END` block planted in the middle of it, written in the discouraged `sub END` form. Please see the "BEGIN, UNITCHECK, CHECK, INIT and END" section in perlmod. Because all `END` blocks run at the end (!) of all other code, I think this chunk of code could more clearly and conventionally be written as:

    while (<IN>) {
        our(@F) = split(/\s+/, $_, 0);
        push @{$r{join ' ' x 8, @F[0..3]};}, [@F[4, 6]];
    }

    ... all other code ...

    END {
        foreach $k (keys %r) {
            my($x, $y);
            map {$x += $$_[0]; $y += $$_[1];} @{$r{$k};};
            my @g = split(/\s+/,$k);
            print OUT "$g[0]\t@g[1]\t@g[2]\t@g[3]\t", $x / scalar(@{$r{$k};}), "\t$y\n";
        }
    }

Good luck.

Give a man a fish: `<%-{-{-{-<`
Re^2: Code clarification - use of map and $$_ by pryrt (Abbot) on Aug 09, 2016 at 15:16 UTC
++AnomalousMonk: that helped (me, anyway; don't know about the OP). That was my first guess, but my initial experiments with the OP code didn't mesh. I later saw that the print went to OUT, but forgot to redo the test with output going to STDOUT. Thus I started thinking that it was a sub named `END` in a bout of horrible style. But I think you're right, it's an END block. However, with this simplified code, I still only see the FIRST instance of the END block doing anything:

    use strict;
    use warnings;

    $, = ",";
    $\ = "\n";
    $" = ";";

    my %r = ( 0 => 0, 1 => 0, 2 => 0, 3 => 0 );

    foreach (1..10) {
        my $x = $_ % 4;
        ++$r{$x};
        sub END { print __LINE__, "sub END block", $x, $r{$x} };
    }

    foreach(1 .. 10) { my $x = $_ % 4; print "$_ => $x => $r{$x}"};
    print __LINE__, "END OF SCRIPT";
    __END__
    __OUTPUT__
    1 => 1 => 3
    2 => 2 => 3
    3 => 3 => 2
    4 => 0 => 2
    5 => 1 => 3
    6 => 2 => 3
    7 => 3 => 2
    8 => 0 => 2
    9 => 1 => 3
    10 => 2 => 3
    17,END OF SCRIPT
    13,sub END block,1,3
Re^3: Code clarification - use of map and $$_ by AnomalousMonk (Archbishop) on Aug 09, 2016 at 15:28 UTC
> ... I still only see the FIRST instance of the END block doing anything ...

From this I think you've already gotten the point, but anyway... There is only one instance of any given `END` block in a program:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "for my $str (qw(one two three)) {
       print qq{in for loop: '$str'};
       END { print 'END block ONE'; }
       END { print 'END block TWO'; }
       END { print 'END block THREE'; }
       }
    "
    in for loop: 'one'
    in for loop: 'two'
    in for loop: 'three'
    END block THREE
    END block TWO
    END block ONE

Give a man a fish: `<%-{-{-{-<`
Re^3: Code clarification - use of map and $$_ by pryrt (Abbot) on Aug 09, 2016 at 15:21 UTC
Oops, re-reading AnomalousMonk's post and the OP, I was reminded that there was a loop inside the END block, and that does what was intended.

    use strict;
    use warnings;

    $, = ",";
    $\ = "\n";
    $" = ";";

    my %r = ( 0 => 0, 1 => 0, 2 => 0, 3 => 0 );

    foreach (1..10) {
        my $x = $_ % 4;
        ++$r{$x};
        sub END {
            $, = " => ";
            foreach my $k ( keys %r ) {
                print __LINE__, "sub END block with k", $k, $r{$k}
            }
        };
    }

    foreach(1 .. 10) { my $x = $_ % 4; print "$_ => $x => $r{$x}"};
    print __LINE__, "END OF SCRIPT";
    __END__
    __OUTPUT__
    1 => 1 => 3
    2 => 2 => 3
    3 => 3 => 2
    4 => 0 => 2
    5 => 1 => 3
    6 => 2 => 3
    7 => 3 => 2
    8 => 0 => 2
    9 => 1 => 3
    10 => 2 => 3
    22,END OF SCRIPT
    16 => sub END block with k => 0 => 2
    16 => sub END block with k => 1 => 3
    16 => sub END block with k => 3 => 2
    16 => sub END block with k => 2 => 3
Re: Code clarification - use of map and $$_ by pryrt (Abbot) on Aug 09, 2016 at 15:00 UTC
That's got some strange notation: lots of semicolons where they are technically allowed, but I've never seen anybody use them there. And trying to make lots of subs called END, one per line of the input file, is just unfathomable. (I tried a quick test where I tried to make multiple named subs inside a loop like that, and call them both inside and outside the loop; it didn't do anything that makes sense to me.) Also, that `sub END` is never called, at least in your snippet.

However, when I just removed `sub END` from before that block, so that it would just execute the block, and set `OUT = STDOUT`, I was able to better see what was going on. To help figure things out, I also added some print statements before and inside `map`'s block:

    push @{$r{join ' ' x 8, @F[0..3]};}, [@F[4, 6]];
    {
        foreach my $k (keys %r) {
            my($x, $y);
            print "\$r{\$k};", $r{$k};
            print "\@{\$r{\$k};}", @{$r{$k};};
            map {$x += $$_[0]; $y += $$_[1]; print "\$_='$_'", "SS_[0]=$$_[0]", "SS_[1]=$$_[1]", "x=$x", "y=$y"; } @{$r{$k};};
            my @g = split(/\s+/,$k);
            print OUT "$g[0]\t@g[1]\t@g[2]\t@g[3]\t", $x / scalar(@{$r{$k};}), "\t$y\n";
        }
    }

From what I can tell, you've got a hash `%r`, with keys made by joining the first four columns. Each element of that hash is an array ref; the array it refers to holds array refs to the col4,col6 pairs. Thus, the map line says: for a given key $k, get the array behind the array ref for that element (`@{ $r{$k} }`). For each element in that array (so, for each array ref that points to a col4,col6 pair), which `map`'s block will refer to as $_, run the block. The block says to add the col4 value (which is the first element of the array referenced by $_) to $x and add the col6 value (which is the second element of the array referenced by $_) to $y.

That's as clear as mud, I'm sure. When $k refers to the second '1 136 G A' line:

    $k;          # == "1 136 G A" with more spaces
    $r{$k};      # == [ [1,6], [1,9] ] == reference to an array of array-refs
    @{$r{$k}};   # == ( [1,6], [1,9] ) == array of array-refs
    map {} @     # for each element in the @ array, run the {} block
    # First element of @ is the first ref to a pair-array:
    $_;          # = [1,6] == array ref
    @$_;         # = (1,6) == array
    $$_[0];      # = 1 == first element of array (1,6) or arrayref [1,6]
    $$_[1];      # = 6
    # Second element of @ is the next ref to a pair-array
    $_;          # = [1,9]
    $$_[0];      # 1
    $$_[1];      # 9

But there are lots of other oddities. I believe the END sub will only get the first definition, so I believe it can only ever print out the '1 111 C T' results, which isn't overly helpful. And I never see it called. And using `@g[1]` is pointless; it should be `$g[1]`, because it's a single element of an array, so you don't need an array slice.

As Corion said, if possible, ask the original author. Otherwise, I hope these hints have helped.
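The structure described above can be sketched end to end. This is a simplified illustration, not the original script: the `@rows` data and the `%totals` hash are hypothetical, and the key is joined with a single space rather than the eight spaces used in the original.

```perl
use strict;
use warnings;

# Hypothetical input rows: fields 0..3 form the key,
# fields 4 and 6 are the two values being tracked.
my @rows = (
    '1 111 C T 2 x 5',
    '1 136 G A 1 x 6',
    '1 136 G A 1 x 9',
);

my %r;
for my $row (@rows) {
    my @F = split ' ', $row;

    # Autovivify an array under the key and append one [field4, field6] pair.
    push @{ $r{ join ' ', @F[ 0 .. 3 ] } }, [ @F[ 4, 6 ] ];
}

my %totals;    # hypothetical: key => [ mean of field 4, sum of field 6 ]
for my $k ( sort keys %r ) {
    my ( $x, $y ) = ( 0, 0 );
    for my $pair_ref ( @{ $r{$k} } ) {
        $x += $pair_ref->[0];
        $y += $pair_ref->[1];
    }
    my $count = @{ $r{$k} };    # number of pairs stored under this key
    $totals{$k} = [ $x / $count, $y ];
    print "$k\t", $x / $count, "\t$y\n";
}
```

For the two `1 136 G A` rows, the stored pairs are `[1,6]` and `[1,9]`, so the loop yields a mean of 1 for the first value and a sum of 15 for the second.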
Re: Code clarification - use of map and $$_ by pryrt (Abbot) on Aug 09, 2016 at 16:28 UTC
There is much good advice and learning throughout the thread... to sum up what my recommendations would be:

- Add comments as you learn things, so when you go to support it a year down the road, and have forgotten everything you learned in this thread, you'll at least have comments to guide you. (Also, add a comment to link to this thread. :-) )
- Re: Code clarification - use of map and $$_: switch from `map` to a `for` or `foreach` loop; additionally, I'd recommend using a meaningful loop variable name (I'd call it `$pair_ref` or similar, to indicate it's a reference to a pair of something) rather than the default `$_`.
- Instead of using the `$$_[0]` notation, or even the more obviously meaningful `$_->[0]`, for accessing the elements of the pair, I'd probably change to assigning the individual elements to meaningful variables:

      foreach my $pair_ref ( @{ $r{$k} } ) {  # for each [col4,col6] pair that matched on the four-column key
          my ($col4, $col6) = @$pair_ref;     # get the (col4,col6) values
          $x += $col4;                        # x is the summation of all the matching col4 values
          $y += $col6;                        # similar for y
      }

- Re: Code clarification - use of map and $$_: move the END block out of the `while(<IN>)` loop.
- Fix the print statement in the END block to not use `@g[1]`, since that's a 1-element slice, and is better written as `$g[1]`.
- You might want to further look into the perlvar `$,` and `$"` variables for automatically joining within your print statement (or use a manual join function if you want to make the joining explicit¹) rather than manually placing tabs between each element of the `@g` array, `$x/...`, and `$y` values.

¹ I know for many non-expert Perl coders, the explicit join is more natural and possibly easier to remember in the future than the magic variables; personally, despite having hacked Perl for (eek!) two decades now, it wasn't until I started really frequenting perlmonks a few months back that I actually understood what the `$,` and `$"` variables do, and started using them.
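As a rough comparison of the two joining styles mentioned in the last recommendation, here is a sketch with made-up values (`@g`, `$mean`, and `$sum` are placeholders, not output from the original script). It builds the same tab-separated line once with an explicit `join` and once with localized `$,` and `$"`.

```perl
use strict;
use warnings;

# Hypothetical values standing in for the key fields and totals.
my @g = ( 1, 136, 'G', 'A' );
my ( $mean, $sum ) = ( 1, 15 );

# Explicit join: the separator is visible at the call site.
my $joined = join "\t", @g, $mean, $sum;

# $" is the separator used when an array is interpolated into a string.
my $interpolated;
{
    local $" = "\t";
    $interpolated = "@g";
}

# $, is printed between the arguments of print, $\ after the last one.
{
    local $, = "\t";
    local $\ = "\n";
    print @g, $mean, $sum;
}

print "$joined\n";    # same line as the print above
```

Using `local` inside a bare block limits the changed separators to that block, which avoids surprising other `print` statements elsewhere in the program.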
Re: Code clarification - use of map and $$_ by Anonymous Monk on Aug 09, 2016 at 14:57 UTC
The code `map {$x += $$_[0]; $y += $$_[1];} @{$r{$k};};` is better written as:

    for (@{ $r{$k} }) { $x += $_->[0]; $y += $_->[1]; }

(Using a `map` statement in void context in lieu of `for` is confusing.)

The dereferences are used because the structure is populated with array references. In other words, you have a hash of arrays (HoA) whose elements are themselves references to small arrays. See perldata, perldsc. Specifically, look at the matching statements:

    push @{ $r{ ... } }, [ ... ];
    ...
    for (@{ $r{ ... } }) { ... $_->[...] }

The first populates the HoA with the necessary data. The second one uses this data further down. Why HoA, why the need for array `[ constructors ]` and dereferences? Because one scalar was not enough. The original programmer needed to track two values and used the small arrays as tuples.
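For contrast, here is a sketch of `map` used for the list it returns rather than for side effects. This is an alternative formulation, not what the original script does; the data under key `k` is invented, and `sum0` comes from the core `List::Util` module (version 1.26 or later).

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical pairs stored under one key of the hash.
my %r = ( k => [ [ 1, 6 ], [ 1, 9 ] ] );

# map extracts one column of the pairs; sum0 adds up the returned list.
my $x = sum0 map { $_->[0] } @{ $r{k} };
my $y = sum0 map { $_->[1] } @{ $r{k} };

print "x=$x y=$y\n";    # prints "x=2 y=15"
```

This walks the list twice, so the single `for` loop is still the simpler choice when both sums are needed at once.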
Re: Code clarification - use of map and $$_ by perldigious (Priest) on Aug 09, 2016 at 16:48 UTC
Inherited code... Hmm, I believe I also see a bareword filehandle there in the print line.

`print OUT "$g[0]\t@g[1]\t@g[2]\t@g[3]\t", $x / scalar(@{$r{$k};}), "\t$y\n";`

*hisses like a vampire who just suddenly had sunlight cast on him and scurries away rapidly*

UPDATE: With `<IN>` as well. See Bareword vs. Indirect Filehandle.

I love it when things get difficult; after all, difficult pays the mortgage. - Dr. Keith Whites
I hate it when things get difficult, so I'll just sell my house and rent cheap instead. - perldigious
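A minimal sketch of the lexical-filehandle alternative to bareword `IN` and `OUT`. The handle names `$in` and `$out` and the sample input are assumptions for illustration; in-memory strings stand in for real files so the example is self-contained.

```perl
use strict;
use warnings;

# In-memory "files" keep the sketch self-contained; a real script
# would pass file names to open instead of scalar references.
my $input  = "1 111 C T 2 x 5\n1 136 G A 1 x 6\n";
my $output = '';

# Three-argument open with lexical handles instead of barewords.
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";

while ( my $line = <$in> ) {
    chomp $line;
    my @F = split ' ', $line;

    # The braces around the handle make it clear which argument is the filehandle.
    print {$out} join( "\t", @F[ 0 .. 3 ] ), "\n";
}

close $in;
close $out or die "Cannot close output: $!";

print $output;
```

Lexical handles are scoped like any other `my` variable and are closed automatically when they go out of scope, which is the usual argument for preferring them over package-global barewords.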