dirtdog has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm struggling with getting a unique list printed out within an Array of Hashes. I have a small example that shows baseball players that played for a team. I'm trying to get each player to show only once per team as follows:

$VAR1 = {
    'MARINERS' => [ 'GRIFFEY', 'PEREZ' ],
    'REDS'     => [ 'GRIFFEY', 'PEREZ', 'ROSE', 'BENCH' ],
    'PHILLIES' => [ 'ROSE' ]
};
However, what I end up getting is:
$VAR1 = {
    'MARINERS' => [ 'GRIFFEY', 'PEREZ' ],
    'REDS'     => [ 'GRIFFEY', 'GRIFFEY', 'PEREZ', 'ROSE', 'BENCH' ],
    'PHILLIES' => [ 'ROSE', 'ROSE' ]
};

The sample code I have is:

use strict;
use Data::Dumper;    # needed for Dumper() below
use constant DEBUG => 2;

my $team;
my $player;
my %teamAccts;
my %teamAcctsUniq;
my $teamAccts;
my $teamAcctsUniq;

while (<DATA>) {
    chomp;
    if ( /^T:/ ) {
        $team = (split /:/)[1];
    }
    if ( /^P:/ ) {
        $player = (split /:/)[1];
        push( @{ $teamAccts{$team} }, $player );
    }
}

my %seen;
@$teamAcctsUniq = grep { ! $seen{$_->{player}}++ } @$teamAccts;

#print Dumper(\%teamAccts);
print Dumper(\%teamAcctsUniq);

__DATA__
T:REDS
P:GRIFFEY
P:GRIFFEY
P:PEREZ
P:ROSE
P:BENCH
T:PHILLIES
P:ROSE
P:ROSE
T:MARINERS
P:GRIFFEY
P:PEREZ

any help you could provide would be greatly appreciated. thank you.

2019-10-31 Athanasius added code tags around the data.

Replies are listed 'Best First'.
Re: Unique Values within AOH
by hippo (Archbishop) on Oct 30, 2019 at 14:37 UTC

    Instead of de-duping after the fact you could make sure your lists are unique as you build them. Simpler and shorter.

    use strict;
    use Data::Dumper;

    my $team;
    my $player;
    my %teamAccts;
    my %seen;

    while (<DATA>) {
        if (/^T:(\S+)/) {
            $team = $1;
            next;
        }
        if (/^P:(\S+)/) {
            $player = $1;
            push @{ $teamAccts{$team} }, $player unless $seen{"$team$player"}++;
        }
    }

    print Dumper(\%teamAccts);

    __DATA__
    T:REDS
    P:GRIFFEY
    P:GRIFFEY
    P:PEREZ
    P:ROSE
    P:BENCH
    T:PHILLIES
    P:ROSE
    P:ROSE
    T:MARINERS
    P:GRIFFEY
    P:PEREZ
      unless $seen{"$team$player"}++;

      One small nit. Because the %seen uniquification hash is shared across all teams, it's possible to confuse certain team/player records, e.g.:
          Team    Player
          ABC     DEFGH
          ABCD    EFGH
      which both become the key 'ABCDEFGH'.

      This is easily avoided by joining the two strings with some character or character sequence that (you hope!) cannot possibly occur in team or player names:
          unless $seen{"$team\x00$player"}++;
      (Update: Actually, for the given order of concatenation, it's only necessary that the separator character or character sequence cannot appear in the team name.)

      This nit is very unlikely to bite, but may be very difficult to debug (or even see in large data sets) if it does.

      Update: Another, possibly more significant nit. All team and player name data in the OPed example is uppercase. If there may be any mixing of case, then, e.g., 'Rose' will be distinct from 'ROSE' and de-duplication may fail. In this case, or even as a general precaution, team/player names can be common-cased:
          unless $seen{"\U$team\x00$player"}++;
      See Quote and Quote-like Operators in perlop for  \U \L et al.


      Give a man a fish:  <%-{-{-{-<

        This is easily avoided by joining the two strings with some character or character sequence that (you hope!) cannot possibly occur in team or player names

        Or, take the guesswork out of it and add an extra layer of depth to the hash:

        unless $seen{$team}{$player}++;
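        A minimal, self-contained sketch of that two-level %seen approach, using made-up inline data in the OP's T:/P: format (no separator string needed, so no key collisions are possible):

```perl
use strict;
use warnings;

# Hypothetical sample data in the OP's T:/P: line format
my @lines = qw( T:REDS P:GRIFFEY P:GRIFFEY P:ROSE T:PHILLIES P:ROSE );

my ( $team, %teamAccts, %seen );
for (@lines) {
    if (/^T:(\S+)/) { $team = $1; next; }
    if (/^P:(\S+)/) {
        # Two-level %seen hash: one slot per ($team, $player) pair,
        # so 'ROSE' on the REDS does not block 'ROSE' on the PHILLIES
        push @{ $teamAccts{$team} }, $1 unless $seen{$team}{$1}++;
    }
}

print "@{ $teamAccts{REDS} }\n";      # GRIFFEY ROSE
print "@{ $teamAccts{PHILLIES} }\n";  # ROSE
```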
        Another, possibly more significant nit. All team and player name data in the OPed example is uppercase. If there may be any mixing of case ...

        True, but to be fair to the OP it was stated to be just an example. I'm not convinced that dirtdog is actually applying this to teams and players. If he were then he would have bigger problems with real-world data such as the current All Blacks XV which has featured all three Barretts in recent weeks. You can't go de-duplicating three different players who share the same surname.

      thanks hippo!...works perfectly.

Re: Unique Values within AOH
by NetWallah (Canon) on Oct 30, 2019 at 16:46 UTC
    Obligatory one-liner:
    perl -F: -lane "$F[0] eq q|T| and $team=$F[1]; $F[0] eq q|P| and $h{$team}{$F[1]}++}{for my $t(sort keys %h){print $t, qq|\t\[|, join( qq|, |, sort keys %{$h{$t}}),qq|\]|}" Your-Data.txt
    MARINERS        [GRIFFEY, PEREZ]
    PHILLIES        [ROSE]
    REDS    [BENCH, GRIFFEY, PEREZ, ROSE]
    Substitute single quotes for double, if running on Linux.

                    "From there to here, from here to there, funny things are everywhere." -- Dr. Seuss

Re: Unique Values within AOH
by johngg (Canon) on Oct 30, 2019 at 15:50 UTC

    Rather than reading the data line by line you could slurp all of it into a single scalar string, then split into per-team chunks at each point in the string that is followed by a "T" at the start of a line. Then for each team get rid of the [PT]: characters globally and split again on newlines into the team line and player lines. This gets rid of the need to test each line for a "T" looking for a new team. Finally, as hippo suggests, de-dup before building your HoA.

    use 5.026;
    use warnings;
    use Data::Dumper;

    open my $dataFH, q{<}, \ <<__EOD__ or die $!;
    T:REDS
    P:GRIFFEY
    P:GRIFFEY
    P:PEREZ
    P:GRIFFEY
    P:PEREZ
    P:ROSE
    P:BENCH
    T:PHILLIES
    P:ROSE
    P:ROSE
    T:MARINERS
    P:GRIFFEY
    P:PEREZ
    __EOD__

    my $data = do { local $/; <$dataFH>; };
    close $dataFH or die $!;

    my @teams = split m{(?=^T)}m, $data;

    my %teamAccts;
    foreach my $teamData ( @teams )
    {
        $teamData =~ s{[PT]:}{}g;
        my( $teamLine, @playerLines ) = split m{\n}, $teamData;
        $teamAccts{ $teamLine } = [
            do {
                my %seen;
                grep { ! $seen{ $_ } ++ } @playerLines;
            }
        ];
    }

    print Data::Dumper
        ->new( [ \ %teamAccts ], [ qw{ *teamAccts } ] )
        ->Sortkeys( 1 )
        ->Dumpxs();

    The output.

    %teamAccts = (
        'MARINERS' => [ 'GRIFFEY', 'PEREZ' ],
        'PHILLIES' => [ 'ROSE' ],
        'REDS'     => [ 'GRIFFEY', 'PEREZ', 'ROSE', 'BENCH' ]
    );

    I hope this is of interest.

    Update: Clarified wording slightly.

    Cheers,

    JohnGG

      Rather than reading the data line by line you could slurp all of it into a single scalar string then split into per-team chunks ...

      OK for small files, begging for trouble as soon as files grow beyond the amount of free RAM.

      On a 32-bit perl, you simply cannot have a scalar larger than 4 GBytes, because you have no more address lines. The real limit may be much less, depending on operating system and other factors. So you are limited to files smaller than that. Reading line by line allows processing petabytes of data without running out of memory.

      Even a 64-bit perl will be limited to the amount of free RAM and free swap space. Once all RAM and swap is used up and the machine has come to a grinding halt, you are lost. Again, reading line by line allows processing much more data.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        OK for small files, begging for trouble as soon as files grow beyond the amount of free RAM.

        Granted, but for this particular problem I doubt that there are enough teams out there for the data to run up against the physical RAM limit, even on quite elderly systems.

        Cheers,

        JohnGG

Re: Unique Values within AOH
by choroba (Cardinal) on Oct 30, 2019 at 18:22 UTC
    Use uniq from List::Util:
    use List::Util qw{ uniq };
    %teamAcctsUniq = map { $_ => [ uniq( @{ $teamAccts{$_} } ) ] } keys %teamAccts;

    Or you can assign the whole hash at once:

    @teamAcctsUniq{ keys %teamAccts } = map [ uniq(@$_) ], values %teamAccts;
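    For reference, a self-contained sketch of the uniq approach with made-up data (note that uniq first shipped in List::Util 1.45, bundled with perl 5.26; it keeps the first occurrence of each element, preserving order):

```perl
use strict;
use warnings;
use List::Util 1.45 qw( uniq );    # uniq() first appeared in List::Util 1.45

# Hypothetical rosters with duplicates, as in the OP's example
my %teamAccts = (
    REDS     => [ qw( GRIFFEY GRIFFEY PEREZ ROSE BENCH ) ],
    PHILLIES => [ qw( ROSE ROSE ) ],
);

# Build a de-duplicated copy, one team at a time
my %teamAcctsUniq = map { $_ => [ uniq @{ $teamAccts{$_} } ] } keys %teamAccts;

print "@{ $teamAcctsUniq{REDS} }\n";      # GRIFFEY PEREZ ROSE BENCH
print "@{ $teamAcctsUniq{PHILLIES} }\n";  # ROSE
```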
    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Unique Values within AOH
by GrandFather (Saint) on Oct 31, 2019 at 02:18 UTC

    Isn't this just a case of the wrong data structure? As shown it is a hash of arrays. If it were a Hash of Hashes the problem goes away.
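    A quick sketch of that HoH shape with hypothetical inline data. One trade-off worth noting: hash keys carry no insertion order, hence the sort at output time:

```perl
use strict;
use warnings;

# With a hash of hashes, duplicate players collapse automatically:
# each player is a key, and keys are unique by construction.
my %roster;
$roster{REDS}{$_}++     for qw( GRIFFEY GRIFFEY PEREZ ROSE );
$roster{PHILLIES}{$_}++ for qw( ROSE ROSE );

# The hash value even records how many times each player appeared
print join( ' ', sort keys %{ $roster{REDS} } ), "\n";   # GRIFFEY PEREZ ROSE
print $roster{PHILLIES}{ROSE}, "\n";                     # 2
```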

    A better question might be, why a HoH or HoA at all. Shouldn't data like this be in a database?

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond