in reply to Unique Values within AOH

Instead of de-duping after the fact you could make sure your lists are unique as you build them. Simpler and shorter.

use strict;
use Data::Dumper;

my $team;
my $player;
my %teamAccts;
my %seen;

while (<DATA>) {
    if (/^T:(\S+)/) {
        $team = $1;
        next;
    }
    if (/^P:(\S+)/) {
        $player = $1;
        push @{$teamAccts{$team}}, $player unless $seen{"$team$player"}++;
    }
}

print Dumper(\%teamAccts);

__DATA__
T:REDS
P:GRIFFEY
P:GRIFFEY
P:PEREZ
P:ROSE
P:BENCH
T:PHILLIES
P:ROSE
P:ROSE
T:MARINERS
P:GRIFFEY
P:PEREZ

Re^2: Unique Values within AOH (updated)
by AnomalousMonk (Archbishop) on Oct 30, 2019 at 18:24 UTC
    unless $seen{"$team$player"}++;

    One small nit. Because the %seen uniqification hash is common to all teams, it's possible to confuse certain team/player records, e.g.:
        Team    Player
        ABC     DEFGH
        ABCD    EFGH
    which both become the key 'ABCDEFGH'.

    This is easily avoided by joining the two strings with some character or character sequence that (you hope!) cannot possibly occur in team or player names:
        unless $seen{"$team\x00$player"}++;
    (Update: Actually, for the given order of concatenation, it's only necessary that the separator character or character sequence cannot appear in the team name.)

    This nit is very unlikely to bite, but may be very difficult to debug (or even see in large data sets) if it does.
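
    To make the collision concrete, here is a minimal sketch using the two records from the table above. The count_unique helper and the @records list are hypothetical names introduced only for this illustration; the two key styles are the ones discussed in this reply.

```perl
use strict;
use warnings;

# Hypothetical records chosen to collide under plain concatenation.
my @records = ( [ 'ABC', 'DEFGH' ], [ 'ABCD', 'EFGH' ] );

# Count how many records survive de-duplication for a given key builder.
sub count_unique {
    my ( $make_key, @recs ) = @_;
    my %seen;
    return scalar grep { !$seen{ $make_key->(@$_) }++ } @recs;
}

# Key built by plain concatenation, as in the parent post.
my $plain = count_unique( sub { "$_[0]$_[1]" }, @records );

# Key built with a NUL separator between team and player.
my $separated = count_unique( sub { "$_[0]\x00$_[1]" }, @records );

print "plain concatenation keeps: $plain\n";
print "NUL-separated key keeps:   $separated\n";
```

    With plain concatenation both records map to 'ABCDEFGH', so only one survives; with the separator both are kept.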

    Update: Another, possibly more significant nit. All team and player name data in the OPed example is uppercase. If there may be any mixing of case, then, e.g., 'Rose' will be distinct from 'ROSE' and de-duplication may fail. In this case, or even as a general precaution, team/player names can be common-cased:
        unless $seen{"\U$team\x00$player"}++;
    See Quote and Quote-like Operators in perlop for \U, \L et al.
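
    A small sketch of the common-casing idea, assuming some mixed-case input that is not in the original data. The dedup_key helper is a hypothetical name; the key expression inside it is the one shown above.

```perl
use strict;
use warnings;

# Build a case-insensitive de-duplication key. \U upper-cases everything
# after it (up to \E or the end of the string), so both names are folded.
sub dedup_key {
    my ( $team, $player ) = @_;
    return "\U$team\x00$player";
}

my %seen;
my @kept;

# Hypothetical mixed-case spellings for players on one team.
for my $player (qw(ROSE Rose rose BENCH)) {
    push @kept, $player unless $seen{ dedup_key( 'REDS', $player ) }++;
}

print "@kept\n";
```

    Note that the first spelling encountered is the one that is kept. For names outside ASCII, the fc function (Perl 5.16 and later, enabled with use feature 'fc') may be a better choice for case folding than \U.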


    Give a man a fish:  <%-{-{-{-<

      This is easily avoided by joining the two strings with some character or character sequence that (you hope!) cannot possibly occur in team or player names

      Or, take the guesswork out of it and add an extra layer of depth to the hash:

      unless $seen{$team}{$player}++;
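
      As a rough sketch of how that looks in a full loop, assuming records held in a hypothetical @records array rather than read from DATA:

```perl
use strict;
use warnings;

my ( %teamAccts, %seen );

# Hypothetical records, including the pair that collides under concatenation.
my @records = (
    [ 'ABC',  'DEFGH' ],
    [ 'ABCD', 'EFGH' ],
    [ 'ABC',  'DEFGH' ],    # a genuine duplicate
);

for my $rec (@records) {
    my ( $team, $player ) = @$rec;

    # Team and player are separate hash levels, so no separator is needed.
    push @{ $teamAccts{$team} }, $player unless $seen{$team}{$player}++;
}

print "$_: @{ $teamAccts{$_} }\n" for sort keys %teamAccts;
```

      Each team keeps its own inner hash of seen players, so the two colliding records stay distinct and only the genuine duplicate is dropped.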

      Another, possibly more significant nit. All team and player name data in the OPed example is uppercase. If there may be any mixing of case ...

      True, but to be fair to the OP it was stated to be just an example. I'm not convinced that dirtdog is actually applying this to teams and players. If he were then he would have bigger problems with real-world data such as the current All Blacks XV which has featured all three Barretts in recent weeks. You can't go de-duplicating three different players who share the same surname.

        ... add an extra layer of depth to the hash ...

        The best solution, I think.

        ... the [data] was stated to be just an example.

        True, but one can only address the circumstances before one. As you say, the whole consideration may turn out to be irrelevant.

        ... three different players who share the same surname.

        OT: I recently read (it might even have been here on PM) of a sports team somewhere with two players with the same surname and same given name who were playing in the same game, and one guy replaced the other! De-duplicate that!



Re^2: Unique Values within AOH
by dirtdog (Monk) on Oct 30, 2019 at 14:54 UTC

    thanks hippo!...works perfectly.