Re^5: Is there better way to delete dups in an AoH?

Here is another way, similar to hv's solution, but generalized so that you don't have to name the keys explicitly. It compares the hashes for equality, i.e. they must have the same set of keys, and those keys must have the same values. It does this by building a string representation of each hash. It's not so general that it would handle more complex data structures, such as hashes whose values are references.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my $AoH = [ 
  { page => 'spring', chap => 'spring'},
  { page => 'spring', chap => 'spring'},
  { page => 'winter', chap => 'winter'},
  { page => 'spring', chap => 'spring'},
  { page => 'spring', chap => 'fall'},
];

# count how many times each distinct hash (by string form) occurs
my %dupes;
++$dupes{ stringify( %$_ ) } for @$AoH;

print Dumper \%dupes;

# keep only the hashes whose string form was seen exactly once
@$AoH = grep $dupes{ stringify( %$_ ) } == 1, @$AoH;

print Dumper $AoH;

sub stringify {
  my %hash = @_;

  # explicitly name the keys if you want to compare on only some of them
  #return join '', map { $_, $hash{$_} } ( qw/ page chap / );

  # or use them all
  join '', map { $_, $hash{$_} } sort keys %hash;
}

__END__
$VAR1 = {
          'chapspringpagespring' => 3,
          'chapfallpagespring' => 1,
          'chapwinterpagewinter' => 1
        };
$VAR1 = [
          {
            'page' => 'winter',
            'chap' => 'winter'
          },
          {
            'page' => 'spring',
            'chap' => 'fall'
          }
        ];
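
As the output above shows, the grep keeps only hashes that occur exactly once, so all three copies of the spring/spring hash disappear. If the goal is instead to keep one copy of each distinct hash, a minimal sketch could look like the following. The name `uniq_first` is hypothetical (it is not from the post), and `stringify` is repeated from the listing above only so the sketch runs on its own.

```perl
use strict;
use warnings;

# Repeated from the listing above so this sketch is self-contained.
sub stringify {
    my %hash = @_;
    return join '', map { $_, $hash{$_} } sort keys %hash;
}

# Hypothetical helper: return the input hashes with later duplicates
# dropped, keeping the first hash seen for each distinct string form.
sub uniq_first {
    my %seen;
    return grep { !$seen{ stringify(%$_) }++ } @_;
}

my @AoH = (
    { page => 'spring', chap => 'spring' },
    { page => 'spring', chap => 'spring' },
    { page => 'winter', chap => 'winter' },
    { page => 'spring', chap => 'spring' },
    { page => 'spring', chap => 'fall'   },
);

my @unique = uniq_first(@AoH);
print scalar(@unique), " unique hashes\n";   # prints "3 unique hashes"
```

The post-increment on `%seen` is the usual first-seen idiom: the test is true only the first time a given string form turns up, so the original order is preserved.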

Re^6: Is there better way to delete dups in an AoH? by hv (Prior) on Jun 07, 2004 at 01:31 UTC
Be careful when comparing things for equality by stringifying: unless the stringified values conform to some rule that removes the danger (e.g. each value has a fixed length), there is always a risk of false positives. For example, given:

`join '', map { $_, $hash{$_} } ( qw/ page chap / );`

you cannot tell the difference between these two:

`{ page => 'chappage', chap => '' }, { page => '', chap => 'pagechap' },`

There are two ways to avoid this problem: either you take advantage of some regularity of the data, or you get more complex.

If the data has some regularity - if, say, it is not allowed to include a nul byte - you can use the unpermitted sequence as your delimiter:

`join "\0", map { $_, $hash{$_} } ( qw/ page chap / );`

If not - if the data can contain any arbitrary string - then you cannot avoid modifying the data at least enough to disambiguate the delimiter. So if you want to use 'x' as the delimiter, you need to encode 'x' in the data as something else. You can't simply encode it as 'xx' and keep 'x' as the delimiter, because that would be ambiguous (a doubled 'x' in the data looks the same as an empty data element between two delimiters). So you might encode 'x' as 'xx', and use 'xyx' as the delimiter: now finally you've done enough to avoid ambiguity, and therefore to avoid false matches:

`join 'xyx', map { s/x/xx/g; $_ } map { $_, $hash{$_} } qw/ page chap /;`

One final point: once you have solved the ambiguity problem, note that there is no point including invariant information in the stringified version. So when the keys are explicitly named, you might as well just encode the values:

`join 'xyx', map { s/x/xx/g; $_ } map $hash{$_}, qw/ page chap /;`

Hugo
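
To make the false-positive problem above concrete, here is a small sketch that runs the two example hashes through three key builders: the plain `join ''`, a `join 'xyx'` without escaping, and the escaped form. The sub names (`naive_key`, `delimited_key`, `escaped_key`) and the second pair of sample hashes are made up for illustration. The escaping works on a copy of each value, so the hashes themselves are left untouched.

```perl
use strict;
use warnings;

my @keys = qw/ page chap /;

# No delimiter at all: key names and values simply run together.
sub naive_key {
    my ($h) = @_;
    return join '', map { $_, $h->{$_} } @keys;
}

# Delimiter but no escaping: still ambiguous if a value contains 'xyx'.
sub delimited_key {
    my ($h) = @_;
    return join 'xyx', map { $h->{$_} } @keys;
}

# Double every 'x' in a copy of each value, then join on 'xyx'.
sub escaped_key {
    my ($h) = @_;
    return join 'xyx', map { (my $v = $h->{$_}) =~ s/x/xx/g; $v } @keys;
}

# The pair from the reply above, which collides under a plain join ''.
my $h1 = { page => 'chappage', chap => '' };
my $h2 = { page => '',         chap => 'pagechap' };

# An assumed pair whose values contain the delimiter itself.
my $h3 = { page => 'axyxb', chap => 'c' };
my $h4 = { page => 'a',     chap => 'bxyxc' };

print "naive:     ", (naive_key($h1)     eq naive_key($h2)     ? "collide" : "differ"), "\n";
print "delimited: ", (delimited_key($h3) eq delimited_key($h4) ? "collide" : "differ"), "\n";
print "escaped:   ", (escaped_key($h3)   eq escaped_key($h4)   ? "collide" : "differ"), "\n";
```

The first two lines report "collide" and the last reports "differ": after escaping, an 'x' in the data only ever appears doubled, so the sequence 'xyx' can only come from the delimiter.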
Re^7: Is there better way to delete dups in an AoH? by qq (Hermit) on Jun 08, 2004 at 00:18 UTC
++hv. Every day perlmonks make me a better programmer. - qq