Re^3: Is there better way to delete dups in an AoH?

Re^4: Is there better way to delete dups in an AoH? by bradcathey (Prior) on Jun 06, 2004 at 23:14 UTC
Thanks again for the excellent explanation. However, my script just got a bit more complicated. How do I get rid of two dupe keys per element? See what I naively tried: `use Data::Dumper; use strict; my $AoH = [ { page => 'spring', chap => 'spring'}, { page => 'winter', chap => 'winter'}, { page => 'fall', chap => 'fall'}, { page => 'summer', chap => 'summer'}, { page => 'spring', chap => 'spring'} ]; my %uniq = map { $_->{page} => ; $_->{chap} => 1 } @$AoH; @$AoH = map { { page => $_ };{ chap=>$_ } } keys %uniq; print Dumper ($AoH);` It doesn't work the way it should and returns: `$VAR1 = [ { 'chap' => 'summer' }, { 'chap' => 'winter' }, { 'chap' => 'fall' }, { 'chap' => 'spring' } ];` -- Brad "A little yeast leavens the whole dough."
Re^5: Is there better way to delete dups in an AoH? by Roy Johnson (Monsignor) on Jun 06, 2004 at 23:30 UTC
I think you want to have your last map return one 2-element hash ref for each unique key. Instead, you're creating two 1-element hash refs, and returning only the second one. `my %uniq = map { ($_->{page} => 1, $_->{chap} => 1) } @$AoH; @$AoH = map { { page => $_ , chap=>$_ } } keys %uniq;` The PerlMonk `tr///` Advocate
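To make the difference concrete, here is a minimal sketch of the corrected form run against the sample data from the question. In the original attempt the two expressions inside each `map` block are separated by a semicolon, so only the last one is returned. The `sort` (for a repeatable order), the `+{` parser hint, and the `@deduped` name are additions made for illustration and are not part of the reply above.

```perl
use strict;
use warnings;

# Sample data from the question.
my $AoH = [
    { page => 'spring', chap => 'spring' },
    { page => 'winter', chap => 'winter' },
    { page => 'fall',   chap => 'fall'   },
    { page => 'summer', chap => 'summer' },
    { page => 'spring', chap => 'spring' },
];

# The parentheses make the block return one flat list of key/value
# pairs per element, so both values end up as keys of %uniq.
my %uniq = map { ( $_->{page} => 1, $_->{chap} => 1 ) } @$AoH;

# One anonymous hash with both keys per unique value. The leading "+"
# tells the parser that the inner braces are a hash constructor.
# Sorting is an illustrative addition to make the order repeatable.
my @deduped = map { +{ page => $_, chap => $_ } } sort keys %uniq;

printf "%s / %s\n", $_->{page}, $_->{chap} for @deduped;
```

This only gives a sensible result while `page` and `chap` always hold the same value in each element, as they do in this sample.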
Re^6: Is there better way to delete dups in an AoH? by bradcathey (Prior) on Jun 06, 2004 at 23:53 UTC
Thanks, Roy Johnson! Before I posted I tried it almost exactly as you have it, but without the parens that make it a single list. Just didn't see it. I love this place. -- Brad "A little yeast leavens the whole dough."
Re^6: Is there better way to delete dups in an AoH? by bradcathey (Prior) on Jun 08, 2004 at 02:37 UTC
Roy Johnson, I still hope you're watching this thread. Your solution works great if the values of both keys in the array element are the same: `page=>'fall', chap=>'fall'` BUT, what if they are different: `my $AoH = [ { page => 'main', chap => 'About'}, { page => 'main', chap => 'Contact'}, { page => 'main', chap => 'About'}, { page => 'sub', chap => 'About'}, { page => 'sub', chap => 'Contact'} ]; my %uniq = map { ($_->{page} => 1, $_->{chap} => 1) } @$AoH; @$AoH = map { { page => $_ , chap=>$_ } } keys %uniq; print Dumper ($AoH);` Which prints: `$VAR1 = [ { 'chap' => 'Contact', 'page' => 'Contact' }, { 'chap' => 'About', 'page' => 'About' }, { 'chap' => 'sub', 'page' => 'sub' }, { 'chap' => 'main', 'page' => 'main' } ];` I want to end up with: `$VAR1 = [ { 'page' => 'main', 'chap' => 'About' }, { 'page' => 'main', 'chap' => 'Contact' }, { 'page' => 'sub', 'chap' => 'About' }, { 'page' => 'sub', 'chap' => 'Contact' } ];` eliminating that dupe element of: `'page' => 'main', 'chap' => 'About'` I've stared at it for an hour and am bewildered. Ideas? Thanks. -- Brad "A little yeast leavens the whole dough."
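One possible approach, offered as a minimal sketch rather than as the answer given in this thread: treat the pair of values as the thing to deduplicate and record each pair in a two-level hash. The names `%seen` and `@unique` are hypothetical, and the sketch assumes every element has both a `page` and a `chap` key.

```perl
use strict;
use warnings;

# Sample data from the question above.
my $AoH = [
    { page => 'main', chap => 'About'   },
    { page => 'main', chap => 'Contact' },
    { page => 'main', chap => 'About'   },
    { page => 'sub',  chap => 'About'   },
    { page => 'sub',  chap => 'Contact' },
];

# $seen{page}{chap} counts how often each pair has been met.
# The post-increment yields 0 (false) the first time, so grep keeps
# only the first occurrence of each pair, in the original order.
my %seen;
my @unique = grep { !$seen{ $_->{page} }{ $_->{chap} }++ } @$AoH;

printf "%s / %s\n", $_->{page}, $_->{chap} for @unique;
```

Because each value is its own hash level, no delimiter between the values is needed. Assigning `@unique` back with `@$AoH = @unique;` would update the original structure in place.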
Re^7: Is there better way to delete dups in an AoH? by Roy Johnson (Monsignor) on Jun 08, 2004 at 03:32 UTC
Re^8: Is there better way to delete dups in an AoH? by bradcathey (Prior) on Jun 08, 2004 at 11:03 UTC
Re^5: Is there better way to delete dups in an AoH? by qq (Hermit) on Jun 07, 2004 at 00:01 UTC
Here is another way, similar to hv's solution, but generalized so that you don't have to explicitly name the keys. It compares the hashes for equality, i.e. they must have the same set of keys, and those keys must have the same values. It does this by making a string representation of the hash. It's not so general that it would handle more complex data structures.

```perl
#!/usr/bin/perl
use Data::Dumper;

my $AoH = [
    { page => 'spring', chap => 'spring'},
    { page => 'spring', chap => 'spring'},
    { page => 'winter', chap => 'winter'},
    { page => 'spring', chap => 'spring'},
    { page => 'spring', chap => 'fall'},
];

# count how many times each stringified hash occurs
my %dupes;
++$dupes{ stringify( %$_ ) } for @$AoH;
print Dumper \%dupes;

# keep only the hashes that occur exactly once
@$AoH = grep $dupes{ stringify( %$_ ) } == 1, @$AoH;
print Dumper $AoH;

sub stringify {
    my %hash = @_;

    # explicitly name keys if you want to compare on some
    #return join '', map { $_, $hash{$_} } ( qw/ page chap / );

    # or use them all
    join '', map { $_, $hash{$_} } sort keys %hash;
}

__END__
$VAR1 = {
    'chapspringpagespring' => 3,
    'chapfallpagespring' => 1,
    'chapwinterpagewinter' => 1
};
$VAR1 = [
    { 'page' => 'winter', 'chap' => 'winter' },
    { 'page' => 'spring', 'chap' => 'fall' }
];
```
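Note that the `grep` above keeps only hashes whose string form occurs exactly once, so every copy of a repeated hash is dropped (there is no 'spring'/'spring' entry in the output). If the goal is to keep one copy of each distinct hash, a variant along the following lines might do. This is a sketch under assumptions: `fingerprint` and `%seen` are hypothetical names, and joining on a NUL byte assumes that no key or value contains `"\0"`.

```perl
use strict;
use warnings;

my $AoH = [
    { page => 'spring', chap => 'spring' },
    { page => 'spring', chap => 'spring' },
    { page => 'winter', chap => 'winter' },
    { page => 'spring', chap => 'spring' },
    { page => 'spring', chap => 'fall'   },
];

# Same idea as stringify() above, but taking a hash ref and joining
# with a NUL byte. Assumption: no key or value contains "\0".
sub fingerprint {
    my ($h) = @_;
    return join "\0", map { ( $_, $h->{$_} ) } sort keys %$h;
}

# Keep the first hash seen for each fingerprint, in original order.
my %seen;
my @unique = grep { !$seen{ fingerprint($_) }++ } @$AoH;

printf "%s / %s\n", $_->{page}, $_->{chap} for @unique;
```

With the sample data this leaves three elements: one 'spring'/'spring', 'winter'/'winter', and 'spring'/'fall'.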
Re^6: Is there better way to delete dups in an AoH? by hv (Prior) on Jun 07, 2004 at 01:31 UTC
Be careful when comparing things for equality by stringifying: except when the values stringified conform to some rules to remove the danger (e.g. if each value has a fixed length), there is always the danger of false positives. For example, given: `join '', map { $_, $hash{$_} } ( qw/ page chap / );` you cannot tell the difference between these two: `{ page => 'chappage', chap => '' }, { page => '', chap => 'pagechap' },`

There are two ways to avoid this problem: either you take advantage of some regularity of the data, or you get more complex. If the data has some regularity (if, say, it is not allowed to include a NUL byte) you can use the unpermitted sequence as your delimiter: `join "\0", map { $_, $hash{$_} } ( qw/ page chap / );`

If not (if the data can contain any arbitrary string) then you cannot avoid modifying the data at least enough to disambiguate the delimiter. So if you want to use 'x' as the delimiter, you need to encode 'x' in the data as something else. You can't encode it as 'xx', because that would be ambiguous (compared to an empty data element). So you might encode 'x' as 'xx', and use 'xyx' as the delimiter: now finally you've done enough to avoid ambiguity, and therefore to avoid false matches: `join 'xyx', map { s/x/xx/g; $_ } map { $_, $hash{$_} } qw/ page chap /;`

One final point: once you have solved the ambiguity problem, note that there is no point including invariant information in the stringified version. So when the keys are explicitly named, you might as well just encode the values: `join 'xyx', map { s/x/xx/g; $_ } map $hash{$_}, qw/ page chap /;`

Hugo
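The collision and the escape-then-delimit fix described above can be checked with a short sketch. `encode_key` is a hypothetical helper name, and it copies its arguments before substituting so the caller's data is left untouched. That is a small departure from the one-liners above, made for safety in the sketch.

```perl
use strict;
use warnings;

# Double every 'x' in each value, then join the values with 'xyx'.
sub encode_key {
    my @values = @_;          # copies, so the caller's data is not modified
    s/x/xx/g for @values;
    return join 'xyx', @values;
}

# The two records from the example above, as (page, chap) value lists.
my @first  = ( 'chappage', '' );
my @second = ( '', 'pagechap' );

# Naive stringification: keys and values joined with no delimiter.
my $naive_first  = join '', 'page', $first[0],  'chap', $first[1];
my $naive_second = join '', 'page', $second[0], 'chap', $second[1];
print "naive keys collide\n" if $naive_first eq $naive_second;

# Escaped and delimited: the two records now produce different strings.
print "encoded keys differ\n" if encode_key(@first) ne encode_key(@second);
```

Only the values are encoded here, following the final point above that the key names add nothing once they are fixed.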
Re^7: Is there better way to delete dups in an AoH? by qq (Hermit) on Jun 08, 2004 at 00:18 UTC