in reply to Re^2: Is there better way to delete dups in an AoH?
in thread Is there better way to delete dups in an AoH?

1) why/how does it skip the dupe?

It actually doesn't skip it. The value is used as a hash key, and hash keys are unique, so the duplicate simply overwrites the existing entry without doing any harm.

2) what does assigning the 1 do?

It's just an arbitrary value; you could assign anything you like as the value. It's not important, as long as there's something there to make perl happy (the hash assignment expects key/value pairs).

3) what does that 2nd line do to delete the key/value pairs of page => 1?

Nothing. It just rebuilds the AoH from the keys of %uniq. Hmm, I didn't tidy up the "%uniq" hash, if that's what you were thinking of.
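The two lines under discussion are not quoted in this reply, so here is a minimal sketch of the idiom as I understand it. The sample data and the single `page` key are assumptions for illustration only:

```perl
use strict;
use warnings;

# Sample data (assumed for illustration): 'spring' appears twice.
my $AoH = [
    { page => 'spring' },
    { page => 'winter' },
    { page => 'spring' },
];

# Line 1: every page becomes a hash key. A repeated page simply
# overwrites the existing key, so each page ends up in %uniq once.
# The value 1 is only a placeholder; any value would do.
my %uniq = map { $_->{page} => 1 } @$AoH;

# Line 2: rebuild the AoH from the surviving keys. Hashes are
# unordered, so the keys are sorted here to get a stable result.
@$AoH = map { +{ page => $_ } } sort keys %uniq;

print scalar(@$AoH), " unique pages: @{[ map { $_->{page} } @$AoH ]}\n";
```

The original order of the elements is lost, because it passes through a hash. If the order matters, a grep with a %seen hash is probably the better tool.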

Also, any good sources for learning how to dissect a map/grep?

Besides   perldoc -f grep   and perldoc -f map? Hmm. Reading Perlmonks... ;)

Cheers, Sören

Re^4: Is there better way to delete dups in an AoH?
by bradcathey (Prior) on Jun 06, 2004 at 23:14 UTC
    Thanks again for the excellent explanation. However, my script just got a bit more complicated. How do I get rid of two dupe keys per element? See what I naively tried:
    use Data::Dumper;
    use strict;

    my $AoH = [
        { page => 'spring', chap => 'spring'},
        { page => 'winter', chap => 'winter'},
        { page => 'fall',   chap => 'fall'},
        { page => 'summer', chap => 'summer'},
        { page => 'spring', chap => 'spring'}
    ];

    my %uniq = map { $_->{page} => ; $_->{chap} => 1 } @$AoH;
    @$AoH = map { { page => $_ };{ chap=>$_ } } keys %uniq;
    print Dumper ($AoH);
    It doesn't work the way it should, and returns:
    $VAR1 = [
        { 'chap' => 'summer' },
        { 'chap' => 'winter' },
        { 'chap' => 'fall' },
        { 'chap' => 'spring' }
    ];

    —Brad
    "A little yeast leavens the whole dough."
      I think you want to have your last map return one 2-element hash ref for each unique key. Instead, you're creating two 1-element hash refs, and returning only the second one.
      my %uniq = map { ($_->{page} => 1, $_->{chap} => 1) } @$AoH;
      @$AoH = map { { page => $_ , chap=>$_ } } keys %uniq;

      The PerlMonk tr/// Advocate
        Thanks Roy_Johnson! Before I posted I tried it exactly as you had it, but without the parens to make it an element. Just didn't see it. I love this place.

        —Brad
        "A little yeast leavens the whole dough."
        Roy Johnson, I still hope you're watching this thread. Your solution works great if the values of both keys in the array element are the same:
        page=>'fall', chap=>'fall'
        BUT, what if they are different:
        my $AoH = [
            { page => 'main', chap => 'About'},
            { page => 'main', chap => 'Contact'},
            { page => 'main', chap => 'About'},
            { page => 'sub',  chap => 'About'},
            { page => 'sub',  chap => 'Contact'}
        ];

        my %uniq = map { ($_->{page} => 1, $_->{chap} => 1) } @$AoH;
        @$AoH = map { { page => $_ , chap=>$_ } } keys %uniq;
        print Dumper ($AoH);
        Which prints:
        $VAR1 = [
            { 'chap' => 'Contact', 'page' => 'Contact' },
            { 'chap' => 'About',   'page' => 'About' },
            { 'chap' => 'sub',     'page' => 'sub' },
            { 'chap' => 'main',    'page' => 'main' }
        ];
        I want to end up with:
        $VAR1 = [
            { 'page' => 'main', 'chap' => 'About' },
            { 'page' => 'main', 'chap' => 'Contact' },
            { 'page' => 'sub',  'chap' => 'About' },
            { 'page' => 'sub',  'chap' => 'Contact' }
        ];
        eliminating that dupe element of:
        'page' => 'main', 'chap' => 'About'
        I've stared at it for an hour and am bewildered. Ideas? Thanks.

        —Brad
        "A little yeast leavens the whole dough."

      Here is another way, similar to hv's solution, but generalized so that you don't have to explicitly name the keys. It compares the hashes for equality, i.e. they must have the same set of keys, and those keys must have the same values. It does this by making a string representation of the hash. It's not so general that it would handle more complex data structures.

      #!/usr/bin/perl
      use Data::Dumper;

      my $AoH = [
          { page => 'spring', chap => 'spring'},
          { page => 'spring', chap => 'spring'},
          { page => 'winter', chap => 'winter'},
          { page => 'spring', chap => 'spring'},
          { page => 'spring', chap => 'fall'},
      ];

      my %dupes;
      ++$dupes{ stringify( %$_ ) } for @$AoH;
      print Dumper \%dupes;

      @$AoH = grep $dupes{ stringify( %$_ ) } == 1, @$AoH;
      print Dumper $AoH;

      sub stringify {
          my %hash = @_;
          # explicitly name keys if you want to compare on some
          #return join '', map { $_, $hash{$_} } ( qw/ page chap / );
          # or use them all
          join '', map { $_, $hash{$_} } sort keys %hash;
      }

      __END__
      $VAR1 = {
          'chapspringpagespring' => 3,
          'chapfallpagespring' => 1,
          'chapwinterpagewinter' => 1
      };
      $VAR1 = [
          { 'page' => 'winter', 'chap' => 'winter' },
          { 'page' => 'spring', 'chap' => 'fall' }
      ];
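Note that the grep above keeps only the hashes whose count is exactly 1, so every copy of a repeated hash is dropped (which is why spring/spring is missing from the output). If the goal is to keep one copy of each distinct hash, as in Brad's main/sub example, a minimal sketch could look like this. The helper name `hash_key` is made up for the example, and the NUL separator assumes the data never contains a "\0" byte:

```perl
use strict;
use warnings;

my $AoH = [
    { page => 'main', chap => 'About'   },
    { page => 'main', chap => 'Contact' },
    { page => 'main', chap => 'About'   },
    { page => 'sub',  chap => 'About'   },
    { page => 'sub',  chap => 'Contact' },
];

# Same idea as stringify(): flatten a hash into one string, with the
# keys sorted so that key order cannot affect the result.
# Assumption: neither keys nor values ever contain a NUL byte.
sub hash_key {
    my ($h) = @_;
    return join "\0", map { $_, $h->{$_} } sort keys %$h;
}

# Keep an element only the first time its key is seen.
my %seen;
@$AoH = grep { !$seen{ hash_key($_) }++ } @$AoH;

print "$_->{page}/$_->{chap}\n" for @$AoH;
```

Because grep walks the array in order, the surviving elements keep their original relative order.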

        Be careful when comparing things for equality by stringifying: except when the values stringified conform to some rules to remove the danger (eg if each value has a fixed length), there is always the danger of false positives. For example, given:

        join '', map { $_, $hash{$_} } ( qw/ page chap / );
        you cannot tell the difference between these two:
        { page => 'chappage', chap => '' },
        { page => '',         chap => 'pagechap' },
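A short sketch that demonstrates the collision, using the same undelimited join on the two hashes just shown (the variable names are only for the example):

```perl
use strict;
use warnings;

my @records = (
    { page => 'chappage', chap => ''         },
    { page => '',         chap => 'pagechap' },
);

# Undelimited stringification: key, value, key, value, all run together.
my @keys = map {
    my $h = $_;
    join '', map { $_, $h->{$_} } qw/ page chap /;
} @records;

print "$_\n" for @keys;
print $keys[0] eq $keys[1] ? "collision\n" : "distinct\n";
```

Both records flatten to the same string, so a dedup based on this key would wrongly treat them as equal.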

        There are two ways to avoid this problem: either you take advantage of some regularity of the data, or you get more complex. If the data has some regularity - if, say, it is not allowed to include a nul byte - you can use the unpermitted sequence as your delimiter:

        join "\0", map { $_, $hash{$_} } ( qw/ page chap / );

        If not - if the data can contain any arbitrary string - then you cannot avoid modifying the data at least enough to disambiguate the delimiter. So if you want to use 'x' as the delimiter, you need to encode 'x' in the data as something else. You can't simply encode it as 'xx' and keep 'x' as the delimiter, because that would be ambiguous (an escaped 'x' would look the same as two delimiters around an empty data element). So you might encode 'x' as 'xx', and use 'xyx' as the delimiter: now finally you've done enough to avoid ambiguity, and therefore to avoid false matches:

        join 'xyx', map { s/x/xx/g; $_ } map { $_, $hash{$_} } qw/ page chap /;

        One final point: once you have solved the ambiguity problem, note that there is no point including invariant information in the stringified version. So when the keys are explicitly named, you might as well just encode the values:

        join 'xyx', map { s/x/xx/g; $_ } map $hash{$_}, qw/ page chap /;
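To see the escaping at work, here is a small sketch that wraps the values-only version in a helper and tries it on a few awkward pairs. The name `safe_key` and the test data are invented for the example, and the helper edits a copy of each value so there is no question of the original hash being altered:

```perl
use strict;
use warnings;

# Hypothetical helper: build a comparison key from the values of the
# named fields, doubling each 'x' and joining with 'xyx'.
sub safe_key {
    my ($h, @fields) = @_;
    return join 'xyx', map {
        my $v = $h->{$_};    # work on a copy so the hash is untouched
        $v =~ s/x/xx/g;
        $v;
    } @fields;
}

# Pairs of records that differ but are easy to confuse when flattened.
my @pairs = (
    [ { page => 'chappage', chap => '' }, { page => '',  chap => 'pagechap' } ],
    [ { page => 'x',        chap => '' }, { page => '',  chap => 'x'        } ],
    [ { page => 'axyxb',    chap => '' }, { page => 'a', chap => 'b'        } ],
);

for my $pair (@pairs) {
    my ($k1, $k2) = map { safe_key($_, qw/ page chap /) } @$pair;
    print "$k1 vs $k2: ", ($k1 eq $k2 ? "same" : "different"), "\n";
}
```

The third pair is the interesting one: a value that literally contains the delimiter still gets its own key, because its 'x' characters are doubled before joining.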

        Hugo