Re: remove duplicates

If your ~~keys~~ values look like duplicates but aren't treated as such, most likely there is something you can't see, typically whitespace at the end of one ~~key~~ value. Inspect your hash with Data::Dumper, and concentrate on the ~~keys~~ values that should be equal but aren't:

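use strict;
use warnings;
use Data::Dumper;

# Print tabs and other control characters as escapes in the dump
$Data::Dumper::Useqq = 1;

my %hash = (
  'Hello ' => 'world',
  'The'    => 'world ',   # trailing space: not equal to 'world'
);

print Dumper \%hash;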
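
If reading the dump by eye is still too error-prone, a small filter can point at the suspicious entries. This is only a sketch under the assumption that the hidden difference is leading or trailing whitespace; the sample data and the name @suspect are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; the trailing space in 'world ' is the hidden difference.
my %hash = (
    'Hello ' => 'world',
    'The'    => 'world ',
);

# Collect the keys whose value starts or ends with whitespace.
my @suspect = sort grep { $hash{$_} =~ /^\s|\s$/ } keys %hash;

# Print each suspect value in brackets so stray whitespace becomes visible.
print "$_ => [$hash{$_}]\n" for @suspect;
```

The brackets make it obvious where a value really ends, which a plain print would hide.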

If your hash is too large to dump conveniently, pick one of the values you know to be duplicated and copy only the entries that match it into a smaller hash:

my %hash;
for (keys %bad_hash) {
    if ($bad_hash{$_} =~ /orl/) { # because we're looking for "world"
        $hash{ $_ } = $bad_hash{ $_ };
    }
}

print Dumper \%hash;   # dump only the reduced copy
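
If you don't know yet which value is duplicated, one possible approach is to invert the hash and list every value that occurs under more than one key. The following is a rough sketch, not a definitive solution; the sample data and the names %keys_for and @report are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: two keys share the exact value 'world'.
my %bad_hash = (
    'Hello' => 'world',
    'The'   => 'world',
    'Other' => 'planet',
);

# Invert the hash: each value maps to the list of keys that carry it.
my %keys_for;
for my $key (sort keys %bad_hash) {
    push @{ $keys_for{ $bad_hash{$key} } }, $key;
}

# Report only the values that occur under more than one key.
my @report;
for my $value (sort keys %keys_for) {
    my @keys = @{ $keys_for{$value} };
    next if @keys < 2;
    push @report, "'$value' is shared by: @keys";
}

print "$_\n" for @report;
```

Values that look equal but don't show up together in this report are the ones worth feeding to Data::Dumper.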

You might also run into encoding issues, where two different octet sequences (which compare as unequal) render as the same glyph sequence. As Perl 5.8 uses UTF-8 internally, that shouldn't be a problem as long as the input is decoded consistently. In any case, it would help to see a bit more code, still kept small, and a really small dataset (2 lines) that reproduces the problem.
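
One way two strings can look identical and still compare as unequal is a precomposed character versus the same character built from a base letter plus a combining mark. The sketch below assumes that this is the kind of difference involved, which may not be the case for your data; it uses the core module Unicode::Normalize, and the sample strings are made up.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Two hypothetical strings that both display as "cafe" with an accented e.
my $precomposed = "caf\x{e9}";      # e with acute accent as a single code point
my $decomposed  = "cafe\x{301}";    # plain e followed by a combining acute accent

# Compare once as-is and once after normalizing both to the composed form.
my $raw_equal  = $precomposed eq $decomposed           ? 1 : 0;
my $norm_equal = NFC($precomposed) eq NFC($decomposed) ? 1 : 0;

print "raw: $raw_equal, normalized: $norm_equal\n";
```

If normalizing makes your "duplicates" compare as equal, normalize the values before using them in the hash.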

Update: Realized this is about values, not keys.

Re^2: remove duplicates by Anonymous Monk on Mar 11, 2006 at 23:12 UTC
Hi again, I tried to clean up the data as suggested, but isn't there some other way of making sure a key doesn't have duplicate values (especially if the logs you are reading are > 25k)? Please help.
Re^3: remove duplicates by Corion (Patriarch) on Mar 12, 2006 at 15:52 UTC
I'm not sure I understand where your problem lies. A hash is the traditional way in Perl to check for duplicates. If you run out of memory because you have too many different entries, you can use DB_File or any other tied hash that stores its data on disk instead of in memory. If all that fails, you can always use a database.
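
For illustration, the usual hash-based check could look roughly like this. It is a minimal sketch, assuming the log has already been parsed into key/value pairs; the sample records and the names %seen and %values_for are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical records as [key, value] pairs; real code would parse them from the log.
my @records = (
    [ 'host1', 'error A' ],
    [ 'host1', 'error A' ],   # exact repeat, should be dropped
    [ 'host1', 'error B' ],
    [ 'host2', 'error A' ],
);

my %seen;          # flat hash of "key\0value" strings already encountered
my %values_for;    # key => array ref of unique values, in first-seen order

for my $record (@records) {
    my ($key, $value) = @$record;
    next if $seen{"$key\0$value"}++;   # skip pairs that were stored before
    push @{ $values_for{$key} }, $value;
}

for my $key (sort keys %values_for) {
    print "$key: ", join(', ', @{ $values_for{$key} }), "\n";
}
```

%seen is kept flat, with plain strings as keys, so that it could be tied to an on-disk hash such as DB_File if memory becomes a problem; %values_for holds array references and would need a different approach in that case.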