in reply to remove duplicates

If your values look like duplicates but aren't treated as equal, there is most likely something you can't see. Usually it's whitespace at the end of one of the values. Inspect your hash with Data::Dumper, and concentrate on the values that should be equal but aren't:

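```perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq = 1;   # escape tabs, newlines etc. so they become visible

my %hash = (
    'Hello ' => 'world',
    'The'    => 'world ',   # note the trailing space
);
print Dumper \%hash;
```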
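Once you know whitespace is the culprit, you can also let Perl find such near-duplicates for you. The following is a minimal sketch, assuming that only leading and trailing whitespace matters. It groups the values by their trimmed form and reports every trimmed value that occurs with more than one raw spelling. The helper name `find_near_duplicates` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a hash ref and returns a hash ref that maps
# each trimmed value to the sorted list of its distinct raw spellings,
# keeping only trimmed values that have more than one spelling.
sub find_near_duplicates {
    my ($hash) = @_;
    my %spellings;
    for my $raw (values %$hash) {
        (my $trimmed = $raw) =~ s/^\s+|\s+$//g;   # strip leading/trailing whitespace
        $spellings{$trimmed}{$raw} = 1;
    }
    my %result;
    for my $trimmed (keys %spellings) {
        my @raw = sort keys %{ $spellings{$trimmed} };
        $result{$trimmed} = \@raw if @raw > 1;
    }
    return \%result;
}

my %hash = ( 'Hello ' => 'world', 'The' => 'world ' );
my $near = find_near_duplicates(\%hash);
for my $trimmed (sort keys %$near) {
    print "'$trimmed' appears as: ",
        join(', ', map { "'$_'" } @{ $near->{$trimmed} }), "\n";
}
```

For the example hash this reports `'world'` as appearing both with and without the trailing space.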

If your hash is too large to dump conveniently, pick one of the values you expect to be duplicated, copy only the matching entries into a smaller hash, and dump that copy:

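```perl
# %bad_hash is your large hash; %hash receives the small copy
my %hash;
for (keys %bad_hash) {
    if ($bad_hash{$_} =~ /orl/) {   # because we're looking for "world"
        $hash{$_} = $bad_hash{$_};
    }
}
print Dumper \%hash;   # dump only the small copy
```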
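The same filtering can be written more compactly with `grep` and `map`. This is only a sketch of an equivalent idiom; the sample contents of `%bad_hash` are made up for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Useqq = 1;   # make trailing whitespace and control characters obvious

# Made-up sample data standing in for your real, large hash.
my %bad_hash = ( 'Hello ' => 'world', 'The' => 'world ', 'Foo' => 'bar' );

# Keep only the entries whose value matches the pattern, then dump that subset.
my %hash = map  { $_ => $bad_hash{$_} }
           grep { $bad_hash{$_} =~ /orl/ } keys %bad_hash;

print Dumper \%hash;
```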

You might also run into encoding issues, where two different octet sequences (which compare as unequal) render as the same glyph sequence. Since Perl 5.8 handles Unicode strings internally, that usually shouldn't be a problem as long as all your input is decoded the same way. In any case, it would help to see a bit more code (still small) and a really small dataset (two lines) that reproduces the problem.
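One case where visually identical strings still compare unequal, even with Perl's Unicode support, is text in different normalization forms: "é" can be a single precomposed character or an "e" followed by a combining accent. A minimal sketch, assuming your data may contain both forms, is to normalize with `NFC` from the core module Unicode::Normalize before comparing or using the strings as hash keys:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# The same visible word written two ways:
my $precomposed = "caf\x{e9}";    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
my $combining   = "cafe\x{301}";  # "e" + U+0301 COMBINING ACUTE ACCENT

print "raw:        ", ($precomposed eq $combining ? "equal" : "different"), "\n";
print "normalized: ", (NFC($precomposed) eq NFC($combining) ? "equal" : "different"), "\n";
```

This prints `different` for the raw comparison and `equal` after normalization.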

Update: Realized this is about values, not keys.

Re^2: remove duplicates
by Anonymous Monk on Mar 11, 2006 at 23:12 UTC
    Hi again, I tried to clean up the data as suggested, but isn't there some other way of making sure a key doesn't have duplicate values, especially if the logs you're reading are larger than 25k? Please help.

      I'm not sure I understand where your problem lies. A hash is the traditional way in Perl to check for duplicates. If you run out of memory because you have too many distinct entries, you can use DB_File or any other tied hash that stores its data on disk instead of in memory.
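      A minimal sketch of the disk-backed approach, under a few assumptions: it uses SDBM_File, which ships with Perl, as a stand-in for DB_File (the `tie` call looks the same with `'DB_File'` if that module is installed, but SDBM_File limits each key/value pair to roughly 1 KB, so DB_File is the better choice for real logs). The file name and the sample log records are hypothetical, and the file lives in a temporary directory only so the demo cleans up after itself. Because DBM files store flat strings rather than nested hashes, each (key, value) pair is joined into one composite key.

```perl
use strict;
use warnings;
use Fcntl;                   # exports O_RDWR, O_CREAT
use SDBM_File;               # core; DB_File can be tied the same way if installed
use File::Temp qw(tempdir);

# Temporary directory for the demo; use a real path to keep the data.
my $dir    = tempdir(CLEANUP => 1);
my $dbfile = "$dir/seen_pairs";   # hypothetical file name

tie my %seen, 'SDBM_File', $dbfile, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $dbfile: $!";

# Hypothetical log records as (key, value) pairs.
my @records = ( [ 'host1', 'GET /' ], [ 'host1', 'GET /' ], [ 'host2', 'GET /' ] );

my @duplicates;
for my $rec (@records) {
    my ($key, $value) = @$rec;
    # Join key and value with a separator that cannot occur in the data.
    my $pair = join "\0", $key, $value;
    if ($seen{$pair}++) {             # true only if this pair was stored before
        push @duplicates, $pair;
        print "duplicate: $key => $value\n";
    }
}

untie %seen;
```

      The same loop works unchanged with an ordinary in-memory `%seen`; tying only moves the storage to disk.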

      If all that fails, you can always use a database.