To avoid duplicate user/IP pairs, you could use something of this general form:

    my %seen;
    while ( my ( $user, $ip ) = next_pair() ) {
        next if $seen{ $user }{ $ip }++;
        insert( $user, $ip );
    }

Now you need to come up with sensible definitions for next_pair() and insert().
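For illustration only, here is one way next_pair() and insert() might look. Both definitions are stand-ins I made up: the sample data in @pairs is hypothetical, next_pair() simply shifts pairs off that array, and insert() just records what it was given where a real program would probably write to a database or file.

```perl
use strict;
use warnings;

# Hypothetical sample data; in practice the pairs might come from a log file.
my @pairs = (
    [ 'alice', '10.0.0.1' ],
    [ 'bob',   '10.0.0.2' ],
    [ 'alice', '10.0.0.1' ],    # exact repeat: skipped
    [ 'alice', '10.0.0.2' ],    # same user, new IP: kept
);

my @inserted;

# Returns the next ( $user, $ip ) pair, or an empty list when the data runs out.
# The empty list is what ends the while loop below.
sub next_pair {
    my $pair = shift @pairs or return;
    return @$pair;
}

# Stand-in for the real insertion (assumption: yours does something more useful).
sub insert {
    my ( $user, $ip ) = @_;
    push @inserted, "$user|$ip";
}

my %seen;
while ( my ( $user, $ip ) = next_pair() ) {
    next if $seen{ $user }{ $ip }++;
    insert( $user, $ip );
}

print "$_\n" for @inserted;
```

With this sample data, three of the four pairs get inserted; only the exact repeat of (alice|10.0.0.1) is skipped.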
Just to be clear, the above will prevent any pairing from being inserted more than once, but it is still possible for a user or an IP to be inserted multiple times, as long as each insertion associates it with a different IP or user, respectively. If you want to make sure each user is inserted only once, irrespective of IP address, then the first line in the loop above would become

    next if $seen{ $user }++;

Similarly, if you want to make sure each IP is inserted only once, that line would instead be

    next if $seen{ $ip }++;
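To see how the three rules differ, it may help to run them over the same data. The sketch below is not the loop from above: it swaps in grep with a key-extracting callback so that all three rules can share one helper, and it joins user and IP with "\0" to make a single pair key where the loop used a nested hash. The dedupe() helper and the sample data are my own inventions for the comparison.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @pairs = (
    [ 'alice', '10.0.0.1' ],
    [ 'alice', '10.0.0.2' ],
    [ 'bob',   '10.0.0.2' ],
    [ 'alice', '10.0.0.1' ],
);

# Keep the first pair seen for each key produced by the callback $key_of,
# which is called with ( $user, $ip ).
sub dedupe {
    my ($key_of) = @_;
    my %seen;
    return grep { !$seen{ $key_of->(@$_) }++ } @pairs;
}

my @by_pair = dedupe( sub { join "\0", @_ } );    # unique (user, IP) pairs
my @by_user = dedupe( sub { $_[0] } );            # each user at most once
my @by_ip   = dedupe( sub { $_[1] } );            # each IP at most once

for my $result ( [ pair => \@by_pair ], [ user => \@by_user ], [ ip => \@by_ip ] ) {
    my ( $rule, $kept ) = @$result;
    print "$rule: ", join( ' ', map { "$_->[0]|$_->[1]" } @$kept ), "\n";
}
```

Note that under the IP-only rule bob never gets inserted at all, because alice claimed 10.0.0.2 first. Whether that is what you want depends on your specification.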
One common gotcha when you are trying to avoid duplicates comes from not having a sufficiently clear specification of which items should be regarded as equivalent. For example, how should your program deal with the pairs (john doe|12.345.678.901) and (John Doe|12.345.678.901)? The code above, as written, would result in two insertions, but maybe you want to ignore case distinctions in the name (and thus avoid the second insertion). If so, you'd need to change the first line in the loop to something like:

    next if $seen{ uc $user }{ $ip }++;

This ensures that your duplicate control scheme detects user names case-insensitively.
This small example illustrates the need to specify exactly what one means by "duplicates", and from this specification, to design a normalization procedure that must be applied before testing for repeats. In the example above, this normalization procedure is very simple: just convert the user name to uppercase. (An entirely equivalent procedure would be to convert it to lowercase.) But you may need a more elaborate normalization; e.g. you may want to treat the pairs (Edward Estlin Cummings|12.345.678.901) and (e.e.cummings|12.345.678.901) as equivalent.
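As a rough sketch of what a more elaborate normalization might look like, the code below pulls the rules into a normalize_user() function and applies it before the %seen test. Everything in it is an assumption made for illustration: the function name, the whitespace rules, and especially the %alias table, which is just one hypothetical way to equate alternate spellings of a name.

```perl
use strict;
use warnings;

# Hypothetical alias table: maps known alternate spellings to one canonical name.
my %alias = ( 'e.e.cummings' => 'edward estlin cummings' );

# Reduce a user name to a canonical form. The rules here (trim, collapse
# whitespace, lowercase, alias lookup) are assumptions, not a specification.
sub normalize_user {
    my ($user) = @_;
    $user =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
    $user =~ s/\s+/ /g;         # collapse runs of whitespace to one space
    $user = lc $user;           # ignore case
    return exists $alias{$user} ? $alias{$user} : $user;
}

my @pairs = (
    [ 'Edward Estlin Cummings', '12.345.678.901' ],
    [ 'e.e.cummings',           '12.345.678.901' ],
    [ 'John  Doe ',             '12.345.678.901' ],
    [ 'john doe',               '12.345.678.901' ],
);

my ( %seen, @kept );
for my $pair (@pairs) {
    my ( $user, $ip ) = @$pair;
    next if $seen{ normalize_user($user) }{ $ip }++;
    push @kept, "$user|$ip";    # keep the original spelling of the first occurrence
}

print "$_\n" for @kept;
```

Only the normalized form is used as the hash key; the name that gets kept is whatever spelling showed up first. If you would rather store the canonical form, push normalize_user($user) instead.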
the lowliest monk
In reply to Re: Using hashes or arrays to remove duplicate entries
by tlm
in thread Using hashes or arrays to remove duplicate entries
by ghettofinger