Re: (almost) Unique elements of an array

I'm somewhat confused by the requirement to identify unique words regardless of case while still keeping the case information. Perhaps that means you need two hashes: one keyed on the uc'd¹ words, and the other keyed on the words as-is.
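A minimal sketch of the two-hash idea. It assumes the words are already in an array; the name `@words` and the sample data are made up for illustration. One hash counts words by their upper-cased form, and the other counts the exact spellings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data; substitute your own word list.
my @words = qw( This THIS this That that Other );

my %by_uc;    # upper-cased word => number of occurrences, in any case
my %as_is;    # exact spelling   => number of occurrences

for my $word (@words) {
    $by_uc{ uc $word }++;
    $as_is{$word}++;
}

# Case-insensitive unique words, then the exact spellings seen.
print "case-insensitive: ", join( ' ', sort keys %by_uc ), "\n";
print "as written:       ", join( ' ', sort keys %as_is ), "\n";
```

Counting instead of storing `undef` costs nothing extra and tells you how often each word occurred.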

¹ I seem to recall that upper-casing works in more encodings/languages than lower-casing, and that this has nothing to do with Perl. Not every upper-case character has a lower-case equivalent in every language: some characters are defined as "upper-case" even though no "lower-case" form exists. If you're only dealing with English letters, this distinction is irrelevant to you.
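One well-known case where the choice matters is the German sharp s (ß): Perl's `uc` maps it to `SS`, but `lc` leaves it alone, so lower-casing fails to match "Straße" with "STRASSE" while upper-casing succeeds. The sketch below assumes Perl 5.16 or later, which also provides `fc` (Unicode case folding, intended for exactly this kind of case-insensitive comparison). `use v5.16` enables both `fc` and the `unicode_strings` feature, which makes these functions apply Unicode rules to the `\x{DF}` escape.

```perl
#!/usr/bin/perl
use v5.16;      # enables strict, 'fc' and 'unicode_strings'
use warnings;

my $mixed = "Stra\x{DF}e";    # "Straße", written with an escape for ß
my $upper = "STRASSE";

# Compare the two spellings under each case-normalising function.
my %same = (
    'uc' => uc($mixed) eq uc($upper),
    'lc' => lc($mixed) eq lc($upper),
    'fc' => fc($mixed) eq fc($upper),
);

for my $how (qw( uc lc fc )) {
    printf "%s: %s\n", $how, $same{$how} ? 'same' : 'different';
}
```

On older Perls without `fc`, `uc` is the safer of the two for this particular comparison.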

Re^2: (almost) Unique elements of an array by Roy Johnson (Monsignor) on Jul 14, 2005 at 14:17 UTC
Anonymous seems to be saying that he doesn't want to modify the array itself. I think a single hash is what is called for:

`my %uniq = map {(uc($_) => undef)} @ar;`

Caution: Contents may have been coded under pressure.
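A usage sketch of the single-hash approach, with a made-up `@ar`. The keys of `%uniq` are the distinct upper-cased words, and a case-insensitive membership test is a hash lookup on the upper-cased candidate. Because the stored values are `undef`, the test has to use `exists`, not truth.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data.
my @ar = qw( Apple apple BANANA Cherry cherry CHERRY );

# One entry per word, ignoring case; the values are irrelevant.
my %uniq = map {(uc($_) => undef)} @ar;

my $count = keys %uniq;    # keys in scalar context gives the count
print "$count distinct words, ignoring case\n";

# Case-insensitive membership tests: upper-case the candidate first.
for my $candidate (qw( banana Durian )) {
    my $found = exists $uniq{ uc $candidate } ? 'yes' : 'no';
    print "$candidate: $found\n";
}
```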
Re^3: (almost) Unique elements of an array by Transient (Hermit) on Jul 14, 2005 at 14:23 UTC
But then you'd lose the original case information. You'd need a hash of arrays to preserve the different case variations.

Update: Here's a way to do it with a hash of hashes:

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my @original_array = qw( THIS This That That ThAT ThAt ThAt THIS These Those );

my %results;
# Group each word under its upper-cased form; the inner hash's keys
# collect the distinct original spellings.
$results{uc($_)}->{$_} = undef for @original_array;

print "Original\n", Dumper(@original_array), "\n";
print "Result\n", Dumper( %results ), "\n";

__END__
Original
$VAR1 = 'THIS';
$VAR2 = 'This';
$VAR3 = 'That';
$VAR4 = 'That';
$VAR5 = 'ThAT';
$VAR6 = 'ThAt';
$VAR7 = 'ThAt';
$VAR8 = 'THIS';
$VAR9 = 'These';
$VAR10 = 'Those';

Result
$VAR1 = 'THAT';
$VAR2 = {
          'ThAT' => undef,
          'ThAt' => undef,
          'That' => undef
        };
$VAR3 = 'THIS';
$VAR4 = {
          'THIS' => undef,
          'This' => undef
        };
$VAR5 = 'THOSE';
$VAR6 = {
          'Those' => undef
        };
$VAR7 = 'THESE';
$VAR8 = {
          'These' => undef
        };
```
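To get the data back out of the hash of hashes, walk the outer keys and list each group's inner keys. This is only a usage sketch that reuses the same sample list. The `sort` calls are there because Perl's hash order is unpredictable, which is also why the `Result` dump above lists the groups in no particular order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @original_array = qw( THIS This That That ThAT ThAt ThAt THIS These Those );

# Same structure: upper-cased word => { original spelling => undef }
my %results;
$results{ uc $_ }{$_} = undef for @original_array;

# One line per case-insensitive word, followed by the spellings seen.
for my $word ( sort keys %results ) {
    my @variants = sort keys %{ $results{$word} };
    print "$word: @variants\n";
}
```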
Re^4: (almost) Unique elements of an array by Roy Johnson (Monsignor) on Jul 14, 2005 at 15:12 UTC
> But then you'd lose the original case information.

No, it's still in the array. `uc` does not modify its argument; it returns a modified copy.
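A small sketch of that point, with made-up data: building the upper-cased hash leaves `@ar` untouched, so the original spellings can still be recovered from the array, for example with a `grep` on the upper-cased form.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data.
my @ar = qw( That THAT this That );

# uc returns a new string; the elements of @ar are not changed,
# even though map aliases $_ to each element.
my %uniq = map {(uc($_) => undef)} @ar;

print "array still holds: @ar\n";

# Recover every original spelling of one case-insensitive word.
my @spellings = grep { uc($_) eq 'THAT' } @ar;
print "spellings of THAT: @spellings\n";
```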
Re^3: (almost) Unique elements of an array by Anonymous Monk on Jul 14, 2005 at 14:43 UTC
That's brilliant, and it's excellent to know that it's better to go to upper case for this, too. Thanks! It leaves me wondering whether there is any reason to store a big list of words in a hash rather than an array, and whether it is possible, or sensible, to turn %uniq into @uniq.
Re^4: (almost) Unique elements of an array by tlm (Prior) on Jul 14, 2005 at 15:03 UTC
The hash is just a trick to enforce uniqueness: by definition, hash keys are unique (any attempt to "duplicate" a hash key at most changes the value associated with that key; no duplicate is created). If you want an array, you can do:

`my @uniq = keys %{ +{ map +( uc($_) => undef ), @ar } };`

The resulting words are upper-cased and come back in no particular order.

the lowliest monk
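If you also want to keep the original order and the spelling of each word's first occurrence, a common alternative is the `%seen` idiom from perlfaq4's entry on removing duplicate elements: filter the array with `grep` and use the hash only as a record of what has already been seen. A sketch with made-up data, shown next to the `keys` version for comparison:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data.
my @ar = qw( This that THIS Other THAT other );

# keys() of the upper-cased hash: unique, but upper-cased and unordered,
# so sort them for stable output.
my @uniq_uc = sort keys %{ +{ map +( uc($_) => undef ), @ar } };

# %seen idiom: keep the first spelling of each word, in original order.
my %seen;
my @uniq_first = grep { !$seen{ uc $_ }++ } @ar;

print "upper-cased: @uniq_uc\n";
print "first seen:  @uniq_first\n";
```

The hash is still doing the uniqueness work in both versions; the difference is only whether you keep its keys or use it as a filter over the original array.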