in reply to (almost) Unique elements of an array

I'm somewhat confused by wanting to identify unique words regardless of case while still needing the case information. Perhaps that means you need two hashes: one keyed on the uc'd[1] words, and the other using the words as-is.

[1] I seem to recall that upper-casing works in more encodings/languages than lower-casing, and this has nothing to do with Perl. There just aren't lower-case equivalents of all upper-case characters in all languages: some characters are defined to be "upper-case" even though there is no "lower-case" equivalent. If you're just dealing with English letters, then this distinction is irrelevant to you.
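
For what it's worth, the two-hash idea could be sketched roughly like this. The word list and the names %by_folded and %as_written are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample data; not from the original question.
my @words = qw( Apple apple APPLE Pear pear );

my ( %by_folded, %as_written );
for my $word (@words) {
    $by_folded{ uc $word }++;    # tally ignoring case
    $as_written{$word}++;        # tally of each exact spelling
}

my @unique_folded  = sort keys %by_folded;     # APPLE PEAR
my @unique_written = sort keys %as_written;    # APPLE Apple Pear apple pear

print "@unique_folded\n";
print "@unique_written\n";
```

The first hash answers "which words occur, ignoring case?", the second still knows every spelling that appeared.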


Re^2: (almost) Unique elements of an array
by Roy Johnson (Monsignor) on Jul 14, 2005 at 14:17 UTC
    Anonymous seems to be saying that he doesn't want to have to modify the array itself. I think having one hash is what is called for:

    my %uniq = map {(uc($_) => undef)} @ar;

    Caution: Contents may have been coded under pressure.
      But then you'd lose the original case information. You'd have to have a hash of arrays to preserve the different case variations.

      Update:
      Here's a way to do it with a hash of hashes:
      #!/usr/bin/perl
      use warnings;
      use strict;
      use Data::Dumper;

      my @original_array = qw( THIS This That That ThAT ThAt ThAt THIS These Those );
      my %results;
      map { $results{uc($_)}->{$_} = undef } @original_array;

      print "Original\n", Dumper(@original_array), "\n";
      print "Result\n", Dumper( %results ), "\n";

      __END__
      Original
      $VAR1 = 'THIS';
      $VAR2 = 'This';
      $VAR3 = 'That';
      $VAR4 = 'That';
      $VAR5 = 'ThAT';
      $VAR6 = 'ThAt';
      $VAR7 = 'ThAt';
      $VAR8 = 'THIS';
      $VAR9 = 'These';
      $VAR10 = 'Those';

      Result
      $VAR1 = 'THAT';
      $VAR2 = {
                'ThAT' => undef,
                'ThAt' => undef,
                'That' => undef
              };
      $VAR3 = 'THIS';
      $VAR4 = {
                'THIS' => undef,
                'This' => undef
              };
      $VAR5 = 'THOSE';
      $VAR6 = {
                'Those' => undef
              };
      $VAR7 = 'THESE';
      $VAR8 = {
                'These' => undef
              };
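
The hash-of-arrays variant mentioned before the update might look something like the sketch below. It reuses the same word list; the name %variants is just an illustrative choice. Unlike the hash of hashes, it keeps repeated spellings and their order of appearance.

```perl
use strict;
use warnings;

my @original_array = qw( THIS This That That ThAT ThAt ThAt THIS These Those );

my %variants;
for my $word (@original_array) {
    # Autovivification creates the array ref on the first push for a key.
    push @{ $variants{ uc $word } }, $word;
}

for my $key ( sort keys %variants ) {
    print "$key: @{ $variants{$key} }\n";
}
```

If duplicates within a group are unwanted, the hash of hashes is probably the simpler choice.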
        But then you'd lose the original case information.
        No, it's still in the array. uc does not modify its argument, it returns a modified copy.
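
A quick way to convince yourself, as a minimal sketch with made-up sample data: build the hash, then look at the array again.

```perl
use strict;
use warnings;

my @ar = qw( This THAT those );    # hypothetical sample data
my %uniq = map { ( uc($_) => undef ) } @ar;

print "@ar\n";                                # This THAT those (unchanged)
print join( ' ', sort keys %uniq ), "\n";     # THAT THIS THOSE
```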

        Caution: Contents may have been coded under pressure.
      That's brilliant - excellent to know about the fact it's better to go to uppercase to do this too - thanks.

      It leaves me wondering whether there is any reason to store a big list of words in a hash rather than an array, and whether it is possible or sensible to turn %uniq into @uniq.

        The hash is just a trick to enforce uniqueness, since, by definition, hashkeys are unique (any attempt to "duplicate" a hashkey at most changes the value that the key is associated with, but no duplicate is created).
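
A tiny sketch of that behaviour, with %count as a purely illustrative name:

```perl
use strict;
use warnings;

my %count;
$count{THIS} = 1;
$count{THIS} = 2;    # same key again: the value is replaced, no new entry
$count{THAT} = 1;

my $n_keys = keys %count;    # keys in scalar context gives the number of keys
print "$n_keys keys, THIS => $count{THIS}\n";    # 2 keys, THIS => 2
```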

        If you want the array you can do

        my @uniq = keys %{ +{ map +( uc() => undef ), @ar } };
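
Note that keys returns the upper-cased words in no particular order. If you would rather keep the first spelling of each word, in the order it appeared, something along these lines may do. It is a sketch using the common %seen idiom, with made-up sample data.

```perl
use strict;
use warnings;

my @ar = qw( THIS This That ThAT These Those those );    # hypothetical sample data

my %seen;
# Keep a word only the first time its upper-cased form turns up.
my @uniq = grep { !$seen{ uc $_ }++ } @ar;

print "@uniq\n";    # THIS That These Those
```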

        the lowliest monk