Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I've looked through the large number of solutions to the problem of efficiently finding unique elements in arrays, e.g. Find unique elements in an array.

However, none of these appear applicable to my problem: I have large arrays which contain "words" and I'd like to identify the unique words regardless of case.

Is there a simple way of doing that? All the methods for finding unique elements in arrays treat This as different from THIS.

Maybe there is a way of converting the contents of an array of words to lowercase in an efficient manner?

Any suggestions appreciated - ensuring lowercase when generating the arrays is an obvious option - but not ideal as I need the case info too.

Replies are listed 'Best First'.
Re: (almost) Unique elements of an array
by Tanktalus (Canon) on Jul 14, 2005 at 14:07 UTC

    I'm somewhat confused by wanting to identify unique words regardless of case, while still needing the case information. Perhaps that says you need two hashes - one that uses uc'd[1] case, and the other using words as-is.

    [1] I seem to recall that upper-casing works in more encodings/languages than lower-casing, and this has nothing to do with perl. There just aren't lower-case equivalents of all upper-case characters in all languages. There are some characters which are defined to be "upper-case" even though there is no "lower-case" equivalent. If you're just dealing with English letters, then this distinction is irrelevant to you.
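To make the footnote concrete, here is a minimal sketch (not from the thread) that counts how many distinct hash keys lc, uc and fc produce for the same pair of words. It assumes Perl 5.16 or later, since that is where fc (Unicode case folding) was introduced; the sample words and the helper name distinct_keys are hypothetical.

```perl
use strict;
use warnings;
use feature 'fc';    # fc() needs Perl 5.16 or later

# Hypothetical sample data: "\x{DF}" is the German sharp s, whose
# upper-case form under Unicode rules is the two letters "SS".
my @words = ( "STRASSE", "stra\x{DF}e" );
utf8::upgrade($_) for @words;    # make sure Unicode casing rules apply

# Count how many distinct keys a given normaliser produces for a list.
sub distinct_keys {
    my ( $normalise, @list ) = @_;
    my %seen;
    $seen{ $normalise->($_) } = 1 for @list;
    return scalar keys %seen;
}

my $with_lc = distinct_keys( sub { lc $_[0] }, @words );
my $with_uc = distinct_keys( sub { uc $_[0] }, @words );
my $with_fc = distinct_keys( sub { fc $_[0] }, @words );

print "lc: $with_lc, uc: $with_uc, fc: $with_fc\n";
```

For this particular pair, lc keeps the two spellings apart while uc and fc treat them as the same word. For plain English letters all three behave identically, as the footnote says.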

      Anonymous seems to be saying that he doesn't want to have to modify the array itself. I think having one hash: my %uniq = map { (uc($_) => undef) } @ar; is what is called for.

      Caution: Contents may have been coded under pressure.
        But then you'd lose the original case information. You'd have to have a hash of arrays to preserve the different case variations.

        Update:
        Here's a way to do it with a hash of hashes:
        #!/usr/bin/perl
        use warnings;
        use strict;
        use Data::Dumper;

        my @original_array = qw( THIS This That That ThAT ThAt ThAt THIS These Those );
        my %results;
        map { $results{uc($_)}->{$_} = undef } @original_array;

        print "Original\n", Dumper(@original_array), "\n";
        print "Result\n", Dumper( %results ), "\n";

        __END__
        Original
        $VAR1 = 'THIS';
        $VAR2 = 'This';
        $VAR3 = 'That';
        $VAR4 = 'That';
        $VAR5 = 'ThAT';
        $VAR6 = 'ThAt';
        $VAR7 = 'ThAt';
        $VAR8 = 'THIS';
        $VAR9 = 'These';
        $VAR10 = 'Those';

        Result
        $VAR1 = 'THAT';
        $VAR2 = {
                  'ThAT' => undef,
                  'ThAt' => undef,
                  'That' => undef
                };
        $VAR3 = 'THIS';
        $VAR4 = {
                  'THIS' => undef,
                  'This' => undef
                };
        $VAR5 = 'THOSE';
        $VAR6 = {
                  'Those' => undef
                };
        $VAR7 = 'THESE';
        $VAR8 = {
                  'These' => undef
                };
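The hash-of-arrays alternative mentioned in this reply could look roughly like the sketch below. It is not the poster's code: the names %variants and %seen_exact are hypothetical, and the sample data is borrowed from the post. Unlike the hash of hashes, it keeps the spellings in order of first appearance.

```perl
use strict;
use warnings;

# Sample data borrowed from the post above.
my @original_array = qw( THIS This That That ThAT ThAt ThAt THIS These Those );

# Map each upper-cased word to the list of distinct spellings seen,
# in order of first appearance.
my ( %variants, %seen_exact );
for my $word (@original_array) {
    next if $seen_exact{$word}++;    # skip exact repeats
    push @{ $variants{ uc $word } }, $word;
}

# Print one line per case-insensitive word, keys sorted for stable output.
for my $key ( sort keys %variants ) {
    print "$key: @{ $variants{$key} }\n";
}
```

With this data the THAT entry holds That, ThAT and ThAt, and the THIS entry holds THIS and This.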
        That's brilliant - excellent to know about the fact it's better to go to uppercase to do this too - thanks.

        It leaves me wondering if there is any reason to use a hash to store a big list of words over an array and is it possible or sensible to turn %uniq into @uniq.
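On turning %uniq into @uniq: a minimal sketch, assuming the one-hash approach from earlier in the thread and some made-up sample data. keys returns the hash keys as a list, and the hash itself stays useful for quick membership tests with exists.

```perl
use strict;
use warnings;

my @ar = qw( This THIS that That those );    # hypothetical sample data

# Hash keyed on the upper-cased word, as suggested earlier in the thread.
my %uniq = map { ( uc($_) => undef ) } @ar;

# keys() returns the hash keys as a list; sort them for a stable order,
# because hash order is not predictable.
my @uniq = sort keys %uniq;

print "@uniq\n";

# The hash gives a direct lookup without scanning the whole list.
my $has_this = exists $uniq{ uc 'this' } ? 1 : 0;
print "has 'this': $has_this\n";
```

Note that @uniq holds the upper-cased forms only; the original spellings are not recoverable from this hash.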

Re: (almost) Unique elements of an array
by jdporter (Paladin) on Jul 14, 2005 at 14:54 UTC
    My reply in the recent thread removing non-duplicates can easily be modified to work case-insensitively.
    my @a; # the data
    my %count;
    $count{lc $_}++ for @a;
    my @ic_unique = grep { $count{lc $_} == 1 } @a; # ic = "ignore case"
    print for @ic_unique; # or whatever you do with the result
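A small worked run may help show what this counting approach selects. The sample data and the @first_seen comparison are my own additions, not part of the reply: the count-based grep keeps only words whose case-insensitive form occurs exactly once, whereas a seen-hash grep keeps the first spelling of every word.

```perl
use strict;
use warnings;

my @a = qw( THIS This That these );    # hypothetical sample data

# Count occurrences of each word, ignoring case.
my %count;
$count{ lc $_ }++ for @a;

# Words whose lower-cased form occurs exactly once in the list.
my @ic_unique = grep { $count{ lc $_ } == 1 } @a;

# For contrast: the first spelling of every word, duplicates dropped.
my %seen;
my @first_seen = grep { !$seen{ lc $_ }++ } @a;

print "occur once: @ic_unique\n";
print "first seen: @first_seen\n";
```

Here "THIS" and "This" both disappear from the first result because the word occurs twice, while the second result keeps "THIS".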
Re: (almost) Unique elements of an array
by eric256 (Parson) on Jul 14, 2005 at 14:37 UTC

    Just for kicks here is a perl6 version that uses case...looks much like perl5 so i made it a function.

    use v6;

    # response to 474872 (remove duplicates regardless of case)
    sub remove_dups (@array is copy) {
        my %unique;
        %unique{uc($_)} = 1 for @array;
        return %unique.keys;
    }

    my @array = ('Hello','hello');
    remove_dups(@array).say;
    @array.say;

    ___________
    Eric Hodges
Re: (almost) Unique elements of an array
by kprasanna_79 (Hermit) on Jul 14, 2005 at 14:26 UTC
    I think this code helps you to solve the issue
    my @array = qw/THIS this there THERE prasanna/;
    my %temp = ();
    map { $temp{lc($_)}++ } @array;
    for (sort keys %temp) {
        print "$_=>$temp{$_}"."\n";
    }

    -prasanna.K
Re: (almost) Unique elements of an array
by anonymized user 468275 (Curate) on Jul 14, 2005 at 14:50 UTC
    No need to lowercase the array, just the 'work-out' hash keys, if we assume this simple approach:-
    # @input assumed by statement of requirement
    my %done = ();
    my @output = ();
    for ( @input ) {
        unless( $done{ lc( $_ ) } ) {
            push @output, $_;
            $done{ lc( $_ ) } = 1;
        }
    }

    One world, one people

Re: (almost) Unique elements of an array
by Eimi Metamorphoumai (Deacon) on Jul 14, 2005 at 14:53 UTC
    Here's a solution that retains the original order and the original case. Basically, it removes any later occurrences, regardless of case, but keeps the case of the original.
    print join ", ", unique_case(qw( THIS This That That ThAT ThAt ThAt THIS These Those ));

    sub unique_case {
        my %seen;
        return grep {!$seen{uc $_}++} @_;
    }
Re: (almost) Unique elements of an array
by Anonymous Monk on Jul 14, 2005 at 15:57 UTC
    #!/usr/bin/perl -w
    use strict;

    my @ar = qw/This this is a test test TEsT/;
    my @vals = uniq(@ar);
    $\ = "\n";
    print (join " ", @vals);

    sub uniq {
        my %hash;
        return grep {!($hash{uc $_}++)} @_;
    }