(almost) Unique elements of an array

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: (almost) Unique elements of an array by Tanktalus (Canon) on Jul 14, 2005 at 14:07 UTC
I'm somewhat confused by wanting to identify unique words regardless of case while still needing the case information. Perhaps that means you need two hashes: one keyed on the uc'd¹ words, and the other using the words as-is.

¹ I seem to recall that upper-casing works in more encodings/languages than lower-casing, and this has nothing to do with Perl. There just aren't lower-case equivalents of all upper-case characters in all languages: some characters are defined to be "upper-case" even though they have no "lower-case" equivalent. If you're just dealing with English letters, this distinction is irrelevant to you.
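The two-hash idea can be sketched roughly as follows. This is only an illustration under assumed sample data; `@words`, `%by_upper` and `%as_is` are hypothetical names, not anything from the original question.

```perl
use strict;
use warnings;

# Hypothetical sample data; the original question's array is not shown.
my @words = qw( THIS This That that These );

my ( %by_upper, %as_is );
for my $word (@words) {
    $by_upper{ uc $word }++;    # one key per word, ignoring case
    $as_is{$word}++;            # one key per distinct spelling
}

printf "%d words ignoring case, %d distinct spellings\n",
    scalar keys %by_upper, scalar keys %as_is;
```

On Perl 5.16 and later, the built-in `fc` (enabled with `use feature 'fc'`) does Unicode case folding and is meant for this kind of caseless comparison; for plain English letters, `uc` or `lc` is enough.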
Re^2: (almost) Unique elements of an array by Roy Johnson (Monsignor) on Jul 14, 2005 at 14:17 UTC
Anonymous seems to be saying that he doesn't want to have to modify the array itself. I think having one hash, `my %uniq = map {(uc($_) => undef)} @ar;`, is what is called for.

Caution: Contents may have been coded under pressure.
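To see what that one-liner leaves you with, here is a minimal sketch; the contents of `@ar` are made up for illustration. The keys of `%uniq` are the uppercased words, so the array itself is untouched, but the original spellings are not recorded in the hash.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the questioner's array.
my @ar = qw( THIS This That that These );

# One key per word, ignoring case; the values are unused.
my %uniq = map { ( uc($_) => undef ) } @ar;

# Hash keys come back in no particular order, so sort for stable output.
my @uniq = sort keys %uniq;
print "@uniq\n";

# Case-insensitive membership test against the hash.
print "seen 'those'? ", ( exists $uniq{ uc 'those' } ? "yes" : "no" ), "\n";
```

A hash makes the "have I seen this word?" check a single lookup, whereas an array would have to be scanned each time.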
Re^3: (almost) Unique elements of an array by Transient (Hermit) on Jul 14, 2005 at 14:23 UTC
But then you'd lose the original case information. You'd have to have a hash of arrays to preserve the different case variations.

Update: Here's a way to do it with a hash of hashes:

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my @original_array = qw( THIS This That That ThAT ThAt ThAt THIS These Those );
my %results;
map { $results{uc($_)}->{$_} = undef } @original_array;
print "Original\n", Dumper(@original_array), "\n";
print "Result\n", Dumper( %results ), "\n";
__END__
Original
$VAR1 = 'THIS';
$VAR2 = 'This';
$VAR3 = 'That';
$VAR4 = 'That';
$VAR5 = 'ThAT';
$VAR6 = 'ThAt';
$VAR7 = 'ThAt';
$VAR8 = 'THIS';
$VAR9 = 'These';
$VAR10 = 'Those';

Result
$VAR1 = 'THAT';
$VAR2 = {
          'ThAT' => undef,
          'ThAt' => undef,
          'That' => undef
        };
$VAR3 = 'THIS';
$VAR4 = {
          'THIS' => undef,
          'This' => undef
        };
$VAR5 = 'THOSE';
$VAR6 = {
          'Those' => undef
        };
$VAR7 = 'THESE';
$VAR8 = {
          'These' => undef
        };
```
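Reading the variations back out of the hash of hashes might look like the sketch below. It rebuilds the same structure from the same word list; `%variants` is a hypothetical helper added only to make the result easy to inspect.

```perl
use strict;
use warnings;

my @original_array = qw( THIS This That That ThAT ThAt ThAt THIS These Those );

# Same structure as in the post: uppercased word => { original spelling => undef }.
my %results;
$results{ uc $_ }{$_} = undef for @original_array;

# List each word once, followed by every spelling that occurred.
my %variants;    # hypothetical: uppercased word => "spelling, spelling"
for my $key ( sort keys %results ) {
    $variants{$key} = join ", ", sort keys %{ $results{$key} };
    print "$key: $variants{$key}\n";
}
```

The outer keys answer "which words are there, ignoring case?", and the inner keys preserve each spelling exactly as it appeared.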
Re^4: (almost) Unique elements of an array by Roy Johnson (Monsignor) on Jul 14, 2005 at 15:12 UTC
Re^3: (almost) Unique elements of an array by Anonymous Monk on Jul 14, 2005 at 14:43 UTC
That's brilliant, and it's excellent to know that it's better to go to uppercase for this too. Thanks. It leaves me wondering whether there is any reason to store a big list of words in a hash rather than an array, and whether it is possible or sensible to turn %uniq into @uniq.
Re^4: (almost) Unique elements of an array by tlm (Prior) on Jul 14, 2005 at 15:03 UTC
Re: (almost) Unique elements of an array by jdporter (Paladin) on Jul 14, 2005 at 14:54 UTC
My reply in the recent thread "removing non-duplicates" can easily be modified to work case-insensitively.

```perl
my @a;    # the data
my %count;
$count{lc $_}++ for @a;
my @ic_unique = grep { $count{lc $_} == 1 } @a;    # ic = "ignore case"
print for @ic_unique;    # or whatever you do with the result
```
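Note that this keeps only the words that occur exactly once when case is ignored; a word with any case-variant duplicate is dropped entirely rather than reduced to one copy. A small sketch with made-up data shows the effect:

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @a = qw( THIS This That These Those that );

my %count;
$count{ lc $_ }++ for @a;

# Keep only words whose case-insensitive count is 1.
my @ic_unique = grep { $count{ lc $_ } == 1 } @a;
print "@ic_unique\n";
```

Here "THIS"/"This" and "That"/"that" each count twice under `lc`, so only the two words without a case variant survive.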
Re: (almost) Unique elements of an array by eric256 (Parson) on Jul 14, 2005 at 14:37 UTC
Just for kicks, here is a Perl 6 version that uses case... it looks much like Perl 5, so I made it a function.

```perl6
use v6;

# response to 474872 (remove duplicates regardless of case)
sub remove_dups (@array is copy) {
    my %unique;
    %unique{uc($_)} = 1 for @array;
    return %unique.keys;
}

my @array = ('Hello','hello');
remove_dups(@array).say;
@array.say;
```

___________
Eric Hodges
Re: (almost) Unique elements of an array by kprasanna_79 (Hermit) on Jul 14, 2005 at 14:26 UTC
I think this code will help you solve the issue:

```perl
my @array = qw/THIS this there THERE prasanna/;
my %temp;
map { $temp{lc($_)}++ } @array;    # count each word, ignoring case
for (sort keys %temp) {
    print "$_=>$temp{$_}" . "\n";
}
```

-prasanna.K
Re: (almost) Unique elements of an array by anonymized user 468275 (Curate) on Jul 14, 2005 at 14:50 UTC
No need to lowercase the array, just the 'work-out' hash keys, if we assume this simple approach:

```perl
# @input assumed by statement of requirement
my %done = ();
my @output = ();
for ( @input ) {
    unless( $done{ lc( $_ ) } ) {
        push @output, $_;
        $done{ lc( $_ ) } = 1;
    }
}
```

One world, one people
Re: (almost) Unique elements of an array by Eimi Metamorphoumai (Deacon) on Jul 14, 2005 at 14:53 UTC
Here's a solution that retains the original order and the original case. Basically, it removes any later occurrences, regardless of case, but keeps the case of the first occurrence.

```perl
print join ", ", unique_case(qw( THIS This That That ThAT ThAt ThAt THIS These Those ));

sub unique_case {
    my %seen;
    return grep {!$seen{uc $_}++} @_;
}
```
Re: (almost) Unique elements of an array by Anonymous Monk on Jul 14, 2005 at 15:57 UTC
```perl
#!/usr/bin/perl -w
use strict;

my @ar = qw/This this is a test test TEsT/;
my @vals = uniq(@ar);
$\ = "\n";
print (join " ", @vals);

sub uniq {
    my %hash;
    return grep {!($hash{uc $_}++)} @_;
}
```