http://qs1969.pair.com?node_id=61589

r.joseph has asked for the wisdom of the Perl Monks concerning the following question:

I know that this is a stupid question, but I seriously cannot seem to figure this out. I have some data that look like the data below (actually, this is the real data):

Brentwood
Brentwood
Westwood
Beverly Hills
Beverly Hills
Beverly Hills
Bev Hills Adj.
Palms
Mar Vista
Mar Vista
Mar Vista
Mar Vista
Mar Vista
Venice
Santa Monica
Santa Monica
LA/Crescent
LA/Crescent
Koreatown
3rd/LaBrea
3rd/LaBrea
3rd/LaBrea
Now, as you can see, it isn't much. What I am trying to do - and this is probably really simple, but for the life of me I can't figure it out - is to take all the duplicate fields and end up with an array that has just one copy of each piece of data, so that the array is just: Brentwood, Westwood, Beverly Hills...you get the idea.


So, as you can see, my problem is not a large one, and for the enlightened minds of the monastery, probably a very simple one. However, I have been pulling my hair out trying to get it to work.

Thanks so much!
R.Joseph

Replies are listed 'Best First'.
Re: Detecting duplicate entries
by kilinrax (Deacon) on Mar 01, 2001 at 21:56 UTC
    Slurp it into a hash? ;-)
    #!/usr/bin/perl -w
    use strict;

    my %unique;
    while (<DATA>) {
        chomp;
        $unique{$_} = '';
    }
    my @array = keys %unique;
    $, = "\n";
    print @array;

    __DATA__
    Brentwood
    Brentwood
    Westwood
    Beverly Hills
    Beverly Hills
    Beverly Hills
    Bev Hills Adj.
    Palms
    Mar Vista
    Mar Vista
    Mar Vista
    Mar Vista
    Mar Vista
    Venice
    Santa Monica
    Santa Monica
    LA/Crescent
    LA/Crescent
    Koreatown
    3rd/LaBrea
    3rd/LaBrea
    3rd/LaBrea
    (of course, if you want to preserve order too, some extra cunningness will be required; but the basic idea remains the same)
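    For instance, here is one sketch of the order-preserving variant (same shape as the script above, with the data shortened for illustration; it is one way, not the only cunning trick):

    #!/usr/bin/perl -w
    use strict;

    # keep only the first occurrence of each line, preserving input order
    my %seen;
    my @array;
    while (<DATA>) {
        chomp;
        push @array, $_ unless $seen{$_}++;
    }
    print "$_\n" for @array;

    __DATA__
    Brentwood
    Brentwood
    Westwood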
Re: Detecting duplicate entries
by davorg (Chancellor) on Mar 01, 2001 at 22:09 UTC

    The hash solutions that have already been given have the right idea, but you can use hash slices to make it look a bit simpler.

    my %uniq;
    @uniq{@orig_list} = @orig_list;
    my @unique_list = keys %uniq;
    --
    <http://www.dave.org.uk>

    "Perl makes the fun jobs fun
    and the boring jobs bearable" - me

      Don't assign values, just assign ().
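      As a minimal sketch of that variant (assuming the same @orig_list as above):

      my %uniq;
      @uniq{@orig_list} = ();   # empty-list slice assignment: the keys exist, the values are all undef
      my @unique_list = keys %uniq;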

      Or use:

      $seen{$_}++ or push @uniq, $_ for @orig;
      # or
      @uniq = grep !$seen{$_}++, @orig;


      japhy -- Perl and Regex Hacker
      And probably even slower, but it avoids the %temp and it's cute:
      sort keys %{ { map { $_ => 1 } @orig_list } }
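      For illustration, that expression in a tiny runnable context (the @orig_list contents here are made up):

      my @orig_list = qw(Brentwood Brentwood Westwood);
      my @sorted_unique = sort keys %{ { map { $_ => 1 } @orig_list } };
      print "$_\n" for @sorted_unique;   # prints Brentwood, then Westwood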

      p
(ichimunki) re: Detecting duplicate entries
by ichimunki (Priest) on Mar 01, 2001 at 21:59 UTC
    YMMV. TIMTOWTDI. etc.
    my @data = (
        # all that stuff
    );
    my %hash;
    foreach (@data) {
        $hash{$_} = 1;
    }
    my @new_data = keys %hash;
Re: Detecting duplicate entries
by zigster (Hermit) on Mar 01, 2001 at 21:58 UTC
    Look at the shell command uniq, or the node uniq; there are a whole pile of alternative solutions.
    --

    Zigster
Re: Detecting duplicate entries
by I0 (Priest) on Mar 02, 2001 at 04:14 UTC
      or perldoc -q unique if you have a version older than 5.6
      snowcrash //////
Re: Detecting duplicate entries
by arhuman (Vicar) on Mar 01, 2001 at 21:59 UTC
    Just in case you didn't know, under Unix:

    cat file.txt | sort | uniq
    will give you the contents of your file file.txt with all duplicates deleted (and sorted, by the way...)

    UPDATE: Once again I took too long typing; ignore this post, as zigster gave this answer faster than me...
        And to do it all at once as an inline edit:
        sort -uo file.txt file.txt

        Update: For those who asked, sort -u file.txt > file.txt
        will completely and happily change file.txt into a zero-byte file, hence the need for the "o" flag.

      Ouch! I didn't know this sort switch!
      Thanks.

      (Oh my god, I just realized how many useless chars I typed during all those years...)