Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi

If I wanted to build up a list of ids by extracting them from lines read in from a file, and then remove any duplicates so I end up with a list of unique ids separated by commas, what would be the best way to do this? I haven't created my list of ids yet, so I can build it in any way recommended.

many thanks

Replies are listed 'Best First'.
Re: removing duplicates from a string list
by moritz (Cardinal) on Nov 27, 2010 at 20:27 UTC
Re: removing duplicates from a string list
by toolic (Bishop) on Nov 27, 2010 at 22:50 UTC
    This is a perlfaq, also available from your command line:
    perldoc -q duplicate
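    The recipe there boils down to tracking what has already been seen in a hash, roughly like this:

    use strict;
    use warnings;

    my @list = qw( a b a c b );
    my %seen;
    my @unique = grep { !$seen{$_}++ } @list;
    print "@unique\n";    # a b c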
Re: removing duplicates from a string list
by ww (Archbishop) on Nov 27, 2010 at 20:27 UTC
    Think hash!

    That's a good 'general rule' when seeking to eliminate dups, i.e. when trying for a list of uniques.

      do you mean something like this?
      %ids;
      # loop through lines
          # add id to the list
          $ids{$my_input_id} = "anything - doesn't matter";
      # end loop
      @ids = keys %ids;
      join(',', @ids);
      or this?
      %ids;
      # loop through lines
          # add id to the list
          $ids{$my_input_id} = 'id';
      # end loop
      @ids = values %ids;
      join(',', @ids);
      or does it make no difference?

        Always use strict and warnings, which means you would have to declare your variables with my.

        With a hash, the keys are unique, but not necessarily the values.
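        For instance (a minimal sketch with made-up ids):

        use strict;
        use warnings;

        my %ids = ( 12 => 'id', 34 => 'id' );      # two keys sharing one value
        print join( ',', keys %ids ),   "\n";      # e.g. 34,12 - each id once
        print join( ',', values %ids ), "\n";      # id,id - duplicates remain

        So keys, not values, gives you the unique ids.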

        Peter (Guo) Pei

        Because the value you assign to each key is unimportant, you might as well use undef to signify that:

        use warnings;
        use strict;
        use 5.010;

        my @arr = qw{ a a b b c c c };
        my %hash;
        @hash{@arr} = undef;

        use Data::Dumper;
        say Dumper(\%hash);

        --output:--
        $VAR1 = {
                  'c' => undef,
                  'a' => undef,
                  'b' => undef
                };

        if ( exists $hash{a} ) {
            say 'yes';
        }
        else {
            say 'no';
        }

        --output:--
        yes

        for (keys %hash) {
            say;
        }

        --output:--
        c
        a
        b

        If you are reading lines from a file, you can reduce the amount of memory you use at any one time by assigning the ids to the hash one at a time, rather than storing the lines in an array and then doing a gang assignment like the one above.
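        A minimal sketch of that line-at-a-time approach (the filename and the digits-only extraction pattern are placeholders; adjust both to your real input):

        use strict;
        use warnings;

        my %seen;
        open my $fh, '<', 'ids.txt' or die "Cannot open ids.txt: $!";
        while ( my $line = <$fh> ) {
            # placeholder extraction: take the first run of digits on the line
            if ( my ($id) = $line =~ /(\d+)/ ) {
                $seen{$id} = undef;    # creating the key is all that matters
            }
        }
        close $fh;

        print join( ',', keys %seen ), "\n";    # unique ids, comma-separated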

Re: removing duplicates from a string list
by chrestomanci (Priest) on Nov 27, 2010 at 21:57 UTC

    See also List::MoreUtils

    From the pod:

    use List::MoreUtils qw( uniq );

    my @x = uniq 1, 1, 2, 2, 3, 5, 3, 4;
    # @x now contains 1 2 3 5 4

    This method has the advantage that it maintains the order of your original list. A simple hash method returns the unique elements in an unpredictable order, because Perl hashes are unordered.
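    Applied to the original problem, that might look like this (a sketch; @ids stands in for whatever was extracted from the file):

    use List::MoreUtils qw( uniq );

    my @ids = ( 12, 34, 56, 34, 89, 12, 35 );    # ids as read from the file
    print join( ',', uniq @ids ), "\n";          # 12,34,56,89,35 - order kept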

Re: removing duplicates from a string list
by Marshall (Canon) on Nov 28, 2010 at 01:48 UTC
    #!/usr/bin/perl -w
    use strict;

    my @ids = qw(12 34 56 34 89 12 35);
    my %seen = ();

    # an id passes the grep only the first time it is seen
    my @unique = grep { !$seen{$_}++ } @ids;

    print join(",", @unique), "\n";    # prints: 12,34,56,89,35