webster has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have an array like this: (client, status, description). I would like to filter it based on client and status so that duplicates are removed and only one entry for each client and status is left in the new array. Thanks, Thorbjorn

Re: filter array based on 2 fields
by Anonymous Monk on Aug 20, 2012 at 13:01 UTC

      Hi,
      it's an array like this:
      (client,Status,Description
      client,Status,Description
      and so on)
      These are lines for a CSV file. I need to remove duplicates based on client and status, because the description can vary due to a timestamp.
      Thanks, Thorbjorn
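Since the description may contain a timestamp (and therefore characters such as spaces, colons or even commas), one way to get at the client and status is to split each line into at most three fields, so everything after the second comma stays in the description. This is a minimal sketch that assumes plain comma-separated lines without quoted fields (real CSV with quoting would need a proper CSV parser); the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample lines: client,status,description (description carries a timestamp)
my @lines = (
    'acme,OK,checked 2012-08-20 13:01:05',
    'acme,OK,checked 2012-08-20 13:05:44',
    'acme,FAIL,disk full 2012-08-20 13:07:10',
    'zeta,OK,checked 2012-08-20 13:09:00',
);

my %seen;
my @unique;
for my $line (@lines) {
    # A limit of 3 means any commas after the second one stay in the description.
    my ($client, $status) = split /,/, $line, 3;
    defined $status or next;              # skip lines without a status field
    my $key = "$client,$status";          # neither field can contain a comma here
    push @unique, $line unless $seen{$key}++;   # keep the first occurrence only
}

print "$_\n" for @unique;
```

The order of the input is preserved, and the first line seen for each client and status is the one kept.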

Re: filter array based on 2 fields
by BillKSmith (Monsignor) on Aug 20, 2012 at 13:40 UTC

    You have not told us enough about your data. If your data is an array of strings with comma-separated fields, the reference to FAQ4 is probably correct. More likely, you have an array of array references, each referring to an anonymous array of the form you show. In that case "duplicate" is less clear, because two arrays with the same contents are still two different references.
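To illustrate the difference between duplicate contents and duplicate references, here is a small sketch assuming the data is an array of references to [client, status, description] arrays (the sample rows are hypothetical). Comparing the references themselves says two equal-looking arrays differ, so the sketch builds a key from the first two fields instead.

```perl
use strict;
use warnings;

# Hypothetical data: an array of references to [client, status, description]
my @rows = (
    [ 'c1', 's1', 'd1' ],
    [ 'c2', 's2', 'd2' ],
    [ 'c1', 's1', 'd2' ],
    [ 'c1', 's2', 'd3' ],
);

# Two anonymous arrays with identical contents are still different references.
my ($x, $y) = ([ 'c1', 's1' ], [ 'c1', 's1' ]);
print "same reference? ", ($x == $y ? "yes" : "no"), "\n";

# So compare the contents of the fields that matter (client and status),
# keeping the first row seen for each pair.
my %seen;
my @unique = grep { !$seen{ join "\0", @{$_}[0, 1] } ++ } @rows;

print join(",", @$_), "\n" for @unique;
```

The kept elements are the original references, so nothing is copied.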

    In any case, you probably mean to remove all but one of each duplicated entry, making it unique. You could also mean to remove every duplicated entry, leaving only those that were already unique.
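The two interpretations differ only in the test applied to each entry. A minimal sketch, assuming plain "client,status,description" strings; the helper name client_status is hypothetical.

```perl
use strict;
use warnings;

my @arr = ('c1,s1,d1', 'c2,s2,d2', 'c1,s1,d2', 'c1,s2,d3');

# Hypothetical helper: the "client,status" part of a line.
sub client_status {
    my ($client, $status) = split /,/, $_[0], 3;
    return "$client,$status";
}

# Interpretation 1: keep one (the first) entry for each client/status pair.
my %seen;
my @one_each = grep { !$seen{ client_status($_) }++ } @arr;

# Interpretation 2: drop every entry whose pair occurs more than once.
my %count;
$count{ client_status($_) }++ for @arr;
my @only_unique = grep { $count{ client_status($_) } == 1 } @arr;

print "one of each: @one_each\n";
print "only unique: @only_unique\n";
```

The second version needs a full counting pass first, because an entry cannot be judged unique until the whole array has been seen.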

    Bill
Re: filter array based on 2 fields
by cheekuperl (Monk) on Aug 20, 2012 at 13:58 UTC
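    use strict;
    use warnings;

    my @arr = ("c1,s1,d1", "c2,s2,d2", "c1,s1,d2", "c1,s2,d3");
    print "\nWith duplicates:\n", join("\n", @arr), "\n";

    my %seen;
    foreach my $elem (@arr) {
        # Capture "client,status" and ",description". The description may
        # contain any characters (e.g. a timestamp), so match to end of line.
        # Skip lines that don't match, so $1/$2 are never stale.
        next unless $elem =~ m/^([^,]*,[^,]*)(,.*)$/;
        $seen{$1} = $2;    # a later description overwrites an earlier one
    }
    # Now you have a hash with unique (client,status) combos

    # Rebuild your array, without duplicate entries
    @arr = ();
    foreach my $key (keys %seen) {
        push @arr, $key . $seen{$key};
    }
    # Note that the order of the elements varies, because hash order is unspecified
    print "\nWith duplicates removed:\n", join("\n", @arr), "\n";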

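      More simply, and keeping the original order (the first entry for each client and status is kept):

      my @arr = ('c1,s1,d1', 'c2,s2,d2', 'c1,s1,d2', 'c1,s2,d3');
      my %seen;
      # Match the leading "client,status"; lines that do not match are dropped
      @arr = grep { /^([^,]*,[^,]*)/ && !$seen{$1}++ } @arr;
      print join($/, @arr), "\n";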

      Prints:

      c1,s1,d1
      c2,s2,d2
      c1,s2,d3
Re: filter array based on 2 fields
by locked_user sundialsvc4 (Abbot) on Aug 20, 2012 at 18:32 UTC

    I suggest that you restructure your data so that the array contains hashrefs, each of which contains three keys. For example:

    my $foo = [
        { client => "Foo", status => "Alive",     description => "Bletch" },
        { client => "Bar", status => "Frobozzed", description => "Plugh"  },
    ];
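Getting from the CSV lines to this structure is a single pass. A minimal sketch, assuming plain comma-separated lines without quoted fields; the input lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input lines: client,status,description
my @lines = ('Foo,Alive,Bletch', 'Bar,Frobozzed,Plugh', 'Foo,Alive,Xyzzy');

# Turn each line into a reference to a hash with three named keys.
my @records;
for my $line (@lines) {
    # Limit of 3 keeps any extra commas inside the description.
    my ($client, $status, $description) = split /,/, $line, 3;
    push @records, {
        client      => $client,
        status      => $status,
        description => $description,
    };
}

# A hash slice through the reference pulls out the three values in order.
printf "%s / %s / %s\n", @{$_}{qw(client status description)} for @records;
```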

    Now, what does that buy you? A lot, actually. Each element in the array is now one thing, not three things, and it is a reference to that “one thing.” You can have as many references to it as you wish, and each reference requires little additional storage.

    You can sort an array by supplying a comparison block, and since client and status are strings, the cmp operator is the one you want (<=> compares numbers). After sorting, duplicates sit next to each other, so they are easy to find, and you can push all of the non-dupe hashrefs onto a separate list ... and notice that you are not duplicating the record, only making another reference to what is already there. The entire 3-tuple of related values stays together, and it survives in memory for as long as at least one reference to it continues to exist.
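The sort-then-compare-neighbours approach might be sketched as follows. It assumes the hashref layout shown above (the records themselves are hypothetical), uses cmp for the string fields, and breaks ties on the original position with <=> so that the first occurrence of each client/status pair is the one kept.

```perl
use strict;
use warnings;

# Hypothetical records, as hashrefs with client, status and description keys
my $records = [
    { client => 'Foo', status => 'Alive',     description => 'Bletch' },
    { client => 'Bar', status => 'Frobozzed', description => 'Plugh'  },
    { client => 'Foo', status => 'Alive',     description => 'Xyzzy'  },
];

# Sort indices by client, then status (strings: cmp), then original index (numbers: <=>).
my @sorted = map { $records->[$_] }
    sort {
        $records->[$a]{client} cmp $records->[$b]{client}
            or $records->[$a]{status} cmp $records->[$b]{status}
            or $a <=> $b
    } 0 .. $#$records;

# After sorting, duplicates are adjacent: compare each record with the previous one.
my @unique;
my $prev;
for my $rec (@sorted) {
    next if $prev
        && $rec->{client} eq $prev->{client}
        && $rec->{status} eq $prev->{status};
    push @unique, $rec;    # pushes the reference; the hash itself is not copied
    $prev = $rec;
}

print "$_->{client},$_->{status},$_->{description}\n" for @unique;
```

The result comes out in sorted order rather than input order; if the input order matters, the %seen approach from the earlier replies avoids the sort entirely.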