filter array based on 2 fields

webster has asked for the wisdom of the Perl Monks concerning the following question:

Re: filter array based on 2 fields by Anonymous Monk on Aug 20, 2012 at 13:01 UTC
"Hi I have an array like this (client,status,description)"
Can you show us?
"... so that if there are duplicates they are removed so there is only one entry left in the new array."
This is covered in perlfaq4: How can I remove duplicate elements from a list or array?
Welcome, see The Perl Monks Guide to the Monastery.
Re^2: filter array based on 2 fields by Anonymous Monk on Aug 20, 2012 at 18:10 UTC
Hi, it's an array like this: (client,Status,Description client,Status,Description and so on). These are lines for a CSV file. I need to remove duplicates based on client and status, as the description can vary because of a timestamp. Thanks, Thorbjorn
Re: filter array based on 2 fields by BillKSmith (Monsignor) on Aug 20, 2012 at 13:40 UTC
You have not told us enough about your data. If your data is an array of strings with comma-separated fields, the reference to FAQ4 is probably correct. More likely, you have an array of array references, each referring to an anonymous array of the form you show. Now "duplicate" is less clear, because duplicate arrays and duplicate references are not the same thing. In any case, you probably mean to remove all but one of each duplicated entry, making it unique. You could instead mean to remove every entry that has a duplicate, leaving only those that were already unique. Bill
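To make the two readings of "remove duplicates" concrete, here is a minimal sketch assuming the data really is an array of array references. The sample rows, the `row_key` helper, and the separator character are all made up for illustration; nothing in the thread confirms the actual layout.

```perl
use strict;
use warnings;

# Hypothetical sample data: each row is [client, status, description].
my @rows = (
    [ 'c1', 's1', 'd1' ],
    [ 'c2', 's2', 'd2' ],
    [ 'c1', 's1', 'd2' ],
    [ 'c1', 's2', 'd3' ],
);

# Build a key from client and status only. The separator "\x1F" is
# assumed never to occur in the data.
sub row_key {
    my ($row) = @_;
    return join("\x1F", @{$row}[0, 1]);
}

# Reading 1: keep the first row for each (client, status) pair.
sub keep_first {
    my %seen;
    return grep { !$seen{ row_key($_) }++ } @_;
}

# Reading 2: keep only rows whose (client, status) pair occurs exactly once.
sub keep_singletons {
    my %count;
    $count{ row_key($_) }++ for @_;
    return grep { $count{ row_key($_) } == 1 } @_;
}

print join(',', @$_), "\n" for keep_first(@rows);
print "--\n";
print join(',', @$_), "\n" for keep_singletons(@rows);
```

With these rows, the first reading keeps three rows (the second `c1,s1` row is dropped), while the second reading keeps only two, because both `c1,s1` rows are discarded.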
Re: filter array based on 2 fields by cheekuperl (Monk) on Aug 20, 2012 at 13:58 UTC
```perl
use strict;
use warnings;

my @arr = ("c1,s1,d1", "c2,s2,d2", "c1,s1,d2", "c1,s2,d3");
print "\nWith duplicates: \n" . join("\n", @arr);

my %seen;
foreach my $elem (@arr) {
    # Input assumed sane; skip anything that does not match
    $elem =~ m/(\w+,\w+)(,\w+)/ or next;
    $seen{$1} = $2;    # a later duplicate overwrites the earlier description
}
# Now you have a hash with unique (client,status) combos

# Rebuild your array, without duplicate entries
@arr = ();
print "\n";
foreach my $key (keys %seen) {
    push(@arr, $key . $seen{$key});
}
# Note that the order of elements varies because of the hash
print "\nWith duplicates removed: \n" . join("\n", @arr);
```
Re^2: filter array based on 2 fields by hbm (Hermit) on Aug 20, 2012 at 14:30 UTC
More simply - and keeping the order:

```perl
@arr = ('c1,s1,d1','c2,s2,d2','c1,s1,d2','c1,s2,d3');
@arr = grep{ /(.+),/; !$seen{$1}++ } @arr;
print join$/,@arr;
```

Prints:

```
c1,s1,d1
c2,s2,d2
c1,s2,d3
```
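One caveat with the greedy `/(.+),/`: it treats everything before the last comma as the key, so a description that itself contains a comma would become part of the key. The following is only a sketch of an alternative that builds the key with `split` and a field limit. The `dedupe_lines` name and the sample lines, timestamps included, are invented for illustration.

```perl
use strict;
use warnings;

# Keep the first line seen for each (client, status) pair, in input order.
# Lines with fewer than two fields are dropped.
sub dedupe_lines {
    my %seen;
    return grep {
        # Limit 3: everything after the second comma stays in the description.
        my ($client, $status) = split /,/, $_, 3;
        defined $status && !$seen{"$client,$status"}++;
    } @_;
}

# Hypothetical lines; the timestamps are made up for illustration.
my @lines = (
    'c1,s1,checked at 2012-08-20 13:01',
    'c2,s2,checked at 2012-08-20 13:02',
    'c1,s1,checked at 2012-08-20 13:05',
    'c1,s2,failed, retry at 2012-08-20 13:07',
);

print "$_\n" for dedupe_lines(@lines);
```

This still assumes plain comma-separated lines with no quoting. If the file uses quoted fields, a real CSV parser such as the Text::CSV module would be the safer choice.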
Re^3: filter array based on 2 fields by Anonymous Monk on Aug 20, 2012 at 17:40 UTC
Hi, it's an array like this: (client,Status,Description client,Status,Description and so on). These are lines for a CSV file. I need to remove duplicates based on client and status, as the description can vary because of a timestamp. Thanks, Thorbjorn
Re^4: filter array based on 2 fields by Anonymous Monk on Aug 20, 2012 at 17:43 UTC
Re: filter array based on 2 fields by locked_user sundialsvc4 (Abbot) on Aug 20, 2012 at 18:32 UTC
I suggest that you restructure your data so that the array contains hashrefs, each of which contains three keys. For example:

```perl
$foo = [
    { client => "Foo", status => "Alive",     description => "Bletch" },
    { client => "Bar", status => "Frobozzed", description => "Plugh"  },
];
```

Now, what does that buy you? A lot, actually. Each element in the array is now one thing, not three things, and furthermore it is a reference to that “one thing.” There can be as many references to it as you wish, and each reference requires little additional storage.

You can `sort` an array, specifying a comparison function for comparing the elements; the `cmp` operator (or `<=>` for numeric fields) was designed for just this. Once the array is sorted, duplicates sit next to each other, so it is easy to find them and to push all of the non-dupe hashrefs onto a separate list. Notice that you are not duplicating the record, only making another reference to what is already there. The entire 3-tuple of related values stays together, and it survives in memory for as long as at least one reference to it continues to exist.
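As a rough sketch of that sort-then-compare idea, assuming string fields compared with `cmp`: the records and the helper names `by_client_status` and `unique_sorted` below are hypothetical, and the result comes back in sorted order rather than in the original order.

```perl
use strict;
use warnings;

# Hypothetical records in the hashref layout suggested above.
my $records = [
    { client => 'Foo', status => 'Alive',     description => 'Bletch' },
    { client => 'Bar', status => 'Frobozzed', description => 'Plugh'  },
    { client => 'Foo', status => 'Alive',     description => 'Xyzzy'  },
];

# Compare two records by client, then by status (string comparison).
sub by_client_status {
    my ($x, $y) = @_;
    return $x->{client} cmp $y->{client} || $x->{status} cmp $y->{status};
}

# Sort, then keep a record only when it differs from the last one kept.
sub unique_sorted {
    my @unique;
    for my $rec (sort { by_client_status($a, $b) } @_) {
        push @unique, $rec
            if !@unique || by_client_status($unique[-1], $rec) != 0;
    }
    return @unique;
}

my @unique = unique_sorted(@$records);
print "$_->{client},$_->{status},$_->{description}\n" for @unique;
```

The elements of `@unique` are the same references that live in `$records`, so no record is copied. If the original order matters, a `%seen` hash keyed on client and status inside a `grep` would do the same job without sorting.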