Re: perl to remove duplacate based on columnb,d &

You could try something like this:

#!/usr/bin/perl
use warnings;
use strict;

my %uniq;

while (<DATA>) {
    next unless /^\s*\d/;
    chomp;
    my $line = $_;
    
    my @f = split /,/, $line;
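    # Join with a separator: plain concatenation would give the same key
    # for "ab","c","d" and "a","bc","d".
    my $key = join ',', @f[1..3];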

    if ( exists $uniq{$key} ) {
        my $stored = ( split /,/, $uniq{$key})[4];
        my $new    = $f[4];
        if ($new lt $stored) {
            $uniq{$key} = $line;
        }
    }
    else {
        $uniq{$key} = $line;
    }
}

# Hash order is not defined, so the kept lines may print in any order.
print "$_\n" for values %uniq;

__DATA__
1,ken,james,smith,s
11,ken,james,smith,f
0,ken,james,smith,s
5,ken,arthur,wesson,g
7,ken,arthur,wesson,a

For the provided DATA section, it produces the following output (the two lines may come out in either order, since a hash has no defined order):

11,ken,james,smith,f
7,ken,arthur,wesson,a

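That should be the behavior you want: for each combination of the second, third and fourth columns, only the line whose fifth column sorts first (string comparison) is kept.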
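If the order of the output matters to you, here is a rough sketch of the same idea that remembers the order in which each key first appeared. The names dedupe_lines and @order are mine, purely for illustration, and it assumes the lines are already chomped and always have at least five comma-separated fields.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (not part of the script above): returns the kept
# lines in the order in which each key was first seen.
sub dedupe_lines {
    my @lines = @_;
    my ( %uniq, @order );

    for my $line (@lines) {
        next unless $line =~ /^\s*\d/;    # skip lines not starting with a number
        my @f   = split /,/, $line;
        my $key = join ',', @f[ 1 .. 3 ];

        if ( !exists $uniq{$key} ) {
            push @order, $key;            # remember first appearance
            $uniq{$key} = $line;
        }
        elsif ( $f[4] lt ( split /,/, $uniq{$key} )[4] ) {
            $uniq{$key} = $line;          # fifth column sorts earlier: replace
        }
    }
    return map { $uniq{$_} } @order;
}

my @input = (
    '1,ken,james,smith,s',
    '11,ken,james,smith,f',
    '0,ken,james,smith,s',
    '5,ken,arthur,wesson,g',
    '7,ken,arthur,wesson,a',
);
print "$_\n" for dedupe_lines(@input);
```

With the sample data this prints the smith line before the wesson line on every run, because the smith key is seen first.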

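If your real data can contain quoted fields or embedded commas, a plain split on commas will break, so consider looking at dedicated CSV modules, like Text::CSV_XS.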
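For what it's worth, a minimal sketch of the same rule on top of Text::CSV_XS might look like the following. It assumes the module is installed from CPAN; dedupe_csv and the sample data with a quoted "james, jr" field are made up for illustration, and I have not run it against your real file.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV_XS;

# binary => 1 allows arbitrary bytes in fields, auto_diag => 1 reports parse errors
my $csv = Text::CSV_XS->new( { binary => 1, auto_diag => 1 } );

# Hypothetical helper: same rule as the split-based script, but with real CSV parsing.
sub dedupe_csv {
    my ($fh) = @_;
    my ( %best, @order );

    while ( my $row = $csv->getline($fh) ) {
        next unless @$row >= 5 && $row->[0] =~ /^\s*\d/;

        # Fields may now contain commas, so join the key on a NUL byte instead
        my $key = join "\0", @{$row}[ 1 .. 3 ];

        if ( !exists $best{$key} ) {
            push @order, $key;
            $best{$key} = $row;
        }
        elsif ( $row->[4] lt $best{$key}[4] ) {
            $best{$key} = $row;
        }
    }
    return map { $best{$_} } @order;
}

# Made-up sample input with a quoted field that contains a comma
my $data = <<'END';
1,ken,"james, jr",smith,s
11,ken,"james, jr",smith,f
5,ken,arthur,wesson,g
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
for my $row ( dedupe_csv($fh) ) {
    $csv->combine(@$row) or die "Cannot combine fields";
    print $csv->string, "\n";
}
```

The point of the module here is that "james, jr" stays a single field, where split /,/ would have cut it in two and shifted every later column.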

It's already 2015 in my time zone, and so I wish you all the best in 2015. May your code produce the output you desire, and your input be as you think it is.

- Luke
