perl to remove duplicate based on columnb,d &

john.tm has asked for the wisdom of the Perl Monks concerning the following question:

I have a comma-separated file and wish to remove duplicate lines based on columns b, c and d, keeping the one with the lower-valued letter (in my case F, not S) in column e.

input
1,ken,james,smith,s
11,ken,james,smith,f
0,ken,james,smith,s

output
11,ken,james,smith,f

seek $ifh, 0, 0;


my @file = <$ifh>;

my @array;
my %hash;
foreach my $_ (reverse @file) {


    chomp;
    next
        if  ! m/^\s+\d/;
    s/^\s+//g;
    s/\s+$//g;
    s/\s+/,/g;

    my $key = join ',', ( split /,/ )[ 1, 2, 3 ]; # remove duplicates column b,c,d
   #push @array, $_
    
     print  $_, "\n"
        if  ! $hash{$key}++;
}


Re: perl to remove duplicate based on columnb,d & by blindluke (Hermit) on Jan 01, 2015 at 00:08 UTC
You could try something like this:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my %uniq;

    while (<DATA>) {
        next unless /^\s*\d/;
        chomp;
        my $line = $_;
        my @f = split /,/, $line;
        my $key = $f[1].$f[2].$f[3];
        if ( exists $uniq{$key} ) {
            my $stored = ( split /,/, $uniq{$key})[4];
            my $new = $f[4];
            if ($new lt $stored) {
                $uniq{$key} = $line;
            }
        }
        else {
            $uniq{$key} = $line;
        }
    }

    print $_."\n" for (values %uniq);

    __DATA__
    1,ken,james,smith,s
    11,ken,james,smith,f
    0,ken,james,smith,s
    5,ken,arthur,wesson,g
    7,ken,arthur,wesson,a

For the provided DATA section, it produces the following output:

    11,ken,james,smith,f
    7,ken,arthur,wesson,a

Which should be the behavior you want. Consider looking at dedicated CSV modules, like Text::CSV_XS.

It's already 2015 in my time zone, and so I wish you all the best in 2015. May your code produce the output you desire, and your input be as you think it is.

- Luke
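One possible variation on the approach above, offered as a sketch rather than a definitive version: since `values %uniq` returns lines in no guaranteed order, and concatenating fields without a separator can in principle make two different name triples share a key, the following keeps first-seen order and joins the key fields with a comma. The helper name `dedupe_lines` is made up for this illustration, and it assumes simple CSV with no quoted or embedded commas and at least five fields per row. Note that `lt` is case-sensitive, so mixed `F`/`f` input would need normalising first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): keep one line per (b,c,d) group,
# preferring the smallest column-e value, and return lines in first-seen order.
sub dedupe_lines {
    my @lines = @_;
    my ( %best, @order );
    for my $line (@lines) {
        next unless $line =~ /^\s*\d/;    # skip anything that is not a data row
        my @f = split /,/, $line;

        # Join with a separator so ("ab","c") and ("a","bc") get different keys.
        my $key = join ',', @f[ 1 .. 3 ];

        if ( !exists $best{$key} ) {
            push @order, $key;            # remember when the group first appeared
            $best{$key} = $line;
        }
        elsif ( $f[4] lt ( split /,/, $best{$key} )[4] ) {
            $best{$key} = $line;          # a strictly smaller letter replaces the stored line
        }
    }
    return map { $best{$_} } @order;
}

print "$_\n" for dedupe_lines(
    '1,ken,james,smith,s',
    '11,ken,james,smith,f',
    '0,ken,james,smith,s',
    '5,ken,arthur,wesson,g',
    '7,ken,arthur,wesson,a',
);
```

Because the comparison is strict, a later line with the same letter as the stored one does not replace it, so the first such line in the file is the one kept.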
Re: perl to remove duplicate based on columnb,d & by Anonymous Monk on Jan 01, 2015 at 05:39 UTC
Hi, what should happen if you don't get an 'f' line, or if you get two? You may be told that this will never happen. This generally means that it will happen within a couple of weeks, at most.

J.C.
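One way to find out before it bites is to check the data for exactly those two cases. The following is a minimal sketch, assuming simple comma-separated rows with five fields and no quoted commas. The helper name `check_groups` and the extra sample rows are invented for illustration and do not come from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group rows by columns b,c,d and count, per group,
# how many rows carry the preferred letter in column e (case-insensitive).
# Returns [key, count] pairs for every group whose count is not exactly 1.
sub check_groups {
    my ( $preferred, @lines ) = @_;
    my ( %count, @order );
    for my $line (@lines) {
        my @f   = split /,/, $line;
        my $key = join ',', @f[ 1 .. 3 ];
        push @order, $key unless exists $count{$key};
        $count{$key} //= 0;
        $count{$key}++ if lc( $f[4] ) eq lc($preferred);
    }
    return map { [ $_, $count{$_} ] } grep { $count{$_} != 1 } @order;
}

# Made-up sample: one group with no 'f', one with two, one that is fine.
my @problems = check_groups(
    'f',
    '1,ken,james,smith,s',
    '0,ken,james,smith,s',
    '3,ann,mary,jones,f',
    '4,ann,mary,jones,f',
    '5,bob,lee,king,f',
);

for my $problem (@problems) {
    my ( $key, $n ) = @$problem;
    warn "group '$key' has $n preferred rows\n";
}
```

With this sample, the `ken,james,smith` group is reported with 0 preferred rows and `ann,mary,jones` with 2, while `bob,lee,king` passes silently. What to do with a flagged group (keep the first, keep the last, or stop) is a decision for whoever owns the data.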