Read the file using a CPAN module (e.g. Text::CSV), and keep a hash or array that records, for each unique ID, the length of the shortest column-2 string seen so far, along with the entire corresponding row. (A hash would be more natural, I think, since you could index it by unique ID; an array would let you easily preserve the ordering of rows from the original file, in case that's important.)
Here's a hash-based solution:
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1 })
    or die "Cannot use CSV: " . Text::CSV->error_diag();

my %results;
while (<DATA>) {
    chomp;
    $csv->parse($_)
        or die "Could not parse string '$_': " . Text::CSV->error_diag();
    my @row      = $csv->fields();
    my $uniqueID = $row[0];
    my $string   = $row[1];

    # Keep this row if it's the first for this ID, or if its
    # column-2 string is shorter than the one stored so far.
    if (!exists $results{$uniqueID}
        or $results{$uniqueID}{length} > length $string) {
        $results{$uniqueID} = {
            length => length $string,
            row    => $_,
        };
    }
}

foreach (sort keys %results) {
    say $results{$_}{row};
}
__DATA__
A, texttexttext, col3, col4,
B, textt, col3, col4,
A, text, col3, col4,
B, texttex, col3, col4,
I'm reading from __DATA__ here; to use an external file, simply use the magic filehandle, <>, instead of <DATA>. This'll allow you to specify files on the command line as well as pipe them into the script:
$ perl script.pl data.csv
...
$ generate_csv | perl script.pl
...
$
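If the original row order matters, the array idea mentioned at the top can be combined with the hash: remember each ID the first time you see it, then print in that order at the end instead of sorting the keys. A minimal sketch of that approach (reading from an inlined list for brevity, and using a plain split instead of Text::CSV just to keep it self-contained; use Text::CSV on real data):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

my @lines = (
    "A, texttexttext, col3, col4,",
    "B, textt, col3, col4,",
    "A, text, col3, col4,",
    "B, texttex, col3, col4,",
);

my (%results, @order);    # @order remembers IDs in first-seen order
for my $line (@lines) {
    # Plain split for illustration only; Text::CSV handles quoting etc.
    my ($uniqueID, $string) = (split /,/, $line)[0, 1];
    push @order, $uniqueID unless exists $results{$uniqueID};
    if (!exists $results{$uniqueID}
        or $results{$uniqueID}{length} > length $string) {
        $results{$uniqueID} = { length => length $string, row => $line };
    }
}

say $results{$_}{row} for @order;    # encounter order, not sorted order
```

With this sample data the output happens to match the sorted version, but with IDs that first appear out of alphabetical order the difference would show.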
Side note -- I see you crossposted your question to StackOverflow. That's fine, of course, but it's generally considered polite to inform people of crossposting to avoid duplicated/unnecessary effort.