Read the file using a CPAN module (e.g. Text::CSV), and keep a hash or array that, for each unique ID, records the shortest column-2 string length seen so far together with the entire corresponding row. (A hash would be more natural, I think, since you could index it by unique ID; an array would allow you to easily preserve the ordering of rows from the original file, in case that's important.)
Here's a hash-based solution:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature qw/say/;

    use Text::CSV;

    my $csv = Text::CSV->new({ binary => 1 })
        or die "Cannot use CSV: " . Text::CSV->error_diag();

    # For each unique ID: the shortest column-2 length seen so far, and its row
    my %results = ();

    while (<DATA>) {
        chomp;
        # Parse errors are reported by the object, not the class
        $csv->parse($_)
            or die "Could not parse string '$_': " . $csv->error_diag();
        my @row      = $csv->fields();
        my $uniqueID = $row[0];
        my $string   = $row[1];

        # Keep this row if the ID is new or its string is shorter than the stored one
        if (!exists $results{$uniqueID}
            or $results{$uniqueID}->{'length'} > length $string) {
            $results{$uniqueID} = {
                'length' => length $string,
                'row'    => $_,
            };
        }
    }

    foreach (sort keys %results) {
        say $results{$_}->{'row'};
    }

    __DATA__
    A, texttexttext, col3, col4,
    B, textt, col3, col4,
    A, text, col3, col4,
    B, texttex, col3, col4,
I'm reading from __DATA__ here; to use an external file, simply use the magic filehandle, <>, instead of <DATA>. This'll allow you to specify files on the command line as well as pipe them into the script:
    $ perl script.pl data.csv
    ...
    $ generate_csv | perl script.pl
    ...
    $
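If the original row order matters, as mentioned at the top, here is a minimal sketch of one way to do it. It's not the only way: it still uses a hash for lookup, plus an array that remembers the order in which each ID first appeared. The names shortest_rows, main, %best and @order are my own inventions for illustration, and I'm assuming that ties should go to the earlier row. It reads from the magic filehandle, <>.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/say/;

use Text::CSV;

# Hypothetical helper: given a Text::CSV object and a list of chomped lines,
# return the row with the shortest column-2 string for each ID, ordered by
# the first appearance of each ID in the input.
sub shortest_rows {
    my ($csv, @lines) = @_;
    my %best;     # ID => { length => ..., row => ... }
    my @order;    # IDs in order of first appearance

    for my $line (@lines) {
        next unless length $line;    # skip empty lines
        $csv->parse($line)
            or die "Could not parse string '$line': " . $csv->error_diag();
        my ($id, $string) = $csv->fields();
        $string //= '';              # treat a missing column 2 as empty

        if (!exists $best{$id}) {
            push @order, $id;        # first time we see this ID
        }
        elsif (length($string) >= $best{$id}{length}) {
            next;                    # not shorter; keep the earlier row
        }
        $best{$id} = { length => length($string), row => $line };
    }
    return map { $best{$_}{row} } @order;
}

sub main {
    my $csv = Text::CSV->new({ binary => 1 })
        or die "Cannot use CSV: " . Text::CSV->error_diag();
    chomp(my @lines = <>);           # files named on the command line, or STDIN
    say for shortest_rows($csv, @lines);
}

main() unless caller;
```

With the sample data from above, this should print the "A, text, ..." row first and the "B, textt, ..." row second, since A appears before B in the input. Note that it slurps the whole input into memory, which is fine for small files but not something I'd do for very large ones.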
Side note -- I see you crossposted your question to StackOverflow. That's fine, of course, but it's generally considered polite to inform people of crossposting to avoid duplicated/unnecessary effort.
In reply to Re: Find the row with shortest string for a given input in a csv file.
by AppleFritter
in thread Find the row with shortest string for a given input in a csv file.
by Anonymous Monk