Re^2: Find the row with shortest string for a given input in a csv file.

Do not use parse (it'll break your script on fields with newlines). Use getline instead!
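
To illustrate the point, here is a minimal sketch that feeds the same record to both approaches. The sample data and the variable names are made up for this example. Splitting the input into physical lines and calling parse hands the parser half a record at a time, while getline keeps reading from the handle until the record is complete.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample: one record whose second field contains a newline.
my $data = qq{A,"two\nlines",col3\n};

# Approach 1: read physical lines and parse each one on its own.
# auto_diag is left off here so the expected failures stay quiet.
my $csv_parse = Text::CSV->new ({ binary => 1 });
open my $fh1, "<", \$data or die "Cannot open in-memory handle: $!";
my $parse_failures = 0;
while (my $line = <$fh1>) {
    chomp $line;
    $csv_parse->parse ($line) or $parse_failures++;
    }
close $fh1;

# Approach 2: let getline read as much input as the record needs.
my $csv_getline = Text::CSV->new ({ binary => 1, auto_diag => 1 });
open my $fh2, "<", \$data or die "Cannot open in-memory handle: $!";
my @records;
while (my $row = $csv_getline->getline ($fh2)) {
    push @records, $row;
    }
close $fh2;

printf "parse failed on %d physical line(s)\n", $parse_failures;
printf "getline returned %d record(s)\n",       scalar @records;
```

With getline the embedded newline should survive inside the second field of the single record that comes back.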

Use auto_diag

I seriously doubt that all the whitespace should be counted by the length function, hence allow_whitespace in the constructor below

use 5.12.2;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1, allow_whitespace => 1 });
my %results;

while (my $row = $csv->getline (*DATA)) {
    my $uniqueID = $row->[0];
    my $string   = $row->[1];

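    # Skip this row unless it is shorter than the shortest seen so far.
    # The parentheses are needed: // binds less tightly than <=.
    ($results{$uniqueID}{len} // 9999) <= length $string and next;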

    $results{$uniqueID} = {
        len => length $string,
        row => $row,
        };
    }

$csv->eol ("\n");
$csv->print (*STDOUT, $results{$_}{row}) for sort keys %results;

__DATA__
A, texttexttext, col3, col4,
B, textt,        col3, col4,
A, text,         col3, col4,
B, texttex,      col3, col4,

Enjoy, Have FUN! H.Merijn

Replies are listed 'Best First'.

Re^3: Find the row with shortest string for a given input in a csv file.
by AppleFritter (Vicar) on Jul 28, 2014 at 18:34 UTC

Do not use parse (it'll break your script on fields with newlines). Use getline instead!

Ah, good point. Funny, my first iteration of the script actually used ->getline(), but then I reckoned that $csv->getline(*DATA) couldn't be generalized so easily to the magic filehandle. I didn't want to sacrifice the convenience of not having to explicitly open files; the issue with newlines didn't occur to me, but you're right. The devil is in the details...

Looking at perlop now, it also turns out that <> is actually just a shorthand for <ARGV> (which is just as magic): you can write $csv->getline(*ARGV) and still have everything Just Work™, both piping data into the script and supplying a filename (or several) on the command line.
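
A minimal sketch of that idea follows, assuming getline accepts *ARGV the same way it accepts *DATA above. The temporary file, its contents, and the assignment to @ARGV are only there to make the example self-contained; a real script would leave @ARGV as the shell supplied it.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use Text::CSV;

# Write a small hypothetical CSV file so the sketch can run on its own.
my ($tmp_fh, $tmp_name) = tempfile (UNLINK => 1);
print {$tmp_fh} "A,text,col3\nB,textt,col3\n";
close $tmp_fh or die "Cannot close $tmp_name: $!";

# Simulate calling the script with one filename on the command line.
@ARGV = ($tmp_name);

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1 });
my @rows;

# *ARGV is the handle behind <>: it opens each file named in @ARGV in turn.
while (my $row = $csv->getline (*ARGV)) {
    push @rows, $row;
    }

print join ("|", @$_), "\n" for @rows;
```

Run without any filenames, the same loop would be expected to read from STDIN instead, which is what makes piping data into the script work.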

Thanks for enlightening me, brother!
