
Do not use parse (it'll break your script on fields with newlines). Use getline instead!
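
To illustrate the difference, here is a minimal sketch, assuming a quoted field that contains a newline. The sample data and variable names are made up for the demonstration, and auto_diag is left off on purpose so the expected parse failures stay quiet.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical sample: the second field of the first record contains a newline.
my $data = qq{A,"two\nlines",col3\nB,text,col3\n};

# Approach 1: read physical lines and hand each one to parse.
# The quoted field is cut in half, so parse fails on the pieces.
my $line_csv = Text::CSV->new ({ binary => 1 });
open my $fh, "<", \$data or die "Cannot open in-memory handle: $!";
my $failed = 0;
while (my $line = <$fh>) {
    $line_csv->parse ($line) or $failed++;
}
close $fh;

# Approach 2: let getline read as many physical lines as one record needs.
my $rec_csv = Text::CSV->new ({ binary => 1 });
open $fh, "<", \$data or die "Cannot open in-memory handle: $!";
my @rows;
while (my $row = $rec_csv->getline ($fh)) {
    push @rows, $row;
}
close $fh;

printf "parse failures: %d, records read by getline: %d\n", $failed, scalar @rows;
```

With getline the embedded newline stays inside the field where it belongs.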

Use auto_diag, so that errors are reported instead of silently ignored.

I seriously doubt that the whitespace around each field should be counted by the length function, so I let allow_whitespace strip it.

use 5.12.2;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1, allow_whitespace => 1 });

my %results;
while (my $row = $csv->getline (*DATA)) {
    my $uniqueID = $row->[0];
    my $string   = $row->[1];
    # Skip this row unless it is shorter than the shortest one seen so far.
    # The parentheses are required: // binds more loosely than <=.
    ($results{$uniqueID}{len} // 9999) <= length $string and next;
    $results{$uniqueID} = {
        len => length $string,
        row => $row,
        };
    }

$csv->eol ("\n");
$csv->print (*STDOUT, $results{$_}{row}) for sort keys %results;

__DATA__
A, texttexttext, col3, col4,
B, textt, col3, col4,
A, text, col3, col4,
B, texttex, col3, col4,

Enjoy, Have FUN! H.Merijn

Re^3: Find the row with shortest string for a given input in a csv file.
by AppleFritter (Vicar) on Jul 28, 2014 at 18:34 UTC

    Do not use parse (it'll break your script on fields with newlines). Use getline instead!

    Ah, good point. Funny, my first iteration of the script actually used ->getline(), but then I reckoned that $csv->getline(*DATA) couldn't be generalized so easily to the magic filehandle. I didn't want to sacrifice the convenience of not having to explicitly open files; the issue with newlines didn't occur to me, but you're right. The devil is in the details...

    Looking at perlop now, it also turns out that <> is actually just a shorthand for <ARGV> (which is just as magic): you can write $csv->getline(*ARGV) and still have everything Just Work™, both piping data into the script and supplying a filename (or several) on the command line.
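
A minimal sketch of the *ARGV variant described above. The read_rows helper, the temporary file and its contents are assumptions made up for the demonstration; in a real script you would drop the demo setup and let the command line (or a pipe) supply the input.

```perl
use strict;
use warnings;
use Text::CSV;
use File::Temp qw( tempfile );

# Read every CSV row from the magic ARGV handle: the files named in @ARGV,
# or STDIN when @ARGV is empty. (read_rows is a hypothetical helper name.)
sub read_rows {
    my $csv = Text::CSV->new ({ binary => 1, auto_diag => 1, allow_whitespace => 1 });
    my @rows;
    while (my $row = $csv->getline (*ARGV)) {
        push @rows, $row;
    }
    return @rows;
}

# Demo only: write a small file and pretend it was given on the command line.
my ($out, $file) = tempfile (UNLINK => 1);
print {$out} "A, texttexttext, col3\nB, textt, col3\n";
close $out or die "Cannot close $file: $!";
@ARGV = ($file);

my @rows = read_rows ();
print join ("|", @$_), "\n" for @rows;
```

Because the records go through getline, fields with embedded newlines are handled the same way as with any other filehandle.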

    Thanks for enlightening me, brother!