temporal has asked for the wisdom of the Perl Monks concerning the following question:

Sometimes I want to find the longest string in a particular column of a rather large, unsorted CSV file (several GB).

This is useful to know when specifying sane database column sizes, among other things. I've also found myself doing variations on this theme: quick-and-dirty comparisons or operations based on some attribute of a particular column (or columns) in a delimited file.

Typically I'll do something like this:

perl -F, -lane 'print $t = length $F[0] <= $t ? next LINE : length $F[0]' file.csv
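
Spelled out as a plain script (roughly; the LINE label is the one -n puts on the implicit loop, which is why next LINE works), that's:

#!/usr/bin/perl
# Rough expansion of the one-liner above: prints each new maximum,
# so the last number printed is the longest first field.
my $t = 0;
LINE: while (<>) {
    chomp;                             # what -l does on input
    my @F = split /,/, $_;             # what -F, -a does
    next LINE if length $F[0] <= $t;   # not a new maximum
    $t = length $F[0];
    print "$t\n";
}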

Just wondering if anyone has a cleaner one-liner to accomplish this. There's got to be a sexier way. Or maybe just a more efficient way. Any ideas?

Strange things are afoot at the Circle-K.

Re: Delimited File Analysis One-Liner?
by BrowserUk (Patriarch) on May 01, 2012 at 20:51 UTC

    My guess is that this would be fastest for your specific example:

    perl -nE"$l=index$_,',';$m<$l and $m=$l}{ say $m"
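
    Unpacked, that is roughly:

    use feature 'say';
    my $m = 0;
    while (<>) {
        my $l = index $_, ',';  # position of the first comma == length of field one
        $m = $l if $m < $l;     # keep the running maximum
    }
    say $m;                     # the }{ pushes this after the loop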

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Thanks BrowserUk, that's a clever way to do it!

      Didn't know that Perl allows you to close the brackets like that, either.

      Strange things are afoot at the Circle-K.
        Didn't know that Perl allows you to close the brackets like that, either.

        Do a super search for "secret operators" and "eskimo greeting".
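
        The short version: -n wraps your code in LINE: while (<>) { ... }, so the }{ closes the loop body early and leaves the rest in a bare block that runs once at EOF:

        use feature 'say';       # what -E turns on
        # perl -nE"...}{ say $m" becomes, after -n does its wrapping:
        LINE: while (<>) {
            $l = index $_, ',';
            $m < $l and $m = $l;
        }           # the } from }{ closes the loop...
        {           # ...and the { opens a bare block
            say $m; # runs once, after all input is read
        }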

      Thanks for the tip, BrowserUk. Always fun to learn a new trick.

I've generalized your code to work on any column, where i is the zero-based column index:

      perl -F, -anE '$m<($x=length $F[i]) and $m=$x}{say $m' file.csv
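
      So for, say, the third column (i = 2):

      perl -F, -anE '$m<($x=length $F[2]) and $m=$x}{say $m' file.csv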

I'd still use yours if I'm only looking at the first column. I wonder if there's a way to continue with that same idea (using index): count delimiters out to a particular column, then take the distance between the last two delimiters. Probably wouldn't be a one-liner at that point, though.
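
      Maybe something like this untested sketch, where $n picks the (zero-based) column; I doubt it would beat split once you're past the first column:

      # Sketch: longest field $n via repeated index() calls, no split.
      my $n = 2;
      my $m = 0;
      while (<>) {
          chomp;
          my $start = 0;
          for (1 .. $n) {                   # skip the first $n commas
              $start = 1 + index $_, ',', $start;
              last if $start == 0;          # index returned -1: too few fields
          }
          next if $n && $start == 0;        # line didn't have enough fields
          my $end = index $_, ',', $start;  # comma that closes the field...
          $end = length $_ if $end < 0;     # ...or end of line for the last field
          my $l = $end - $start;
          $m = $l if $l > $m;
      }
      print "$m\n";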

Also, I'm curious about your use of $1. Is it a shell variable? Executing your command as written (in bash) doesn't give me any output; I have to swap the single and double quotes. Then I have to use a different variable, since Perl won't let me assign to $1.

      Strange things are afoot at the Circle-K.
        I wonder if there's a way to continue on that same idea (using index) and count delimiters out to a particular column

        No. Beyond the first column, -aF, is about as efficient as it gets.

        Also, curious about your use of $1.

You need to get a better font! It isn't $1 (one), but rather $l (a lowercase L, for length) and $m (for max).

With any reasonable font they should be distinct, but I see it was a bad choice for posting here.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?

Re: Delimited File Analysis One-Liner?
by Anonymous Monk on May 01, 2012 at 20:33 UTC
    perl -lne 'print $t = do{ /,/; $l = $-[0] } <= $t ? next : $l'
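
    Here $-[0] is the offset where the match from /,/ starts, i.e. the position of the first comma, which equals the length of the first field. Roughly:

    my $t = 0;
    while (<>) {
        /,/;              # find the first comma
        my $l = $-[0];    # @- holds match start offsets
                          # (note: if a line has no comma, $-[0] keeps its old value)
        next if $l <= $t; # not a new maximum
        $t = $l;
        print "$t\n";
    }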

    Efficient? - Yes.
    Sexier? - Not really.
      A faster way:

      perl -lne 'print $t = $l if ($l=index $_,q{,}) > $t' file.csv
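
      All of these print the running maximum, so the last line of output is the answer. If you only want the final number, one variation is to defer the print to an END block:

      perl -lne '$l = index $_, q{,}; $t = $l if $l > $t; END { print $t }' file.csv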