The following code automatically detects numeric fields,
or even fields of mixed text and numbers, and sorts them
properly (one of my favorite tricks). Note that it doesn't
handle quoted fields that contain commas, nor input where
some lines have the field quoted and some don't and the
quotes are not supposed to affect the sort order. Replace
the simple split/\s*,\s*/ with a use of the
CSV module if you have that kind of data.
#!/usr/bin/perl -w
use strict;
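die "Usage: $0 col[,col[...]] [file [...]]\n" unless @ARGV;
# Columns are 1-based on the command line; convert to 0-based.
my @cols= map { $_-1 } split/,/,shift;
my @lines= <>;
# One sort key per line: join the chosen fields, then pack each run of
# digits into 4 big-endian bytes so that cmp orders it numerically.
my @sort= map { my $x=join"\0"x5,(split/\s*,\s*/)[@cols];
    $x =~ s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg; $x } @lines;
# Sort the line indices by key, then print the lines in that order.
print @lines[ sort { $sort[$a] cmp $sort[$b] } 0..$#sort ];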
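To see why the packed keys sort correctly, here is a small sketch that lifts the key-building step out of the script into a helper. The name sort_key and the sample lines are made up for illustration; the regex and the pack("N") call are the ones from the script. Each run of digits that isn't preceded by a digit or a dot becomes four big-endian bytes, so cmp compares those runs as numbers. Digits after a decimal point are left as text, where plain string order generally works already. Values above 4294967295 don't fit in "N", so treat this as a sketch for ordinary-sized numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: build a string sort key from the selected
# (0-based) columns of one line, using the same steps as the script.
sub sort_key {
    my ($line, @cols) = @_;
    my $key = join "\0" x 5, (split /\s*,\s*/, $line)[@cols];
    # Replace each leading-integer digit run with its 4-byte big-endian form.
    $key =~ s/(^|[^\d.])(\d+)/$1 . pack("N", $2)/eg;
    return $key;
}

# Made-up sample data: a plain string sort would put item10 before item9.
my @lines  = ("item10,b\n", "item9,a\n", "item100,c\n");
my @keys   = map { sort_key($_, 0) } @lines;
my @sorted = @lines[ sort { $keys[$a] cmp $keys[$b] } 0 .. $#keys ];

print @sorted;   # prints item9,a then item10,b then item100,c
```

The script above applies exactly this kind of key to every line it reads.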
Note that this code explicitly avoids using nifty nested
map tricks because they tend to slow things
down. For example, the code above was over twice as fast
as the following sexier code in my large-file tests:
die "Usage: $0 col[,col[...]] [file [...]]\n" unless @ARGV;
my @cols= map { $_-1 } split/,/,shift;
print map { $_->[1] }
sort { $a->[0] cmp $b->[0] }
map { my $x=join"\0\0\0\0",(split/\s*,\s*/)[@cols];
$x =~ s/(^|[^\d.])(\d+)/$1.pack("N",$2)/eg; [$x,$_] } <>;
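If you want to check the speed difference on your own machine, something like the following sketch with the core Benchmark module should do. The synthetic data, the sub names by_index and by_pairs, and the 5000-line size are all my assumptions, not the large files from the original tests, and the ratio you see will depend on your Perl version and your data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input (an assumption): 5000 distinct "nameN,N" lines, scrambled.
my @lines = map { my $n = ($_ * 7919) % 5_000; "name$n,$n\n" } 1 .. 5_000;
my @cols  = (0);

# Parallel array of keys, then sort the line indices by key.
sub by_index {
    my @sort = map { my $x = join "\0" x 5, (split /\s*,\s*/)[@cols];
                     $x =~ s/(^|[^\d.])(\d+)/$1 . pack("N", $2)/eg; $x } @lines;
    return @lines[ sort { $sort[$a] cmp $sort[$b] } 0 .. $#sort ];
}

# Nested maps that build one [key, line] pair per line.
sub by_pairs {
    return map  { $_->[1] }
           sort { $a->[0] cmp $b->[0] }
           map  { my $x = join "\0" x 5, (split /\s*,\s*/)[@cols];
                  $x =~ s/(^|[^\d.])(\d+)/$1 . pack("N", $2)/eg; [ $x, $_ ] } @lines;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    by_index => sub { my @r = by_index() },
    by_pairs => sub { my @r = by_pairs() },
});
```

Both subs should return the same ordering; only the bookkeeping around the sort differs.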
P.S. The reason that this nested-map version is slow is
not that I don't have tilly's illustrious
patch (just to counter tilly's down-playing of how
neat his patch is): those are all 1-to-1 maps. (:
P.P.S. I think that this is a Schwartzian Transform, but I
wasn't sure I'd done it right and didn't want to mislabel
it. :)
Update: While I was typing, an example of a
Schwartzian Transform was posted just above and, other than
mixing 1 and 0, I did write one.
- tye (my smileys are ambidextrous!)