Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I love this site and have learned more about PERL here, than any other website. I was hoping someone could help me sort some arrays by a field. I have tried to apply almost every sorting example I could find here on PerlMonks, but I cannot get it to work correctly. Sometimes it returns a hash, sometimes... well, I'm having a hard time. I have about 40,000 lines to sort, but I will only give you a sample of the data, which I suck into an array from a text file.

I'm not worried about sorting field 0; all I want to do is sort by field 4 and print out the results, from highest number to lowest.

open(INF,"look.txt") || print "Cannot open look.txt file\n";
@look = <INF>;
close(INF);
@out = sort { (split '|', $a, 12)[4] <=> (split '|', $b, 12)[4] } @look;
foreach $line (@out) {
    @fields = @$line;
    print "@fields[0],@fields[1],@fields[2],@fields[3],@fields[4],";
}

My data looks like this:

accred|143|0|0|412|0|0|0|0|0|0|0|0|
accu-b|36|0|0|103|0|38|0|0|0|0|2|0|
accua|35|0|0|27|0|37|0|0|0|0|1|0|
accur|173|0|0|486|0|0|0|0|0|0|0|0|
accura|43|0|0|1022|0|4|0|0|0|0|0|0|
accurat|215|0|0|730|0|233|0|0|0|3|0|0|
accweb|1|0|0|0|0|38|10|2|0|0|0|0|
acdev|0|0|0|0|0|3|0|0|0|0|0|0|
aceas|11|0|0|28|0|36|0|0|0|0|0|0|
acecok|127|0|0|58|0|37|0|0|0|0|0|1|
acegi|76|0|0|63|0|102|20|1|0|0|0|0|
aceliv|4|0|0|275|0|59|0|0|0|3|0|0|
ace130|31|0|0|93|0|19|0|0|0|0|0|0|
Thanks.

Re: Sorting an array of arrays by field
by Hue-Bond (Priest) on Oct 15, 2006 at 02:25 UTC
    I love this site and have learned more about PERL here, than any other website

    Not much, I guess ;^). Please, it's "Perl" (mind the capitalization).

    @look = <INF>;

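    For code you post here, it's better to read the data from <DATA>, placing the sample lines after a __DATA__ marker at the end of the script. That way, the data travels with the program and you don't have to provide it separately.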

    @out = sort { (split '|', $a, 12)[4] <=> (split '|', $b, 12)[4] } @look;

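    You need to escape the pipe: '\|'. The first argument of split is always treated as a regular expression, even when written as a quoted string, and | is the alternation metacharacter. An unescaped '|' matches the empty string, so split breaks each line into single characters, and element 4 is just the fifth character of the line.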
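    To see the difference, here is a small sketch using the first sample line (the variable names are only illustrative). It splits the same string with the unescaped pipe, the escaped pipe, and a separator held in a variable and escaped with \Q...\E (quotemeta).

```perl
use strict;
use warnings;

my $line = "accred|143|0|0|412|0|0|0|0|0|0|0|0|";

# '|' is a regex alternation of two empty patterns: it matches the empty
# string, so split breaks the line into single characters.
my @chars = split '|', $line;

# Escaping the metacharacter splits on a literal pipe.
my @fields = split '\|', $line;

# \Q...\E escapes any metacharacters held in a variable.
my $sep  = '|';
my @same = split /\Q$sep\E/, $line;

print "unescaped: element 4 is '$chars[4]'\n";   # a single character
print "escaped:   element 4 is '$fields[4]'\n";
print "quotemeta: element 4 is '$same[4]'\n";
```

    The unescaped version yields 'e' (the fifth letter of "accred"), which is why the original sort compared nonsense keys.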

    @fields = @$line;

    $line is not an array reference. The split inside the sort block only extracted field 4 of each string so it could be compared; it doesn't change the elements themselves. @out is a plain array of strings, just like @look.
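    If you really do want an array of arrays, as the thread title suggests, you can split each line once up front and then sort the array references. A minimal sketch, with a few sample lines hard-coded in place of look.txt; @rows and @sorted are illustrative names, not anything from the original code.

```perl
use strict;
use warnings;

# A few sample lines standing in for the contents of look.txt.
my @look = (
    "accred|143|0|0|412|0|0|0|0|0|0|0|0|\n",
    "accua|35|0|0|27|0|37|0|0|0|0|1|0|\n",
    "accura|43|0|0|1022|0|4|0|0|0|0|0|0|\n",
);

# Split every line once; each element of @rows is an array reference.
my @rows = map { chomp(my $copy = $_); [ split /\|/, $copy ] } @look;

# Now $row->[4] is field 4, and @$row really is an array.
my @sorted = sort { $b->[4] <=> $a->[4] } @rows;    # highest first

for my $row (@sorted) {
    my @fields = @$row;
    print join(",", @fields[0 .. 4]), "\n";
}
```

    Splitting once per line also avoids re-splitting both strings on every comparison, which matters with 40,000 lines.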

    Applying these comments, and adding strict and warnings, your code becomes:

    use strict;
    use warnings;

    my @look = <DATA>;
    my @out = sort { (split '\|', $a, 12)[4] <=> (split '\|', $b, 12)[4] } @look;
    foreach my $line (@out) {
        print $line;
    }
    __DATA__
    accred|143|0|0|412|0|0|0|0|0|0|0|0|
    accu-b|36|0|0|103|0|38|0|0|0|0|2|0|
    accua|35|0|0|27|0|37|0|0|0|0|1|0|
    [...]

    If you want to access the fields individually, you have to perform another split inside the foreach loop.
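    The question asked for highest-to-lowest order, while the sort above is ascending. Swapping $a and $b reverses it, and a second split inside the loop gives access to the individual fields. A sketch, with a few sample lines inlined instead of reading <DATA>; @report exists only to collect the output lines.

```perl
use strict;
use warnings;

my @look = (
    "accu-b|36|0|0|103|0|38|0|0|0|0|2|0|\n",
    "accurat|215|0|0|730|0|233|0|0|0|3|0|0|\n",
    "acdev|0|0|0|0|0|3|0|0|0|0|0|0|\n",
    "aceliv|4|0|0|275|0|59|0|0|0|3|0|0|\n",
);

# $b before $a: descending numeric order on field 4.
my @out = sort { (split /\|/, $b)[4] <=> (split /\|/, $a)[4] } @look;

my @report;
for my $line (@out) {
    my @fields = split /\|/, $line;    # second split, once per line
    push @report, join(",", @fields[0 .. 4]);
}
print "$_\n" for @report;
```

    This prints fields 0 to 4 of each line, starting with accurat (730) and ending with acdev (0).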

    Going further, with a Guttman-Rosler transform (GRT), it could be written as:

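    ## tested
    my @out = map { substr $_, 6 }     ## hardcoded "6" here...
              sort
              map { my $val = (split /\|/)[4]; sprintf("%06d", $val) . $_ }   ## ... and here
              <DATA>;

    (The key is formatted first and the line appended afterwards, so a '%' in the data can never be read as a sprintf directive.)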
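    For comparison, here is the same sort written as a Schwartzian Transform (ST): each line is paired with its key once, the pairs are sorted on the numeric key, and the original lines are extracted again. Unlike the GRT above, it needs no fixed-width padding, so it isn't limited to keys of at most six digits. A sketch with inlined sample lines, sorting highest first as the question asked.

```perl
use strict;
use warnings;

my @look = (
    "accweb|1|0|0|0|0|38|10|2|0|0|0|0|\n",
    "acecok|127|0|0|58|0|37|0|0|0|0|0|1|\n",
    "acegi|76|0|0|63|0|102|20|1|0|0|0|0|\n",
    "ace130|31|0|0|93|0|19|0|0|0|0|0|0|\n",
);

my @out = map  { $_->[1] }                    # 3. keep only the original line
          sort { $b->[0] <=> $a->[0] }        # 2. numeric sort, highest first
          map  { [ (split /\|/)[4], $_ ] }    # 1. pair each line with field 4
          @look;

print @out;
```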

    Update: Added a benchmark. Later update: included ambrus' solution.

    --
    David Serrano

      Your benchmark is unfair, because the key extraction for ambrus' solution is not being measured.

      Anyway, using Sort::Key is, as usual, the fastest solution!

      use Sort::Key 'ikeysort';
      my @d = <DATA>;
      my @e;
      use Benchmark qw/cmpthese/;
      cmpthese (4e4, {
          grt => sub {
              @e = map { substr $_, 6 }
                   sort
                   map { my $val = (split /\|/)[4]; sprintf "%06d$_",$val } @d;
          },
          raw => sub {
              @e = sort { (split '\|', $a, 12)[4] <=> (split '\|', $b, 12)[4] } @d;
          },
          ambrus => sub {
              my @key = map { (split /\|/)[4] } @d;
              @e = @d[ sort { $key[$a] <=> $key[$b] } 0 .. @d - 1 ];
          },
          sk => sub {
              @e = ikeysort { (split '\|', $_, 12)[4] } @d
          }
      });
      On my computer, it says:

                Rate    raw    grt ambrus     sk
      raw     1887/s     --   -63%   -69%   -74%
      grt     5141/s   172%     --   -15%   -29%
      ambrus  6061/s   221%    18%     --   -16%
      sk      7246/s   284%    41%    20%     --
      Thanks for the help, David.
Re: Sorting an array of arrays by field
by merlyn (Sage) on Oct 15, 2006 at 08:43 UTC
Re: Sorting an array of arrays by field
by ambrus (Abbot) on Oct 15, 2006 at 17:20 UTC

    Hue-Bond has already pointed out the problems with your code and given a working version.

    He has also given a faster solution with both ST and GRT, so there's little left for me to do. However, I can't resist giving a solution using a third, possibly faster sort variant: extract the keys once, then sort a list of indices by those keys.
    use warnings;
    use strict;
    open my $INF, "<", "look.txt" or die "cannot open look.txt: $!";
    my @look = <$INF>;
    close($INF);
    my @key = map { (split /\|/)[4] } @look;                          # extract each key once
    my @out = @look[sort { $key[$a] <=> $key[$b] } 0 .. @look - 1];   # sort indices by key
    for my $line (@out) {
        my @fields = split /\|/, $line;
        print join(",", @fields[0 .. 4]), "\n";
    }
    __END__

    I'll also show you a Ruby solution as a teaser.

    #!ruby -w
    look = File.open("look.txt").readlines;  # here I'm inclined to write 'map' instead of 'readlines'
    out = look.sort_by {|line| line.split("|")[4].to_i };  # to_i: compare numerically, not as strings
    out.each {|line|
        fields = line.split "|";
        puts fields[0 .. 4].join(",");
    }
    __END__

    That's it. You don't have to listen to the community's whining about strictures, or declare variables, nor do you have to learn complicated idioms for efficiently sorting by a field, or quote regexp metacharacters if the language can do that for you. :)