Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am new to Perl. I was wondering if anyone has sample code to sort a positional (fixed-width) data file on multiple columns; the data is not delimited by any characters. I need to sort on Col1, Col2 and Col5 and write the result to an output file. I would appreciate it if someone could help me out with this.

TIA, Prabal

===============

This is the input file:

Col1      Col2         Col3  Col4    Col5         Col6
========================================================
D1001     SNAM1        1     101     XYZ1234      21.11
D1001     SNAM1        1     102     XYZ2234      22.12
D1002     SNAM2        1     201     PQR2234      12.12
D1002     SNAM2        2     202     PQR2234      32.12
D1002     SNAM2        3     203     PQR2234      52.12
D1002     SNAM2        4     204     PQR2234      37.12
D1001     SNAM1        2     103     XYZ1234      22.12
D1003     SNAM3        1     301     ABC1234      22.12
D1002     SNAM2        5     205     PQR2234      37.12
D1001     SNAM1        4     104     XYZ1234      22.12

This is the output file I am expecting:

D1001     SNAM1        1     101     XYZ1234      21.11
D1001     SNAM1        2     103     XYZ1234      22.12
D1001     SNAM1        4     104     XYZ1234      22.12
D1001     SNAM1        1     102     XYZ2234      22.12
D1002     SNAM2        1     201     PQR2234      12.12
D1002     SNAM2        2     202     PQR2234      32.12
D1002     SNAM2        3     203     PQR2234      52.12
D1002     SNAM2        4     204     PQR2234      37.12
D1002     SNAM2        5     205     PQR2234      37.12
D1003     SNAM3        1     301     ABC1234      22.12

Code tags added by GrandFather

Re: Sample Sort Code
by runrig (Abbot) on Sep 12, 2006 at 21:20 UTC
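    Consider just using the (Unix command-line) sort utility, also freely available for Windows:
    sort -k1,2 -k5,5 file_in.txt > file_out.txt
    Here -k1,2 uses fields 1 through 2 as the first key and -k5,5 restricts the second key to field 5. A bare -k5 would extend that key to the end of the line, so Col6 would also take part in the ordering.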
Re: Sample Sort Code
by sgifford (Prior) on Sep 12, 2006 at 21:31 UTC
    You'll probably want to split each line into an array of fields, using substr or unpack, then use sort to sort it. Something like:
    #!/usr/bin/perl
    use warnings;
    use strict;

    my @rows;
    while (<>) {
        chomp;
        push @rows, [ unpack("a10a13a6a8a13a5", $_) ];
    }

    @rows = sort {
           $a->[0] cmp $b->[0]
        || $a->[1] cmp $b->[1]
        || $a->[4] cmp $b->[4]
    } @rows;

    # write the sorted rows back out, fields rejoined
    print join('', @$_), "\n" for @rows;

    Update: Fixed field sizes, which are clearer now that the post is cleaned up.
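Since only Col1, Col2 and Col5 matter for the ordering, unpack can also pull out just those three keys and skip the others with "x". The sketch below assumes the widths from the unpack template above (10, 13, 6, 8, 13, then Col6); sort_fixed_width is a made-up name, and the rows are taken from the question and padded with sprintf.

```perl
use strict;
use warnings;

# Assumed layout (from the template above): Col1=10, Col2=13, Col3=6,
# Col4=8, Col5=13 characters, Col6 runs to the end of the line.
# "A" strips trailing spaces; "x14" skips Col3 + Col4 (6 + 8 bytes).
my $KEY_TEMPLATE = 'A10 A13 x14 A13';

# Sort fixed-width lines on Col1, Col2, Col5 (Schwartzian Transform).
sub sort_fixed_width {
    return map  { $_->[0] }                     # original line back out
           sort {
                  $a->[1] cmp $b->[1]           # Col1
               || $a->[2] cmp $b->[2]           # Col2
               || $a->[3] cmp $b->[3]           # Col5
           }
           map  { [ $_, unpack($KEY_TEMPLATE, $_) ] }
           @_;
}

# Sample rows from the question, padded to the assumed widths.
my @rows = map { sprintf '%-10s%-13s%-6s%-8s%-13s%s', split } (
    'D1002 SNAM2 1 201 PQR2234 12.12',
    'D1001 SNAM1 1 102 XYZ2234 22.12',
    'D1001 SNAM1 1 101 XYZ1234 21.11',
);

print "$_\n" for sort_fixed_width(@rows);
```

Skipping the unused columns keeps the per-row arrays small; the original line is carried along untouched, so the output keeps its exact spacing.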

Re: Sample Sort Code
by mreece (Friar) on Sep 12, 2006 at 21:45 UTC
    You can do this with a classic Schwartzian Transform:
    my @lines = split /\n/, <<EOD;
    D1001     SNAM1        1     101      XYZ1234      21.11
    D1001     SNAM1        1     102      XYZ2234      22.12
    D1002     SNAM2        1     201      PQR2234      12.12
    D1002     SNAM2        2     202      PQR2234      32.12
    D1002     SNAM2        3     203      PQR2234      52.12
    D1002     SNAM2        4     204      PQR2234      37.12
    D1001     SNAM1        2     103      XYZ1234      22.12
    D1003     SNAM3        1     301      ABC1234      22.12
    D1002     SNAM2        5     205      PQR2234      37.12
    D1001     SNAM1        4     104      XYZ1234      22.12
    EOD

    @lines = map  { $_->[0] }                          # extract lines back out
             sort {                                    # sort on split columns
                    $a->[1][0] cmp $b->[1][0]          # index 0 is 'col1'
                 || $a->[1][1] cmp $b->[1][1]          # col2
                 || $a->[1][4] cmp $b->[1][4]          # col5
             }
             map  { [ $_ => [ split /\s+/, $_ ] ] }    # [ 'line1' => ['col1', ...] ]
             @lines;

    print "$_\n" foreach @lines;
    produces:
    D1001     SNAM1        1     101      XYZ1234      21.11
    D1001     SNAM1        2     103      XYZ1234      22.12
    D1001     SNAM1        4     104      XYZ1234      22.12
    D1001     SNAM1        1     102      XYZ2234      22.12
    D1002     SNAM2        1     201      PQR2234      12.12
    D1002     SNAM2        2     202      PQR2234      32.12
    D1002     SNAM2        3     203      PQR2234      52.12
    D1002     SNAM2        4     204      PQR2234      37.12
    D1002     SNAM2        5     205      PQR2234      37.12
    D1003     SNAM3        1     301      ABC1234      22.12
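The expected output also depends on how ties are handled: the three D1001/SNAM1/XYZ1234 rows share all three sort keys, yet they appear in their original input order (101, 103, 104). Perl's default mergesort keeps equal elements in input order, but perldoc sort describes that as a side effect and recommends asking for it explicitly with the pragma use sort 'stable' if you rely on it. A minimal sketch (sort_rows is a made-up name):

```perl
use strict;
use warnings;
use sort 'stable';   # guarantee that equal elements keep their input order

# Rows as [Col1 .. Col6]; sort on Col1, Col2, Col5 only.
sub sort_rows {
    return sort {
           $a->[0] cmp $b->[0]
        || $a->[1] cmp $b->[1]
        || $a->[4] cmp $b->[4]
    } @_;
}

# The four D1001 rows from the question, in input order.
my @rows = map { [split] } (
    'D1001 SNAM1 1 101 XYZ1234 21.11',
    'D1001 SNAM1 1 102 XYZ2234 22.12',
    'D1001 SNAM1 2 103 XYZ1234 22.12',
    'D1001 SNAM1 4 104 XYZ1234 22.12',
);

# 101, 103 and 104 tie on all three keys, so they stay in input order.
print "@$_\n" for sort_rows(@rows);
```

If input order is not what you want for ties, add another key (for example Col3 compared with <=>) instead of relying on stability.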
    
      I think that your first map and the subsequent sort in the ST look a little confusing. There is no need to nest an array inside each element: a single flat anonymous array holding the line followed by its fields will do, so the sort block needs no double subscripts. Also, you can call split without arguments here, because splitting $_ on whitespace is the default.

      use strict;
      use warnings;

      # Skip the two header lines.
      <DATA> for 1 .. 2;

      # Use the line below instead of above if
      # you want to retain the column headers.
      #
      # print scalar <DATA> for 1 .. 2;

      print map  { $_->[0] }
            sort {
                   $a->[1] cmp $b->[1]
                || $a->[2] cmp $b->[2]
                || $a->[5] cmp $b->[5]
            }
            map  { [ $_, split ] }
            <DATA>;

      __END__
      Col1      Col2         Col3  Col4    Col5         Col6
      ========================================================
      D1001     SNAM1        1     101     XYZ1234      21.11
      D1001     SNAM1        1     102     XYZ2234      22.12
      D1002     SNAM2        1     201     PQR2234      12.12
      D1002     SNAM2        2     202     PQR2234      32.12
      D1002     SNAM2        3     203     PQR2234      52.12
      D1002     SNAM2        4     204     PQR2234      37.12
      D1001     SNAM1        2     103     XYZ1234      22.12
      D1003     SNAM3        1     301     ABC1234      22.12
      D1002     SNAM2        5     205     PQR2234      37.12
      D1001     SNAM1        4     104     XYZ1234      22.12

      Cheers,

      JohnGG
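Both transforms above still run a Perl sort block for every comparison. A different technique, the Guttman-Rosler Transform (GRT), avoids the sort block entirely: because the fields are fixed width, Col1, Col2 and Col5 can be glued together into a key prefix that sorts correctly with Perl's default string comparison. This is a sketch under the assumption that the file uses the widths in sgifford's unpack template (so Col5 starts at offset 37) and that every line is at least 50 characters long; grt_sort is a made-up name.

```perl
use strict;
use warnings;

# GRT sketch. Assumed widths: Col1=10, Col2=13, Col3=6, Col4=8, Col5=13,
# so Col1+Col2 occupy offsets 0..22 and Col5 occupies offsets 37..49.
my $KEY_LEN = 23 + 13;

sub grt_sort {
    return map  { substr($_, $KEY_LEN) }                          # 3. strip the key
           sort                                                   # 2. plain string sort
           map  { substr($_, 0, 23) . substr($_, 37, 13) . $_ }   # 1. prepend the key
           @_;
}

# Rows from the question, padded to the assumed widths.
my @rows = map { sprintf '%-10s%-13s%-6s%-8s%-13s%s', split } (
    'D1001 SNAM1 1 101 XYZ1234 21.11',
    'D1001 SNAM1 1 102 XYZ2234 22.12',
    'D1002 SNAM2 1 201 PQR2234 12.12',
    'D1002 SNAM2 2 202 PQR2234 32.12',
    'D1002 SNAM2 3 203 PQR2234 52.12',
    'D1002 SNAM2 4 204 PQR2234 37.12',
    'D1001 SNAM1 2 103 XYZ1234 22.12',
    'D1003 SNAM3 1 301 ABC1234 22.12',
    'D1002 SNAM2 5 205 PQR2234 37.12',
    'D1001 SNAM1 4 104 XYZ1234 22.12',
);

print "$_\n" for grt_sort(@rows);
```

One difference from the block-based versions: rows with identical keys are ordered by the rest of the line rather than by input position. For the sample data that still matches the expected output, because Col3 happens to increase within each tied group.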

        Excellent points.
      Hello everyone, thanks for the replies. I think this option looks best for me. Can you please tell me how to read from an input file instead of the EOD heredoc? Thanks.
        open my $input, '<', 'myfile.txt' or die "cannot open myfile.txt: $!";
        # chomp so the final print "$_\n" doesn't double the newlines
        chomp(my @lines = <$input>);
        Or, taking the file name from the command-line arguments:
        my $filename = $ARGV[0] or die "Usage: $0 <filename>\n";
        open my $input, '<', $filename or die "cannot open $filename: $!";
        chomp(my @lines = <$input>);
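Putting those pieces together, here is a minimal sketch of a complete script that reads the input file, keeps the two header lines at the top and writes the sorted rows to an output file, as the question asks. The invocation script.pl input.txt output.txt and the sub name sort_report are hypothetical; the sub takes filehandles so it works equally on real files or in-memory strings, and it assumes exactly two header lines.

```perl
use strict;
use warnings;
use sort 'stable';   # keep rows with equal keys in input order

# Copy the two header lines through, then sort the data rows on
# Col1, Col2 and Col5 and print them to $out.
sub sort_report {
    my ($in, $out) = @_;
    for (1 .. 2) {                          # assumed: two header lines
        my $header = <$in>;
        print {$out} $header if defined $header;
    }
    print {$out} map  { $_->[0] }           # original line (newline kept)
                 sort {
                        $a->[1] cmp $b->[1]     # Col1
                     || $a->[2] cmp $b->[2]     # Col2
                     || $a->[5] cmp $b->[5]     # Col5
                 }
                 map  { [ $_, split ] }     # [ line, col1 .. col6 ]
                 <$in>;
}

# Hypothetical command-line wrapper: script.pl input.txt output.txt
if (@ARGV == 2) {
    my ($in_name, $out_name) = @ARGV;
    open my $in,  '<', $in_name  or die "cannot open $in_name: $!";
    open my $out, '>', $out_name or die "cannot open $out_name: $!";
    sort_report($in, $out);
    close $out or die "cannot close $out_name: $!";
}
```

Checking the return value of close on the output handle catches write errors (such as a full disk) that print alone may not report.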
Re: Sample Sort Code
by salva (Canon) on Sep 13, 2006 at 07:30 UTC
    Using the Sort::Key::Multi module:
    use Sort::Key::Multi qw(s3_keysort);   # s3 => three string keys

    my @data   = <DATA>;
    my @sorted = s3_keysort { (split)[0,1,4] } @data;
    print for @sorted;

    __DATA__
    D1001     SNAM1        1     101     XYZ1234      21.11
    D1001     SNAM1        1     102     XYZ2234      22.12
    D1002     SNAM2        1     201     PQR2234      12.12
    D1002     SNAM2        2     202     PQR2234      32.12
    D1002     SNAM2        3     203     PQR2234      52.12
    D1002     SNAM2        4     204     PQR2234      37.12
    D1001     SNAM1        2     103     XYZ1234      22.12
    D1003     SNAM3        1     301     ABC1234      22.12
    D1002     SNAM2        5     205     PQR2234      37.12
    D1001     SNAM1        4     104     XYZ1234      22.12
Re: Sample Sort Code
by Outaspace (Scribe) on Sep 13, 2006 at 12:33 UTC
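    You could also just use substr on each line (with a split before and a join after). Note that substr offsets are zero-based; the offsets below assume the column layout shown in the question:
    @lines = sort {
           substr($a,  0, 5) cmp substr($b,  0, 5)    # Col1
        || substr($a, 10, 5) cmp substr($b, 10, 5)    # Col2
        || substr($a, 37, 7) cmp substr($b, 37, 7)    # Col5
        # || ...
    } @lines;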

    Andre
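Hard-coded offsets like these are easy to get wrong by one. As an alternative sketch, assuming the first header label starts at offset 0 and each label starts at the same offset as its column's data, the widths can be measured from the header line and turned into an unpack template. The sub names template_from_header and sort_with_header are made up for illustration.

```perl
use strict;
use warnings;
use sort 'stable';   # keep rows with equal keys in input order

# Build an unpack template such as "A10 A13 A6 A8 A13 A*" from a header
# line; the last column is taken to the end of the line.
sub template_from_header {
    my ($header) = @_;
    my @starts;
    push @starts, $-[0] while $header =~ /\S+/g;   # $-[0]: where the match began
    my @widths = map { $starts[$_ + 1] - $starts[$_] } 0 .. $#starts - 1;
    return join ' ', (map { "A$_" } @widths), 'A*';
}

# Sort data lines on Col1, Col2 and Col5 using the measured template.
sub sort_with_header {
    my ($header, @lines) = @_;
    my $template = template_from_header($header);
    return map  { $_->[0] }
           sort {
                  $a->[1] cmp $b->[1]     # Col1
               || $a->[2] cmp $b->[2]     # Col2
               || $a->[5] cmp $b->[5]     # Col5
           }
           map  { [ $_, unpack($template, $_) ] }
           @lines;
}

# Header and rows padded to assumed widths (10, 13, 6, 8, 13).
my $format = '%-10s%-13s%-6s%-8s%-13s%s';
my $header = sprintf $format, map { "Col$_" } 1 .. 6;
my @rows   = map { sprintf $format, split } (
    'D1002 SNAM2 1 201 PQR2234 12.12',
    'D1001 SNAM1 1 101 XYZ1234 21.11',
    'D1001 SNAM1 1 102 XYZ2234 22.12',
    'D1001 SNAM1 4 104 XYZ1234 22.12',
);

print "$header\n";
print "$_\n" for sort_with_header($header, @rows);
```

This only works if the header really is aligned with the data; if a label is centred over its column, the measured widths will be off and the keys will be cut in the wrong places.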