Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Please Help!
I have 4 different sets of data which are all associated, and I need to sort them according to one of the fields and then by another. Is it possible to do this within one of the data structures in Perl? I've looked at hashes of hashes but can't sort by the two different fields.
E.g.
Subject  Query  e_value  start     end
ID12     KBrH   2e-26    22400316  22400515
ID34     BrH0   2e-05    3876860   3876936
ID12     BrH0   2e-05    3876700   3873500

So I need to sort or group by the 'subject' field and then sort in ascending order by 'start', keeping each record's fields together.
Many thanks,
Chris

Replies are listed 'Best First'.
Re: Sorting complex records
by Anonymous Monk on Sep 17, 2003 at 03:15 UTC
    @data = map { [split] } <>;
    @data = sort { $a->[0] cmp $b->[0] || $a->[3] <=> $b->[3] } @data;
    print "@$_\n" for @data;
    __END__
    ID12 KBrH 2e-26 22400316 22400515
    ID34 BrH0 2e-05 3876860 3876936
    ID12 BrH0 2e-05 3876700 3873500
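If the goal is to group by subject rather than to produce one flat sorted list, a hash of arrays is one possible structure. The following is only a sketch: the sample lines are copied from the question, and the variable names (`@lines`, `%by_subject`) are chosen here for illustration.

```perl
use strict;
use warnings;

# Sample rows copied from the question.
my @lines = (
    'ID12 KBrH 2e-26 22400316 22400515',
    'ID34 BrH0 2e-05 3876860 3876936',
    'ID12 BrH0 2e-05 3876700 3873500',
);

# Group records by subject: each hash value is an array of records,
# and each record is an array-ref of its fields.
my %by_subject;
for my $line (@lines) {
    my @fields = split ' ', $line;
    push @{ $by_subject{ $fields[0] } }, \@fields;
}

# Within each group, order the records by ascending start (field index 3).
for my $subject (sort keys %by_subject) {
    my @group = sort { $a->[3] <=> $b->[3] } @{ $by_subject{$subject} };
    $by_subject{$subject} = \@group;
    print "$subject\n";
    print "  @$_\n" for @group;
}
```

Each record stays intact because the whole field list is stored behind one reference; only the references are reordered.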
Re: Sorting complex records
by Roger (Parson) on Sep 17, 2003 at 04:16 UTC
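    OK, if your data are stored in a file in, say, tab-delimited format, you can sort the records without using Perl at all:
    sort -t $'\t' -k 2,2 file.txt > file.out
    This sorts the file on the 2nd field and writes the sorted records to file.out. In bash, $'\t' passes a literal tab as the delimiter; a quoted "\t" is a two-character string, which GNU sort rejects. -k 2,2 limits the key to field 2, whereas -k 2 alone runs from field 2 to the end of the line.
    sort -t $'\t' -k 3,3n file.txt > file.out
    This sorts the file on the numerical value of field 3. Note that -n does not understand exponent notation such as 2e-26; GNU sort offers -g for that.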

    (Untested)
Re: Sorting complex records
by simonm (Vicar) on Sep 17, 2003 at 17:22 UTC
    Generally, you can do this with multi-level sorts. A naive implementation might look like this:
    # load data into a list of hash-refs
    my @data = (
        { Subject => 'ID12', Query => 'KBrH', start => 22400316, ... },
        ...
    );
    my @sorted = sort {
        $a->{Subject} cmp $b->{Subject}
            or
        $a->{start} <=> $b->{start}
    } @data;
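For anyone who wants to run the multi-level sort as it stands, here is a self-contained sketch using the three sample rows from the question. The column names are assumed from the question's header line, and the way the hash-refs are built (a hash slice over `@columns`) is just one option.

```perl
use strict;
use warnings;

# Sample rows copied from the question; column names are assumed
# from its header line.
my @columns = qw(Subject Query e_value start end);
my @rows = (
    'ID12 KBrH 2e-26 22400316 22400515',
    'ID34 BrH0 2e-05 3876860 3876936',
    'ID12 BrH0 2e-05 3876700 3873500',
);

# Turn each whitespace-separated row into a hash-ref keyed by column name.
my @data = map {
    my %record;
    @record{@columns} = split ' ', $_;
    \%record;
} @rows;

# Compare Subject as a string first; only when the subjects are equal
# does the numeric comparison of start decide the order.
my @sorted = sort {
    $a->{Subject} cmp $b->{Subject}
        or
    $a->{start} <=> $b->{start}
} @data;

print join(' ', @{$_}{@columns}), "\n" for @sorted;
```

The `or` works because `cmp` returns 0 for equal subjects, which is false, so the second comparison is evaluated only for ties.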
    I've written a CPAN module to make this kind of thing trivial: Data::Sorting.

    If you've loaded your data into a list of hash-refs:

    use Data::Sorting 'sort_array';
    sort_array( @data, 'Subject', 'start' );

    Alternately, if you've loaded your data into an array of arrays like this:

    my $data = [
        [ 'ID12', 'KBrH', '2e-26', 22400316, ... ],
        ...
    ];
    use Data::Sorting 'sort_arrayref';
    sort_arrayref( $data, 0, 3 );

    There are other functions that return a sorted copy of the array, if you don't want to change the order of the original.

    use Data::Sorting 'sorted_array';
    my @sorted = sorted_array( @data, 'Subject', 'start' );

    It'll run slower than a well-written inline sort statement, but it does some tricks under the covers to keep the performance up to acceptable levels (automatically picking a Schwartzian Transform or Guttman-Rosler strategy based on the arguments received), and you only need to write one line of code rather than a confusing block expression.
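For readers who have not met the first of those strategies, a hand-written Schwartzian Transform over the question's sample lines might look like the sketch below. This is not Data::Sorting's internal code, only an illustration of the general idiom; the names `@lines` and `@sorted_lines` are made up for the example.

```perl
use strict;
use warnings;

# Sample rows copied from the question.
my @lines = (
    'ID12 KBrH 2e-26 22400316 22400515',
    'ID34 BrH0 2e-05 3876860 3876936',
    'ID12 BrH0 2e-05 3876700 3873500',
);

# Schwartzian Transform, read from the bottom up:
#   1. decorate:   pair each line with its precomputed sort keys
#   2. sort:       compare only the cached keys
#   3. undecorate: keep the original line, drop the keys
my @sorted_lines =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
    map  { my @f = split ' ', $_; [ $_, $f[0], $f[3] ] }
    @lines;

print "$_\n" for @sorted_lines;
```

The point of the idiom is that each line is split once, rather than every time the comparison block runs.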

Re: Sorting complex records
by Not_a_Number (Prior) on Sep 17, 2003 at 19:21 UTC

    If, as appears likely, the first field is an ID number (i.e. always the two letters 'ID' followed by a number), it would be nice to sort these numerically rather than ASCIIbetically (i.e. with ID100 coming after rather than before ID20...).

    Adapting the first answer given above, you could do this:

    @data = sort { substr ($a->[0], 2) <=> substr ($b->[0], 2) || $a->[3] <=> $b->[3] } @data;
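To see the difference between the two orderings, here is a small sketch. The records are made up for illustration (they are not from the original post) and hold only an ID, a query, an e_value and a start.

```perl
use strict;
use warnings;

# Made-up records (not from the original post): ID, query, e_value, start.
my @data = (
    [ 'ID100', 'BrH0', '2e-05', 500 ],
    [ 'ID20',  'KBrH', '2e-26', 900 ],
    [ 'ID20',  'BrH0', '2e-05', 100 ],
);

# String comparison on the whole ID: 'ID100' sorts before 'ID20'.
my @by_string = sort { $a->[0] cmp $b->[0] || $a->[3] <=> $b->[3] } @data;

# Numeric comparison on the part after 'ID': 20 sorts before 100.
my @by_number = sort {
    substr($a->[0], 2) <=> substr($b->[0], 2)
        || $a->[3] <=> $b->[3]
} @data;

print "string:  ", join(' ', map { $_->[0] . '/' . $_->[3] } @by_string), "\n";
print "numeric: ", join(' ', map { $_->[0] . '/' . $_->[3] } @by_number), "\n";
```

This relies on every ID really having the two-letter prefix; an ID without it would have its first two characters silently dropped before the numeric comparison.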

    Meanwhile, I've downloaded the Data::Sorting module and have started playing with it, but haven't yet worked out how this would be done (although I'm sure it's simple :-).

    dave

      Data::Sorting allows you to pass in a subroutine reference that is responsible for extracting the value to sort on:

      sort_arrayref( $data, -compare=>'numeric', sub { substr( shift->[0], 2 ) }, 3 )

      There's also a (poorly-documented) interface to let you specify this in data rather than code:

      sort_arrayref( $data, -compare=>'numeric', -extract=>'compound', [ index => 0, substr => 2 ], [ index => 3 ] )