
Sorry for not being clear about the requirement. Suppose my @OBJD array data looks like this:

1 ab 2 3 cd 4 5 6 6
9 rc 4 5 ef 6 3 4 1
7 fa 5 2 tg 5 9 9 0
3 bg 3 9 jh 5 2 2 1

I want to create a subroutine that takes two arguments: the first is the array, and the second is the number of the column to sort on (counting from 0, as the examples show). For example, if sort_array is the subroutine, calling sort_array(\@OBJD, 2) should give me this output:

1 ab 2 3 cd 4 5 6 6
3 bg 3 9 jh 5 2 2 1
9 rc 4 5 ef 6 3 4 1
7 fa 5 2 tg 5 9 9 0

Or calling sort_array(\@OBJD, 6) should give me this output:

3 bg 3 9 jh 5 2 2 1
9 rc 4 5 ef 6 3 4 1
1 ab 2 3 cd 4 5 6 6
7 fa 5 2 tg 5 9 9 0

I would like to do it with a regular sort as well as with a Schwartzian Transform, just to learn it.

Re^4: Sorting based on any column
by aaron_baugher (Curate) on May 20, 2015 at 11:25 UTC

    It depends on whether your sample data represents an array-of-arrays, with each non-whitespace token as an element of a sub-array, or a single-level array with each line as an element. In the first case, it's fairly simple, something like this:

    sub sort_aoa {
        my( $array, $column ) = @_;
        return sort { $a->[$column] cmp $b->[$column] } @$array;
    }
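    A small usage sketch for the array-of-arrays case, assuming the lines are first split into rows (the @lines and @aoa names are only for illustration). Because sort_aoa uses cmp, values compare as strings; that happens to work for these single-digit columns, but <=> is the safer choice for general numeric data.

```perl
use strict;
use warnings;

sub sort_aoa {
    my( $array, $column ) = @_;
    return sort { $a->[$column] cmp $b->[$column] } @$array;
}

# Illustrative data: the OP's lines, split into an array of arrays.
my @lines = (
    '1 ab 2 3 cd 4 5 6 6',
    '9 rc 4 5 ef 6 3 4 1',
    '7 fa 5 2 tg 5 9 9 0',
    '3 bg 3 9 jh 5 2 2 1',
);
my @aoa = map { [ split ' ' ] } @lines;    # each line becomes an array ref of fields

# Sort on column index 2 and print each row back out as a line.
my @sorted = sort_aoa( \@aoa, 2 );
print join( ' ', @$_ ), "\n" for @sorted;
```

    With column 2 this yields the ab, bg, rc, fa rows in that order, matching the first expected output in the question.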

    If each element is a whole line, you'll have to split them into words before sorting. This is where a Schwartzian Transform is likely to help the most, but I'll show the basic idea and you can add that:

    sub sort_lines_by_column {
        my( $array, $column ) = @_;
        return sort {
            return( (split ' ', $a)[$column] cmp (split ' ', $b)[$column] );
        } @$array;
    }

    (Untested. In both cases, replace the 'cmp' comparison with whatever you want.)

    Aaron B.
    Available for small or large Perl jobs and *nix system administration; see my home node.

      Wow, thanks a lot, Aaron!

      This is not an array of arrays, so I used the second piece of code you provided, and it works perfectly. But I have a few questions:

      1. In split you specified ' ', which I thought meant splitting on a single space, but in practice it splits on any number of spaces.

      2. How can I do it using a Schwartzian Transform? I am still a novice in Perl, so please bear with me.

      3. How can I pass the sort order (ascending or descending) to this subroutine? I tried the following, but it's not working:

      sub sort_lines_by_column {
          my( $array, $column, $order ) = @_;
          my $ab;
          my $cd;
          if ($order eq 'asc') {$ab = "\$a"; $cd = "\$b";}
          elsif ($order eq 'dsc') {$ab = "\$b"; $cd = "\$a";};
          return sort {
              return( (split ' ', $ab)[$column] <=> (split ' ', $cd)[$column] );
          } @$array;
      }

      It's giving the error "Use of uninitialized value in numeric comparison (<=>)".

        It's giving the error "Use of uninitialized value in numeric comparison (<=>)".

        That's because this is almost certainly not doing whatever you think it's doing:

        $ab = "\$a";

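        That doesn't interpolate $a at all: inside double quotes the backslash escapes the sigil, so $ab gets the literal two-character string '$a'. Splitting that string yields a single field, so (split ' ', $ab)[$column] is undef for any column past the first, which is exactly what <=> is warning about. I'm guessing that you're trying to make $ab a reference to $a, but to do that you'd need to leave out the quotes, and that would also change the later code.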
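        A quick check of what that assignment actually produces (a minimal sketch; $ab mirrors the OP's variable, the other names are illustrative):

```perl
use strict;
use warnings;

my $ab = "\$a";    # the backslash escapes the sigil, so nothing is interpolated
print "[$ab] has length ", length($ab), "\n";    # prints: [$a] has length 2

# Splitting that two-character string yields a single field, so any column
# index past 0 gives undef, which is what <=> then warns about.
my @fields = split ' ', $ab;
my $third  = ( split ' ', $ab )[2];
print scalar(@fields), " field(s); column 2 is ",
      defined $third ? "defined" : "undef", "\n";    # prints: 1 field(s); column 2 is undef
```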

        Personally, if I wanted to have a toggle between two different ways to sort, I'd do it like this (unless the sort comparison is very complex, in which case it should be in a separate subroutine anyway):

        sub sort_array_by_column_asc_or_desc {
            my( $array, $column, $order ) = @_;
            if( $order eq 'desc' ){
                # descending: $b's field goes on the left
                return sort { (split ' ', $b)[$column] <=> (split ' ', $a)[$column] } @$array;
            }
            else { # default to ascending sort
                return sort { (split ' ', $a)[$column] <=> (split ' ', $b)[$column] } @$array;
            }
        }

        Note: the else there isn't necessary, but I like it because it makes the choice obvious.

        Aaron B.

        UPDATE: Thanks to AnomalousMonk for catching my mistake in forgetting that map needed to return a reference to the new array. Corrected in the code below. Also, he has a very nice way to handle the sorting choice; check that out in his reply.

        1. In split you specified ' ', which I thought meant splitting on a single space, but in practice it splits on any number of spaces.

        Right, ' ' is a special case for split: it splits on any run of whitespace and also discards leading whitespace, the way awk does. That usually works well unless you need something more specific, say, if your fields are separated by tabs but can include spaces. If you need to split on a specific type or amount of whitespace, pass a regex such as /\t/ or / / as the first argument instead.
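        To see the special case in action, compare the string ' ' with the regex / / on a line that has leading and repeated spaces (the sample string is made up for illustration):

```perl
use strict;
use warnings;

my $line = "  1 ab   2";

# ' ' is special: split on runs of whitespace and drop leading whitespace (awk-style).
my @awk = split ' ', $line;      # ('1', 'ab', '2')

# / / is an ordinary pattern: every single space is a separator, so
# leading and repeated spaces produce empty fields.
my @single = split / /, $line;   # ('', '', '1', 'ab', '', '', '2')

printf "%d fields vs %d fields\n", scalar @awk, scalar @single;    # 3 fields vs 7 fields
```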

        2. How can I do it using a Schwartzian Transform? I am still a novice in Perl, so please bear with me.

        I wouldn't call that a novice-level technique; I probably used Perl for several years before creating an ST myself. You can find plenty of examples and tutorials on it. But basically, it goes something like this:

        # in pseudo-code:
        for each element
            calculate the sorting value for that element
            put the original element and the sorting value into a 2-element list
        pass these 2-element lists to your sorting routine,
            which sorts based on the sorting value of each element
        for each element in the sorted list of 2-element lists
            pull out the original element

        # in perl, an example using the typical map/sort/map layout
        # to sort a list of numbers based on the return value of a
        # complex subroutine that calculates the number of primes
        # less than each number's 25th power:
        my @newarray = map  { $_->[0] }
                       sort { $a->[1] <=> $b->[1] }
                       map  { [ $_ => n_primes_less_than_25th_power($_) ] }
                       @oldarray;

        The essence is that each element of @oldarray is passed to the map, which does the complex calculation on that element and creates a 2-element array containing the original element and the calculated value. References to those arrays are passed to the sort, which can then sort on the calculated values without needing to recalculate them for every comparison. The sorted references then go to the final map, which just passes on the original values.

        Your case would look much like this, except that instead of calling my n_primes...() subroutine, you'd have some code (or a call to a subroutine) in that location that parses out the value on which you want to sort. The first element $_->[0] would be the line, and the second element $_->[1] would be the parsed-out value for sorting.
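        Applied to the OP's lines, a sketch might look like this. The name sort_lines_by_column_st is made up for illustration, and it assumes numeric, whitespace-separated columns indexed from 0, as in the question's examples:

```perl
use strict;
use warnings;

# Schwartzian Transform: split each line once, instead of twice per comparison.
sub sort_lines_by_column_st {
    my ( $array, $column ) = @_;
    return map  { $_->[0] }                               # 3. keep just the original line
           sort { $a->[1] <=> $b->[1] }                   # 2. sort on the cached key
           map  { [ $_, ( split ' ', $_ )[$column] ] }    # 1. pair each line with its key
           @$array;
}

my @OBJD = (
    '1 ab 2 3 cd 4 5 6 6',
    '9 rc 4 5 ef 6 3 4 1',
    '7 fa 5 2 tg 5 9 9 0',
    '3 bg 3 9 jh 5 2 2 1',
);

print "$_\n" for sort_lines_by_column_st( \@OBJD, 6 );
```

        With column 6 this prints the bg, rc, ab, fa lines, matching the second expected output. For a descending sort, swap $a and $b in the middle step.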

        Aaron B.

        If you're willing (and able) to move your data into an AoA structure (i.e. $array[$row][$col] format) and willing to forgo the Schwartzian Transform aspect, Data::Table can sort your data on any selected column in ascending or descending order. And "table::sort can take a user supplied operator, this is useful when neither numerical nor alphabetic order is correct."

        Also, if you ever need to do complex sorting (such as sort first by column A in ascending order and then sort by column B in descending order...), Data::Table can handle that too.
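        For comparison, and without Data::Table, the same kind of multi-column sort can be written with a plain sort block by chaining comparisons with or: the second comparison only runs when the first reports a tie. A sketch on the OP's data; the column choices are arbitrary, picked to show a tie being broken:

```perl
use strict;
use warnings;

my @lines = (
    '1 ab 2 3 cd 4 5 6 6',
    '9 rc 4 5 ef 6 3 4 1',
    '7 fa 5 2 tg 5 9 9 0',
    '3 bg 3 9 jh 5 2 2 1',
);
my @rows = map { [ split ' ' ] } @lines;    # AoA: $rows[$row][$col]

# Column 5 ascending; ties broken by column 2 descending (note the swapped $a/$b).
my @sorted = sort {
    $a->[5] <=> $b->[5]
        or
    $b->[2] <=> $a->[2]
} @rows;

print join( ' ', @$_ ), "\n" for @sorted;
```

        Here fa and bg both have 5 in column 5, so column 2 (5 versus 3, descending) puts fa first.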

Re^4: Sorting based on any column
by Anonymous Monk on May 20, 2015 at 11:34 UTC

    Another piece of the puzzle: sort accepts code refs as the sorting function.

    my @l = (10, 2, 200, 23, 3, 9, 11, 1);
    my $num = sub { $a <=> $b };
    my $str = sub { $a cmp $b };
    print join(", ", sort $num @l ), "\n";
    print join(", ", sort $str @l ), "\n";
    __END__
    1, 2, 3, 9, 10, 11, 23, 200
    1, 10, 11, 2, 200, 23, 3, 9
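    Building on that, the sort order can be chosen by picking one of two comparator code refs before calling sort. A sketch for the line-based data in this thread; the 'desc' string and the numeric, 0-based columns are assumptions:

```perl
use strict;
use warnings;

# Choose a comparator code ref from $order, then pass it to sort.
sub sort_lines_by_column {
    my ( $array, $column, $order ) = @_;
    my $asc  = sub { (split ' ', $a)[$column] <=> (split ' ', $b)[$column] };
    my $desc = sub { (split ' ', $b)[$column] <=> (split ' ', $a)[$column] };
    my $cmp  = ( defined $order && $order eq 'desc' ) ? $desc : $asc;    # default: ascending
    return sort $cmp @$array;
}

my @OBJD = (
    '1 ab 2 3 cd 4 5 6 6',
    '9 rc 4 5 ef 6 3 4 1',
    '7 fa 5 2 tg 5 9 9 0',
    '3 bg 3 9 jh 5 2 2 1',
);

print "$_\n" for sort_lines_by_column( \@OBJD, 6, 'desc' );
```

    The closures capture $column, while $a and $b are still the package variables that sort sets for each comparison.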
Re^4: Sorting based on any column
by Anonymous Monk on May 20, 2015 at 11:23 UTC

    The thread linked to by the other anon above contains some ideas. Personally I like Scalar::Util's looks_like_number, since AFAIK that's the same function Perl uses internally. If you know that your columns always contain either numbers xor non-numbers, testing the first value of the column should be enough to determine which comparison to use for that column. If however your columns contain a mix of numbers and non-numbers (e.g. "1 a","foo","3.14","42","b5ar","93b"), you'll have to make a decision on how to sort a column like that.
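    A sketch of that idea, assuming each column holds only numbers or only non-numbers, so the first line's value decides between <=> and cmp (sort_lines_auto is a made-up name; the keys are cached with a Schwartzian Transform):

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Sort lines on a 0-based column: <=> if the first line's value in that
# column looks like a number, otherwise cmp.
sub sort_lines_auto {
    my ( $array, $column ) = @_;
    return () unless @$array;
    my $first   = ( split ' ', $array->[0] )[$column];
    my $numeric = looks_like_number($first);
    return map  { $_->[0] }
           sort { $numeric ? $a->[1] <=> $b->[1] : $a->[1] cmp $b->[1] }
           map  { [ $_, ( split ' ', $_ )[$column] ] }
           @$array;
}

my @rows = ( '10 x', '9 y', '100 z' );
print "$_\n" for sort_lines_auto( \@rows, 0 );    # numeric: 9 y, 10 x, 100 z
```

    With cmp, column 0 would instead come out as 10, 100, 9, which is why the test on the first value matters.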