in reply to Array Sort

For small amounts of data and with a simple way of extracting the key string the technique BrowerUK suggests is a good solution. If finding the key is expensive or the amount of data being sorted is large such that the sort processing takes a long time a common technique is the schwartzian transform. Consider:

#!/usr/local/bin/perl use strict; use warnings; my @data = ('AC-BA91-CA', 'AB-BA92-CA', 'AD-BA90-CC', 'AA-BA93-CA', 'AA-BA89- +CB'); my @sorted = map {$_->[0]} sort {$a->[2] cmp $b->[2]} map {[$_, split '-']} @da +ta; print "$_\n" for @sorted;

Prints:

AA-BA89-CB AD-BA90-CC AC-BA91-CA AB-BA92-CA AA-BA93-CA

The key to the process is to transform the data into a form that can be sorted, sort it, then transform it back again. The process really runs from right to left with the right most map generating a temporary transformed list (with the original data as the first element of each list item. The sort is in the middle. Then the left most map pulls the original string out of each item in the sorted list.

True laziness is hard work

Replies are listed 'Best First'.
Re^2: Array Sort
by Utilitarian (Vicar) on May 10, 2011 at 07:09 UTC
    Not disagreeing with the strategy, however if there is a large quantity of data, shouldn't you restrict the sort element of the array to contain only the element to be sorted, as far as I can see it adds no overhead (we already have a list from the split function) and reduces the memory usage. ie
    #!/usr/local/bin/perl use strict; use warnings; my @data = ('AC-BA91-CA', 'AB-BA92-CA', 'AD-BA90-CC', 'AA-BA93-CA', 'AA-BA89- +CB'); my @sorted = map {$_->[0]} sort {$a->[1] cmp $b->[1]} map {[ $_ , (split '-')[1 +] ]} @data; print "$_\n" for @sorted;
    Just wondering, Utilitarian.

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."

      Sure. Storing just the original string and sort key is a reasonable idea if the data is very large. The little bit of extra "clutter" added was better left out of the original example though, especially as the OP may actually be working with quite different data and have different requirements for extracting the sort key than was suggested in the OP.

      True laziness is hard work
Re^2: Array Sort
by BrowserUk (Patriarch) on May 10, 2011 at 18:18 UTC

    It is worth noting that for the complication of the ST to save the OP any time over the standard sort, he would have to be sorting at least a 1 million elements.

    And if he is sorting that much data, then he'll need every gain he can get, in which case the GRT suggested by javafan is far more effective. And it is simpler than the ST.

    All in all, I wonder if there is ever a situation where the ST makes sense?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      All in all, I wonder if there is ever a situation where the ST makes sense?

      Over plain sort, sure. Over GRT? Not so much.

      ST can return objects, whereas GRT returns strings unless an external array is used.

      ST can produce simpler and more readable code than GRT in some circumstances.

      However, ST isn't as fast as GRT.

      # ST my @fhs = map $_->[0], sort { $a->[1] <=> $b->[1] } map [ $_, fileno($_) ], get_fhs();
      # GRT my @fhs = get_fhs(); @fhs = map $fhs[ unpack('x4 N', $_) ] sort map pack('NN', fileno($fhs[$_]), $_), 0..$#fhs;
      # GRT my @fhs = get_fhs(); @fhs = @fhs[ map unpack('x4 N', $_), sort map pack('NN', fileno($fhs[$_]), $_), 0..$#fhs ];

        My point was that it requires a pretty expensive key extraction for the ST to be beneficial.

        For example, with fileno the ST never gets a look in and it requires a 1000+ files before the GRT starts to make inroads:

        c:\test>junk92 *.txt Files: 46 Rate ST GRT1 GRT2 sort ST 9206/s -- -21% -27% -58% GRT1 11725/s 27% -- -8% -46% GRT2 12681/s 38% 8% -- -42% sort 21855/s 137% 86% 72% -- c:\test>junk92 *.pl Files: 657 Rate ST GRT1 GRT2 sort ST 411/s -- -39% -44% -46% GRT1 675/s 64% -- -8% -11% GRT2 735/s 79% 9% -- -3% sort 757/s 84% 12% 3% -- c:\test>junk92 * Files: 1293 Rate ST GRT1 sort GRT2 ST 186/s -- -41% -45% -46% GRT1 315/s 70% -- -7% -8% sort 338/s 82% 7% -- -1% GRT2 341/s 84% 8% 1% --

        So, by the time you get to the point where the standard sort, even with numeric keys, can be beaten, you might as well move to straight to a GRT and have done with it.

        sort => q[ my @fhs = sort { fileno( $a ) <=> fileno( $b ) } @fhs; ],

        Others per your code.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: Array Sort
by ikegami (Patriarch) on May 10, 2011 at 19:10 UTC
    I think you might find this application of ST will actually slow down the sort. As you said, it helps if the key is expensive to computer.

    You also said ST is more likely to help for long lists, but that's not true. It loses effectiveness as the list grows.