in reply to Numeric Sort for Stringified Value (How to Avoid Warning)

Instead of turning off the warning, you could sort your data using the Schwartzian Transform.

my @new = map  { $_->[0] }
          sort { $b->[1] <=> $a->[1] }
          map  { [ $_, (split( /\s+/, $_, 2 ))[0] ] } @old;

The example assumes all of your data is in the same format as it is given in your post (a number separated from the rest of the string by whitespace).
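To make the three stages easier to follow, here is the same transform unrolled into separate steps. This is only a sketch: the sample data and the intermediate names @pairs and @sorted are chosen for illustration, and it relies on the same format assumption as above.

```perl
use strict;
use warnings;

# Sample data in the assumed format: a number, whitespace, then the rest.
my @old = ("10.5 AA", "10.6 AA", "9 AC", "2 BB");

# Step 1: pair each string with its numeric key (computed once per element).
my @pairs = map { [ $_, (split /\s+/, $_, 2)[0] ] } @old;

# Step 2: sort the pairs numerically by key, largest first.
my @sorted = sort { $b->[1] <=> $a->[1] } @pairs;

# Step 3: throw the keys away and keep only the original strings.
my @new = map { $_->[0] } @sorted;

print "$_\n" for @new;
```

Because the comparison only ever sees the extracted number, <=> is never handed a string like "10.5 AA", so the "isn't numeric" warning should not fire.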

HTH

Re^2: Numeric Sort for Stringified Value (How to Avoid Warning) (Use the ST)
by pg (Canon) on Sep 16, 2005 at 05:32 UTC

    A bit over-engineered in this case. Under the same assumption as yours, the code could be much simpler:

    my @new = sort {(split /\s+/, $b)[0] <=> (split /\s+/, $a)[0]} @old;

    And it is faster:

    use Data::Dumper;
    use strict;
    use warnings;

    my @old = ("10.5 AA", "10.6 AA", "9 AC", "2 BB");

    my $t0 = time();
    for (1..200000) { # yours
        my @new = map { $_->[0] } sort { $b->[1] <=> $a->[1] } map { [ $_, (split( /\s+/, $_, 2 ))[0] ] } @old;
    }
    print time() - $t0, "\n";

    $t0 = time();
    for (1..200000) { # mine
        my @new = sort {(split /\s+/, $b)[0] <=> (split /\s+/, $a)[0]} @old;
    }
    print time() - $t0, "\n";

    I ran it four times: yours took 13, 14, 14, and 13 seconds, while mine took 8, 10, 11, and 9 seconds.

      The ST is a bit more complicated, but unless we know what the actual data looks like and how much of it there is, I would hesitate to say it is over-engineered. We encourage people to post minimal examples, so I would not be surprised if the OP simplified the input data.

      I am certainly no expert on benchmarking, but my tests yield opposite results. I created arrays containing 5, 10, 20, 40, and 80 elements each and compared our sort routines. I ran the code 5 times and averaged the results. The ST approach was faster in all cases, and the difference increased with the size of the array.

      Array size = 5
              Rate   pg bobf
      pg   23058/s   -- -32%
      bobf 33853/s  47%   --

      Array size = 10
              Rate   pg bobf
      pg    8606/s   -- -51%
      bobf 17506/s 103%   --

      Array size = 20
             Rate   pg bobf
      pg   3099/s   -- -64%
      bobf 8648/s 179%   --

      Array size = 40
             Rate   pg bobf
      pg   1207/s   -- -71%
      bobf 4167/s 245%   --

      Array size = 80
             Rate   pg bobf
      pg    490/s   -- -75%
      bobf 1987/s 305%   --

      Benchmarking code and complete results:

      Unless the OP is dealing with large data sets, the time difference is probably negligible. In that case, I'd recommend whatever approach the OP is most comfortable maintaining.
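      For anyone who wants to repeat the comparison on their own data, something along these lines with the core Benchmark module should work. It is a rough sketch, not the code behind the numbers above: the generated test data and the sub names sort_plain and sort_st are made up for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(shuffle);

# Hypothetical test data: 80 strings of the form "<number> <label>",
# in random order, each with a distinct leading number.
my @old = map { "$_.5 row$_" } shuffle 1 .. 80;

# Plain sort: the leading number is split out again in every comparison.
sub sort_plain {
    return sort { (split /\s+/, $b)[0] <=> (split /\s+/, $a)[0] } @_;
}

# Schwartzian Transform: the leading number is split out once per element.
sub sort_st {
    return map  { $_->[0] }
           sort { $b->[1] <=> $a->[1] }
           map  { [ $_, (split /\s+/, $_, 2)[0] ] } @_;
}

# Run each candidate for at least one CPU second and print a rate table.
cmpthese(-1, {
    plain => sub { my @new = sort_plain(@old) },
    st    => sub { my @new = sort_st(@old) },
});
```

      Changing the upper bound of the range (80 here) is an easy way to see how the gap moves with the size of the array. The rates will of course differ from machine to machine.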

      TMTOWTDI. :)

      For tiny sets, N log N is so similar to N that the Schwartzian Transform is pretty worthless. I changed
      my @old = ("10.5 AA", "10.6 AA", "9 AC", "2 BB");
      to
      my @old = ("10.5 AA", "10.6 AA", "9 AC", "2 BB") x 100;
      (and lowered the number of iterations to 2000) and yours took 2.5 times as long (25s vs 10s). Whether it's over-engineered or not depends on the input set.
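
      One way to see where the difference comes from is to count how often the key is extracted in each approach. The sketch below uses a hypothetical helper, key_of, that does the split and bumps a counter; the counter names are made up as well. The exact count for the plain sort depends on how many comparisons perl's sort happens to make, so no figure is quoted for it.

```perl
use strict;
use warnings;

my @old = ("10.5 AA", "10.6 AA", "9 AC", "2 BB") x 100;

my ($plain_calls, $st_calls) = (0, 0);

# Hypothetical helper: return the leading number of a string and
# increment the counter passed in by reference.
sub key_of {
    my ($str, $counter) = @_;
    $$counter++;
    return (split /\s+/, $str, 2)[0];
}

# Plain sort: two key extractions for every comparison made.
my @plain = sort { key_of($b, \$plain_calls) <=> key_of($a, \$plain_calls) } @old;

# Schwartzian Transform: one key extraction per element, before sorting.
my @st = map  { $_->[0] }
         sort { $b->[1] <=> $a->[1] }
         map  { [ $_, key_of($_, \$st_calls) ] } @old;

printf "elements: %d, plain sort extractions: %d, ST extractions: %d\n",
    scalar @old, $plain_calls, $st_calls;
```

      The ST count always equals the number of elements, while the plain count grows with the number of comparisons, which is why the gap widens as the input gets bigger.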