in reply to Re: Sorting question
in thread Sorting question

Or use Sort::Key::Natural:

use Sort::Key::Natural qw(natsort);
my @sorted = natsort @data;

which, for the OP's sample data, is almost five times faster than Sort::Naturally:
use Benchmark qw(cmpthese);
use Sort::Naturally qw(nsort);
use Sort::Key::Natural qw(natsort);

my @data = grep !/^\s*$/, <DATA>;
chomp(@data);

cmpthese(-10, {
    sn  => sub { my @s = nsort @data },
    skn => sub { my @s = natsort @data },
});

__DATA__
K-2-D-10A
K-2-D-10C
K-2-D-10D
...
outputs...
$ perl bm.pl
      Rate   sn  skn
sn  45.0/s   -- -79%
skn  216/s 381%   --

Re^3: Sorting question
by jwkrahn (Abbot) on May 15, 2006 at 22:32 UTC
    FYI, since we're running benchmarks:
    #!/usr/bin/perl
    use warnings;
    use strict;

    use Benchmark 'cmpthese';
    use Sort::Naturally 'nsort';
    use Sort::Key::Natural 'natsort';

    my @data = grep /\S/, <DATA>;

    # Zero-pad every run of digits to three characters.
    sub normalize_digits {
        my ( $key ) = @_;
        $key =~ s/(\d+)/sprintf '%03d', $1/eg;
        return $key;
    }

    cmpthese -10, {
        SN_nsort    => sub { my @s = nsort @data },
        SKN_natsort => sub { my @s = natsort @data },
        # Prefix a 3-byte packed key (16-bit number, letter), sort, strip it.
        GRT_pack    => sub {
            my @s = map unpack( 'x3 a*', $_ ),
                    sort
                    map pack( 'n a a*', /(\d+)([A-Z])/, $_ ), @data
        },
        ST_sub      => sub {
            my @s = map { $_->[ 0 ] }
                    sort { $a->[ 1 ] cmp $b->[ 1 ] }
                    map { [ $_, normalize_digits( $_ ) ] } @data
        },
        GRT_sub     => sub {
            my @s = map { local $_ = $_; s/^.*\0//; $_ }
                    sort
                    map { normalize_digits( $_ ) . "\0$_" } @data
        },
        GRT_sprintf => sub {
            my @s = map { s/-0+/-/g; $_ }
                    sort
                    map { s/(\d+)/sprintf '%03d', $1/eg; $_ } @data
        },
    };

    __DATA__
    K-2-D-10A
    K-2-D-10C
    K-2-D-10D
    K-2-D-10E
    K-2-D-10F
    K-2-D-10G
    K-2-D-11A
    etc. etc.

    Which gave me these results:

                  Rate SN_nsort SKN_natsort ST_sub GRT_sub GRT_sprintf GRT_pack
    SN_nsort    64.6/s       --        -81%   -90%    -91%        -93%     -97%
    SKN_natsort  341/s     427%          --   -47%    -51%        -62%     -82%
    ST_sub       642/s     893%         88%     --     -7%        -29%     -67%
    GRT_sub      694/s     973%        104%     8%      --        -23%     -64%
    GRT_sprintf  900/s    1292%        164%    40%     30%          --     -53%
    GRT_pack    1928/s    2883%        466%   200%    178%        114%       --

    :-)

      Well, Sort::Key::Natural is a general solution that makes no assumptions about the strings passed to it. For instance, it has no limit on the number of groups or on the size of the numbers that can be embedded in the strings.
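
To make the point about number size concrete, here is a minimal sketch using only core Perl. It reuses the '%03d' padding idea from the benchmark above under the hypothetical name pad3, and the three IDs are made up for illustration (they are not from the OP's data). One of them carries a four-digit number, which the fixed-width key cannot order correctly:

```perl
use strict;
use warnings;

# Same idea as normalize_digits() in the benchmark: zero-pad every digit run
# to three characters so that a plain string sort orders the numbers.
# (pad3 is a hypothetical name used only in this sketch.)
sub pad3 {
    my ($key) = @_;
    $key =~ s/(\d+)/sprintf '%03d', $1/eg;
    return $key;
}

# Hypothetical IDs, not from the OP's data: one number has four digits.
my @ids = ('K-2-D-1000A', 'K-2-D-999A', 'K-2-D-10A');

# Schwartzian transform: sort on the padded key, return the original strings.
my @by_padded_key =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_, pad3($_) ] } @ids;

print "$_\n" for @by_padded_key;
```

With these inputs K-2-D-1000A comes out ahead of K-2-D-999A, because '1000' compares as a string before '999' once the padding width is exceeded. The pack 'n' key in GRT_pack has a similar ceiling, since 'n' is a 16-bit unsigned value.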

      But if you know the data is of the format /^K-2-D-\d+\w$/ and want to create a sorter taking advantage of it, a Sort::Key based solution is still the fastest:

      use Sort::Key::Multi 'ii_keysort';
      ...
      cmpthese(-10, {
          ...
          SKMii => sub { my @s = ii_keysort { /(\d+)([A-Z])/; $1, ord $2 } @data }
      } );
      gives me:
                    Rate SN_nsort SKN_natsort ST_sub GRT_sub GRT_sprintf GRT_pack SKMii
      SN_nsort    46.9/s       --        -81%   -89%    -90%        -92%     -96%  -96%
      SKN_natsort  244/s     420%          --   -43%    -47%        -58%     -79%  -81%
      ST_sub       427/s     812%         75%     --     -7%        -27%     -62%  -66%
      GRT_sub      460/s     881%         89%     8%      --        -22%     -60%  -64%
      GRT_sprintf  587/s    1152%        141%    37%     28%          --     -48%  -54%
      GRT_pack    1136/s    2323%        366%   166%    147%         94%       --  -11%
      SKMii       1272/s    2614%        422%   198%    177%        117%      12%    --
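
For anyone who wants to see what the two integer keys in the SKMii entry amount to without installing Sort::Key::Multi, the following is a rough core-Perl sketch of the same ordering. It is not how Sort::Key works internally and it was not part of the benchmark; it only shows the key extraction (the number before the letter, then the letter's character code). The sample strings are assumptions that follow the /^K-2-D-\d+\w$/ format.

```perl
use strict;
use warnings;

# Sample lines in the OP's format, deliberately out of order (made up here).
my @data = qw(K-2-D-11A K-2-D-10C K-2-D-9B K-2-D-10A);

# Extract the same two keys as the ii_keysort block: the number that
# precedes the letter, and that letter's character code. Then compare
# both keys numerically, number first.
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2] }
    map  { my ($num, $letter) = /(\d+)([A-Z])/; [ $_, $num, ord $letter ] } @data;

print "$_\n" for @sorted;
```

This prints K-2-D-9B, K-2-D-10A, K-2-D-10C, K-2-D-11A, one per line.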