Re^4: How can I do a numeric sort on a substring? [Benchmark]

Replies are listed 'Best First'.
Re^5: How can I do a numeric sort on a substring? (context matters) by LanX (Saint) on Jun 27, 2021 at 13:46 UTC
Re^6: How can I do a numeric sort on a substring? (context matters) by kcott (Archbishop) on Jun 27, 2021 at 23:22 UTC
Re^7: How can I do a numeric sort on a substring? [Benchmark: reworked and extended] by kcott (Archbishop) on Jun 28, 2021 at 08:21 UTC
I wrapped all of the routines in `@{[...]}` to provide the list context; that was what I'd used in the preamble tests. I added an `STss` as I had indicated this morning. I decided that `STmcs` was going to be pretty much the same as `STss`, so I skipped that one. I did add an `mcse` which was `mcs` with `map BLOCK` replaced by `map EXPR`. `sub st_sort_substr { @{[ map $_->[0], sort { $a->[1] <=> $b->[1] } map [$_, substr $_, 2], @unordered ]}; } sub map_cat_substr_expr { @{[ map "a-$_", sort { $a <=> $b } map substr($_, 2), @unordered ]}; }` [download] I saw ++swl's post. There wasn't any code there, so I guessed. `use Sort::Key 'ikeysort'; use Sort::Key::Natural 'natsort'; ... sub sort_key_integer { @{[ ikeysort { substr $_, 2 } @unordered ]}; } sub sort_key_natural { @{[ natsort @unordered ]}; }` [download] I ran the benchmark several times; there were no major differences between runs. Here's a sample output, in the spoiler; it's getting very wide (18 subroutines now) and this post is "Re^7", so probably best viewed via the "download" link. 
Perl & OS: v5.34.0 on cygwin Unordered data (for preamble tests): a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1 Preamble tests: grt_pack_expr: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 grt_pack_expr_q: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_key_integer: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_key_natural: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_anchored: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_anch_expr_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_anch_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_expr_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_no_index: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_sort_substr: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 map_cat_substr: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 map_cat_substr_expr: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 map_cat_substr_len: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_pack: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_regex: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_regex_anchored: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_substr: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 Legend: GRTpe: grt_pack_expr GRTpeq: grt_pack_expr_q SKi: sort_key_integer SKn: sort_key_natural STr: st_regex STra: st_regex_anchored STraen: st_regex_anch_expr_ni STran: st_regex_anch_ni STren: st_regex_expr_ni STrn: st_regex_no_index STss: st_sort_substr mcs: map_cat_substr mcse: map_cat_substr_expr mcsl: map_cat_substr_len sp: sort_pack sr: sort_regex sra: sort_regex_anchored ss: sort_substr Benchmarks: Note: Unordered data extended with 'map "a-$_", shuffle 0..10000' Rate sra sr sp SKn ss STrn STran STr STra STren ST +raen STss GRTpe GRTpeq mcsl mcs mcse SKi sra 12.0/s -- -1% -59% -61% -76% -81% -81% -81% -81% -81% +-81% -84% -89% 
-89% -90% -90% -90% -94% sr 12.0/s 1% -- -59% -60% -76% -81% -81% -81% -81% -81% +-81% -84% -88% -89% -90% -90% -90% -94% sp 29.4/s 145% 144% -- -3% -42% -53% -53% -53% -53% -53% +-54% -60% -72% -72% -75% -76% -76% -86% SKn 30.4/s 154% 153% 3% -- -40% -51% -52% -52% -52% -52% +-52% -59% -71% -71% -74% -75% -75% -86% ss 50.4/s 321% 319% 71% 66% -- -19% -20% -20% -20% -20% +-20% -32% -52% -52% -57% -58% -58% -77% STrn 62.4/s 421% 419% 112% 105% 24% -- -0% -0% -0% -0% + -1% -16% -40% -41% -47% -48% -48% -71% STran 62.7/s 424% 421% 113% 106% 24% 0% -- -0% -0% -0% + -1% -15% -40% -40% -47% -48% -48% -71% STr 62.8/s 424% 421% 113% 106% 25% 0% 0% -- 0% -0% + -1% -15% -40% -40% -47% -48% -48% -71% STra 62.8/s 424% 421% 113% 106% 25% 0% 0% 0% -- -0% + -1% -15% -40% -40% -47% -48% -48% -71% STren 62.8/s 424% 421% 113% 106% 25% 0% 0% 0% 0% -- + -1% -15% -40% -40% -47% -48% -48% -71% STraen 63.4/s 429% 426% 116% 108% 26% 1% 1% 1% 1% 1% + -- -15% -39% -40% -46% -47% -47% -71% STss 74.2/s 519% 516% 152% 144% 47% 19% 18% 18% 18% 18% + 17% -- -29% -30% -37% -38% -38% -66% GRTpe 105/s 773% 768% 256% 244% 107% 67% 67% 67% 67% 67% + 65% 41% -- -1% -12% -13% -13% -52% GRTpeq 105/s 779% 774% 258% 246% 109% 69% 68% 68% 68% 68% + 66% 42% 1% -- -11% -13% -13% -51% mcsl 118/s 886% 881% 302% 289% 134% 89% 88% 88% 88% 88% + 86% 59% 13% 12% -- -2% -2% -45% mcs 120/s 905% 900% 310% 296% 139% 93% 92% 92% 92% 92% + 90% 62% 15% 14% 2% -- -0% -44% mcse 120/s 906% 901% 310% 296% 139% 93% 92% 92% 92% 92% + 90% 62% 15% 14% 2% 0% -- -44% SKi 217/s 1708% 1699% 637% 612% 330% 247% 245% 245% 245% 245% +242% 192% 107% 106% 83% 80% 80% -- [download] And here's the code: #!/usr/bin/env perl use strict; use warnings; use namespace::autoclean; use Benchmark 'cmpthese'; use List::Util 'shuffle'; use Sort::Key 'ikeysort'; use Sort::Key::Natural 'natsort'; my @unordered = qw{a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1}; my %expanded_abbrev_for = ( sr => 'sort_regex', STr => 'st_regex', STrn => 
'st_regex_no_index', STren => 'st_regex_expr_ni', sra => 'sort_regex_anchored', STra => 'st_regex_anchored', STran => 'st_regex_anch_ni', STraen => 'st_regex_anch_expr_ni', ss => 'sort_substr', mcs => 'map_cat_substr', mcsl => 'map_cat_substr_len', sp => 'sort_pack', GRTpe => 'grt_pack_expr', GRTpeq => 'grt_pack_expr_q', STss => 'st_sort_substr', mcse => 'map_cat_substr_expr', SKi => 'sort_key_integer', SKn => 'sort_key_natural', ); my %coderef_for = ( sr => \&sort_regex, STr => \&st_regex, STrn => \&st_regex_no_index, STren => \&st_regex_expr_ni, sra => \&sort_regex_anchored, STra => \&st_regex_anchored, STran => \&st_regex_anch_ni, STraen => \&st_regex_anch_expr_ni, ss => \&sort_substr, mcs => \&map_cat_substr, mcsl => \&map_cat_substr_len, sp => \&sort_pack, GRTpe => \&grt_pack_expr, GRTpeq => \&grt_pack_expr_q, STss => \&st_sort_substr, mcse => \&map_cat_substr_expr, SKi => \&sort_key_integer, SKn => \&sort_key_natural, ); print "Perl & OS:\n $^V on $^O\n"; print "Unordered data (for preamble tests):\n @unordered\n"; print "Preamble tests:\n"; my $tests_fmt = " %-22s %s\n"; for my $name (sort keys %coderef_for) { printf $tests_fmt, "$expanded_abbrev_for{$name}:", "@{[$coderef_for{$name}->()]}"; } exit if @ARGV && $ARGV[0] eq '--dry_run'; print "Legend:\n"; my $legend_fmt = " %-7s %s\n"; for my $abbrev (sort keys %expanded_abbrev_for) { printf $legend_fmt, "$abbrev:", $expanded_abbrev_for{$abbrev}; } # Extend @unordered for improved benchmarking push @unordered, map "a-$_", shuffle 0..10000; print "Benchmarks:\n"; print " Note: Unordered data extended with 'map \"a-\$_\", shuffle +0..10000'\n"; my $count = 0; cmpthese $count => \%coderef_for; sub sort_regex { @{[ sort { ($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0] } @unordered ]}; } sub st_regex { @{[ map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, (/(\d+)/)[0]] } @unordered ]}; } sub st_regex_no_index { @{[ map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, /(\d+)/] } @unordered ]}; } sub 
st_regex_expr_ni { @{[ map $_->[0], sort { $a->[1] <=> $b->[1] } map [$_, /(\d+)/], @unordered ]}; } sub sort_regex_anchored { @{[ sort { ($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0] } @unordered ]}; } sub st_regex_anchored { @{[ map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, (/(\d+)$/)[0]] } @unordered ]}; } sub st_regex_anch_ni { @{[ map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, /(\d+)$/] } @unordered ]}; } sub st_regex_anch_expr_ni { @{[ map $_->[0], sort { $a->[1] <=> $b->[1] } map [$_, /(\d+)$/], @unordered ]}; } sub st_sort_substr { @{[ map $_->[0], sort { $a->[1] <=> $b->[1] } map [$_, substr $_, 2], @unordered ]}; } sub sort_substr { @{[ sort { substr($a, 2) <=> substr($b, 2) } @unordered ]}; } sub map_cat_substr { @{[ map { 'a-' . $_ } sort { $a <=> $b } map { substr $_, 2 } @unordered ]}; } sub map_cat_substr_expr { @{[ map "a-$_", sort { $a <=> $b } map substr($_, 2), @unordered ]}; } sub map_cat_substr_len { @{[ map { 'a-' . $_ } sort { $a <=> $b } map { substr $_, 2, length($_) - 2 } @unordered ]}; } sub sort_pack { @{[ sort { pack(L => substr($a, 2)) cmp pack(L => substr($b, 2)) } @unordered ]}; } sub grt_pack_expr { @{[ map substr($_, 4), sort map pack(L => substr($_, 2)) . $_, @unordered ]}; } sub grt_pack_expr_q { @{[ map substr($_, 8), sort map pack(Q => substr($_, 2)) . $_, @unordered ]}; } sub sort_key_integer { @{[ ikeysort { substr $_, 2 } @unordered ]}; } sub sort_key_natural { @{[ natsort @unordered ]}; } [download] — Ken	[reply] [d/l] [select]
Re^8: How can I do a numeric sort on a substring? [Benchmark: reworked and extended] by swl (Prior) on Jun 28, 2021 at 09:49 UTC
Re^6: How can I do a numeric sort on a substring? (context matters) by swl (Prior) on Jun 28, 2021 at 04:00 UTC
Out of curiosity I added some subs using Sort::Key. `sort_key_natural` is the `natsort` function from Sort::Key::Natural, while `sort_key_integer` uses the `ikeysort` function from Sort::Key in tandem with `substr`.

Perl & OS:
    v5.28.2 on MSWin32
Unordered data (for preamble tests):
    a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1
Preamble tests:
    grt_pack_expr:         a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    grt_pack_expr_q:       a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_key_integer:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_key_natural:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex:              a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anchored:     a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anch_expr_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anch_ni:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_expr_ni:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_no_index:     a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    map_cat_substr:        a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    map_cat_substr_len:    a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_pack:             a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_regex:            a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_regex_anchored:   a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_substr:           a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
Legend:
    GRTpe:  grt_pack_expr
    GRTpeq: grt_pack_expr_q
    SKi:    sort_key_integer
    SKn:    sort_key_natural
    STr:    st_regex
    STra:   st_regex_anchored
    STraen: st_regex_anch_expr_ni
    STran:  st_regex_anch_ni
    STren:  st_regex_expr_ni
    STrn:   st_regex_no_index
    mcs:    map_cat_substr
    mcsl:   map_cat_substr_len
    sp:     sort_pack
    sr:     sort_regex
    sra:    sort_regex_anchored
    ss:     sort_substr
Benchmarks:
    Note: Unordered data extended with 'map "a-$_", shuffle 0..10000'
          Rate    sra     sr    SKn     sp STraen    STr   STrn   STra  STran  STren     ss  GRTpe GRTpeq   mcsl    mcs    SKi
sra     4.59/s     --    -0%   -59%   -68%   -80%   -81%   -82%   -82%   -82%   -83%   -84%   -91%   -91%   -92%   -93%   -95%
sr      4.59/s     0%     --   -59%   -68%   -80%   -81%   -82%   -82%   -82%   -83%   -84%   -91%   -91%   -92%   -93%   -95%
SKn     11.3/s   147%   147%     --   -22%   -52%   -53%   -55%   -55%   -56%   -57%   -61%   -77%   -78%   -81%   -82%   -87%
sp      14.5/s   217%   217%    28%     --   -38%   -40%   -42%   -43%   -43%   -45%   -49%   -71%   -71%   -76%   -77%   -84%
STraen  23.5/s   411%   411%   107%    61%     --    -3%    -6%    -7%    -8%   -12%   -18%   -53%   -54%   -61%   -63%   -74%
STr     24.3/s   429%   429%   115%    67%     4%     --    -3%    -4%    -5%    -8%   -15%   -52%   -52%   -59%   -62%   -73%
STrn    25.0/s   445%   444%   121%    72%     7%     3%     --    -1%    -2%    -6%   -13%   -50%   -51%   -58%   -61%   -72%
STra    25.3/s   452%   451%   124%    74%     8%     4%     1%     --    -1%    -5%   -12%   -50%   -50%   -58%   -60%   -72%
STran   25.5/s   455%   455%   125%    75%     9%     5%     2%     1%     --    -4%   -11%   -49%   -50%   -57%   -60%   -72%
STren   26.6/s   478%   478%   134%    83%    13%     9%     6%     5%     4%     --    -8%   -47%   -47%   -56%   -59%   -70%
ss      28.7/s   525%   525%   153%    97%    22%    18%    15%    13%    13%     8%     --   -43%   -43%   -52%   -55%   -68%
GRTpe   50.2/s   993%   992%   343%   245%   114%   106%   101%    98%    97%    89%    75%     --    -1%   -16%   -22%   -44%
GRTpeq  50.5/s  1000%  1000%   346%   248%   115%   108%   102%   100%    98%    90%    76%     1%     --   -15%   -21%   -44%
mcsl    59.8/s  1202%  1201%   428%   311%   155%   146%   139%   136%   135%   125%   108%    19%    18%     --    -7%   -33%
mcs     64.0/s  1293%  1293%   465%   340%   173%   163%   156%   153%   151%   141%   123%    28%    27%     7%     --   -29%
SKi     89.5/s  1849%  1849%   690%   516%   281%   268%   258%   253%   251%   237%   212%    78%    77%    50%    40%     --

The `natsort` approach is not particularly fast, but this is perhaps to be expected given that it is a general-purpose function (as are the unanchored regex approaches). I guess the integer key approach is faster because it takes advantage of direct string operations when building the keys, and then whatever optimisations Sort::Key uses internally.

I assume the differences in the order of the other approaches compared with LanX's are due to the code being run on Strawberry Perl 5.28. It would be interesting to know how the Sort::Key approaches go under a more recent Perl.

Edit: Now that I look at the source code for Sort::Key::Natural, it uses a regex approach to divide the string and pad out the numeric sections, so it is not surprising that it is slower than the other regex-based approaches here. https://metacpan.org/dist/Sort-Key/source/lib/Sort/Key/Natural.pm#L34
Re^7: How can I do a numeric sort on a substring? (context matters) by salva (Canon) on Jun 28, 2021 at 09:15 UTC
One of the reasons Sort::Key::Natural is relatively slow is that it tries to be correct! For instance, it can handle arbitrarily large numbers or Unicode.
Re^8: How can I do a numeric sort on a substring? (context matters) by swl (Prior) on Jun 28, 2021 at 09:45 UTC
Re^7: How can I do a numeric sort on a substring? (context matters) by kcott (Archbishop) on Jun 28, 2021 at 08:36 UTC
G'day swl,

See "Re^7: How can I do a numeric sort on a substring? [Benchmark: reworked and extended]". I've added `sort_key_integer` and `sort_key_natural` (I made some guesses about the code) as well as a couple of additions of my own.

> It would be interesting to know how the Sort::Key approaches go under a more recent Perl.

I'm not seeing a huge difference between your output and my latest. `SKn` is at the slow end of the spectrum; `SKi` is by far the fastest (substantially faster on my system with Perl 5.34.0).

— Ken

> Perl is propagating the caller's context to the returning statement, maybe you should check if benchmark is using list context too?

My suspicion was justified: the benchmarks run in void context, which is why the simple sorts do nothing at all (and nothing is fast ;)
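A minimal sketch of the context propagation described above. It is not part of the benchmark, and the helper names `probe` and `wrapper` are made up for illustration. `wantarray` reports the context a sub was called in, and because `probe()` is the last statement of `wrapper`, it runs in whatever context `wrapper` itself was called in.

```perl
use strict;
use warnings;

my $last_context;

# Hypothetical helper: records the context it was called in.
sub probe {
    $last_context = wantarray          ? 'list'
                  : defined(wantarray) ? 'scalar'
                  :                      'void';
    return;
}

# probe() is the last statement here, so it runs in the caller's context.
sub wrapper { probe() }

sub context_seen_in_void   { wrapper();         return $last_context }
sub context_seen_in_scalar { my $s = wrapper(); return $last_context }
sub context_seen_in_list   { my @l = wrapper(); return $last_context }

print "void call:   ", context_seen_in_void(),   "\n";
print "scalar call: ", context_seen_in_scalar(), "\n";
print "list call:   ", context_seen_in_list(),   "\n";
```

With a bare `sort` in place of `probe()`, a void-context call leaves `sort` with nothing to return, which fits the observation that the unassigned sorts were suspiciously fast.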

I took your code and forced all subs to operate in list context by prepending @ordered = to the first line of each.

Here's the result with 10000 elements (you can also adjust $max for more or fewer elements):

Perl & OS:
    v5.32.1 on MSWin32
Unordered data (for preamble tests):
    a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1
Preamble tests:
    grt_pack_expr:         a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    grt_pack_expr_q:       a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex:              a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anchored:     a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anch_expr_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anch_ni:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_expr_ni:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_no_index:     a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    map_cat_substr:        a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    map_cat_substr_len:    a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_pack:             a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_regex:            a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_regex_anchored:   a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_substr:           a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
Legend:
    GRTpe:  grt_pack_expr
    GRTpeq: grt_pack_expr_q
    STr:    st_regex
    STra:   st_regex_anchored
    STraen: st_regex_anch_expr_ni
    STran:  st_regex_anch_ni
    STren:  st_regex_expr_ni
    STrn:   st_regex_no_index
    mcs:    map_cat_substr
    mcsl:   map_cat_substr_len
    sp:     sort_pack
    sr:     sort_regex
    sra:    sort_regex_anchored
    ss:     sort_substr
Benchmarks:
    Note: Unordered data extended with 'map "a-$_", shuffle 0..10000'
         Rate   sra    sr   sp STra  STr STren STraen STran STrn   ss  mcs mcsl GRTpeq GRTpe
sra    3.34/s    --   -2% -73% -81% -82%  -82%   -82%  -82% -82% -85% -92% -92%   -93%  -95%
sr     3.40/s    2%    -- -72% -80% -81%  -82%   -82%  -82% -82% -85% -92% -92%   -92%  -95%
sp     12.3/s  270%  263%   -- -28% -32%  -34%   -34%  -35% -35% -44% -71% -72%   -72%  -82%
STra   17.2/s  416%  406%  39%   --  -5%   -8%    -8%   -9% -10% -22% -59% -60%   -61%  -75%
STr    18.2/s  445%  435%  47%   6%   --   -2%    -3%   -4%  -4% -18% -57% -58%   -59%  -73%
STren  18.6/s  459%  448%  51%   8%   2%    --    -0%   -1%  -2% -16% -56% -57%   -58%  -73%
STraen 18.7/s  461%  451%  52%   9%   3%    0%     --   -1%  -2% -16% -55% -57%   -58%  -72%
STran  18.9/s  467%  456%  53%  10%   4%    1%     1%    --  -1% -15% -55% -56%   -58%  -72%
STrn   19.0/s  471%  460%  54%  11%   5%    2%     2%    1%   -- -14% -55% -56%   -57%  -72%
ss     22.2/s  565%  552%  80%  29%  22%   19%    18%   17%  16%   -- -47% -49%   -50%  -67%
mcs    42.0/s 1159% 1136% 240% 144% 131%  125%   124%  122% 121%  89%   --  -3%    -6%  -38%
mcsl   43.3/s 1199% 1174% 251% 152% 138%  132%   131%  129% 128%  95%   3%   --    -3%  -36%
GRTpeq 44.7/s 1239% 1214% 262% 160% 146%  140%   138%  136% 135% 101%   6%   3%     --  -34%
GRTpe  67.9/s 1937% 1898% 451% 295% 273%  264%   263%  259% 257% 206%  62%  57%    52%    --

Here's the code:

#!/usr/bin/env perl

use strict;
use warnings;
use namespace::autoclean;

use Benchmark 'cmpthese';
use List::Util 'shuffle';

my @ordered;
my @unordered = qw{a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1};
my %expanded_abbrev_for = (
    sr => 'sort_regex',
    STr => 'st_regex',
    STrn => 'st_regex_no_index',
    STren => 'st_regex_expr_ni',
    sra => 'sort_regex_anchored',
    STra => 'st_regex_anchored',
    STran => 'st_regex_anch_ni',
    STraen => 'st_regex_anch_expr_ni',
    ss => 'sort_substr',
    mcs => 'map_cat_substr',
    mcsl => 'map_cat_substr_len',
    sp => 'sort_pack',
    GRTpe => 'grt_pack_expr',
    GRTpeq => 'grt_pack_expr_q',
);
my %coderef_for = (
    sr => \&sort_regex,
    STr => \&st_regex,
    STrn => \&st_regex_no_index,
    STren => \&st_regex_expr_ni,
    sra => \&sort_regex_anchored,
    STra => \&st_regex_anchored,
    STran => \&st_regex_anch_ni,
    STraen => \&st_regex_anch_expr_ni,
    ss => \&sort_substr,
    mcs => \&map_cat_substr,
    mcsl => \&map_cat_substr_len,
    sp => \&sort_pack,
    GRTpe => \&grt_pack_expr,
    GRTpeq => \&grt_pack_expr_q,
);

print "Perl & OS:\n    $^V on $^O\n";
print "Unordered data (for preamble tests):\n    @unordered\n";
print "Preamble tests:\n";
my $tests_fmt = "    %-22s %s\n";

for my $name (sort keys %coderef_for) {
    printf $tests_fmt, "$expanded_abbrev_for{$name}:",
            "@{[$coderef_for{$name}->()]}";
}

exit if @ARGV && $ARGV[0] eq '--dry_run';

print "Legend:\n";
my $legend_fmt = "    %-7s %s\n";

for my $abbrev (sort keys %expanded_abbrev_for) {
    printf $legend_fmt, "$abbrev:",
            $expanded_abbrev_for{$abbrev};
}

# Extend @unordered for improved benchmarking
my $max = 10000;
push @unordered, map "a-$_", shuffle 0..$max;

print "Benchmarks:\n";
print "    Note: Unordered data extended with 'map \"a-\$_\", shuffle 0..$max'\n";
my $count = 0;
cmpthese $count => \%coderef_for;

sub sort_regex {
    @ordered =
    sort {
        ($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0]
    } @unordered;
}

sub st_regex {
    @ordered =
    map {
        $_->[0]
    }
    sort {
        $a->[1] <=> $b->[1]
    }
    map {
        [$_, (/(\d+)/)[0]]
    } @unordered;
}

sub st_regex_no_index {
    @ordered =
    map {
        $_->[0]
    }
    sort {
        $a->[1] <=> $b->[1]
    }
    map {
        [$_, /(\d+)/]
    } @unordered;
}

sub st_regex_expr_ni {
    @ordered =
    map $_->[0],
    sort {
        $a->[1] <=> $b->[1]
    }
    map [$_, /(\d+)/], @unordered;
}

sub sort_regex_anchored {
    @ordered =
    sort {
        ($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0]
    } @unordered;
}

sub st_regex_anchored {
    @ordered =
    map {
        $_->[0]
    }
    sort {
        $a->[1] <=> $b->[1]
    }
    map {
        [$_, (/(\d+)$/)[0]]
    } @unordered;
}

sub st_regex_anch_ni {
    @ordered =
    map {
        $_->[0]
    }
    sort {
        $a->[1] <=> $b->[1]
    }
    map {
        [$_, /(\d+)$/]
    } @unordered;
}

sub st_regex_anch_expr_ni {
    @ordered =
    map $_->[0],
    sort {
        $a->[1] <=> $b->[1]
    }
    map [$_, /(\d+)$/], @unordered;
}

sub sort_substr {
    @ordered =
    sort {
        substr($a, 2) <=> substr($b, 2)
    } @unordered;
}

sub map_cat_substr {
    @ordered =
    map {
        'a-' . $_
    }
    sort {
        $a <=> $b
    }
    map {
        substr $_, 2
    } @unordered;
}

sub map_cat_substr_len {
    @ordered =
    map {
        'a-' . $_
    }
    sort {
        $a <=> $b
    }
    map {
        substr $_, 2, length($_) - 2
    } @unordered;
}

sub sort_pack {
    @ordered =
    sort {
        pack(L => substr($a, 2)) cmp pack(L => substr($b, 2))
    } @unordered;
}

sub grt_pack_expr {
    @ordered =
    map substr($_, 4),
    sort
    map pack(L => substr($_, 2)) . $_, @unordered;
}

sub grt_pack_expr_q {
    @ordered =
    map substr($_, 8),
    sort
    map pack(Q => substr($_, 2)) . $_, @unordered;
}
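
One caveat for anyone reusing the `pack`-based variants outside this benchmark: the `L` and `Q` templates use the machine's native byte order, so on a little-endian machine a string comparison of the packed keys need not match numeric order once values exceed 255. That should not change the timings, only the resulting order. The sketch below is an assumption-laden illustration rather than a correction of the benchmark: it swaps in the big-endian `N` template (32-bit unsigned), and the sub names are hypothetical. `Q>` would be the 64-bit counterpart on perls built with 64-bit integer support.

```perl
use strict;
use warnings;

# Compare packed keys built with an explicit big-endian template, so that
# bytewise string comparison agrees with numeric order on any platform.
sub sort_numbers_via_packed_keys {
    my @numbers = @_;
    return sort { pack('N', $a) cmp pack('N', $b) } @numbers;
}

# GRT-style sort for items with a two-character prefix such as 'a-':
# prepend a 4-byte big-endian key, sort as plain strings, strip the key.
sub grt_sort_big_endian {
    my @items  = @_;
    my @keyed  = map { pack('N', substr($_, 2)) . $_ } @items;
    my @sorted = sort @keyed;
    return map { substr $_, 4 } @sorted;
}

print join(' ', grt_sort_big_endian(qw{a-10 a-2 a-300 a-1})), "\n";
# prints: a-1 a-2 a-10 a-300
```

Both subs are meant to be called in list context, for the same reason discussed in this thread.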

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

++ Many thanks for tracking down the problem. Much appreciated.

The results are now more in line with what I would have expected. I see that Perl's string handling function, substr, outstrips the regex solutions: I have been recommending, for a very long time, that string functions be chosen over regexes (where they provide equivalent functionality).
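
As a small illustration of "equivalent functionality" for this data set, here is a sketch of the two key extractors side by side. The sub names are hypothetical, and the `substr` version assumes every item carries a fixed two-character prefix such as 'a-'; the anchored regex only assumes the string ends in digits.

```perl
use strict;
use warnings;

# Assumes a fixed two-character prefix ('a-' in the benchmark data).
sub key_by_substr {
    my ($item) = @_;
    return substr $item, 2;
}

# Assumes only that the item ends in a run of digits.
sub key_by_regex {
    my ($item) = @_;
    return ($item =~ /(\d+)$/)[0];
}

for my $item (qw{a-10 a-01 a-000 a-12345}) {
    printf "%-8s substr=%-6s regex=%s\n",
        $item, key_by_substr($item), key_by_regex($item);
}
```

For items shaped like the benchmark data both return the same key, so the choice between them comes down to speed and to how much the input format can be trusted.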

I should probably add some ST routines (e.g. STss, STmcs) to see how they fare; for instance, would GRTpe be faster than STss? I'm currently at $work, so I can't do that now; I'll look into it this evening (i.e. ~8-10 hrs hence).

— Ken


I wrapped all of the routines in @{[...]} to provide the list context; that was what I'd used in the preamble tests.
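
A rough sketch of why the `@{[...]}` wrapper helps, using made-up helper names rather than anything from the benchmark: the anonymous array constructor evaluates its contents in list context regardless of how the enclosing sub was called.

```perl
use strict;
use warnings;

my $seen_context;

# Hypothetical helper: remembers the context it was called in.
sub note_context {
    $seen_context = wantarray          ? 'list'
                  : defined(wantarray) ? 'scalar'
                  :                      'void';
    return;
}

sub bare_call    { note_context() }          # inherits the caller's context
sub wrapped_call { @{[ note_context() ]} }   # [ ... ] supplies list context

# Call the given coderef as a plain statement, i.e. in void context,
# then report which context note_context() actually saw.
sub context_when_called_in_void {
    my ($code) = @_;
    $code->();
    return $seen_context;
}

print "bare:    ", context_when_called_in_void(\&bare_call),    "\n";
print "wrapped: ", context_when_called_in_void(\&wrapped_call), "\n";
```

The wrapper does add the cost of building and dereferencing a throwaway array, but every routine pays the same overhead, so the comparison between routines stays meaningful.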

I added an STss as I had indicated this morning. I decided that STmcs was going to be pretty much the same as STss, so I skipped that one. I did add an mcse which was mcs with map BLOCK replaced by map EXPR.

sub st_sort_substr {
    @{[
        map $_->[0],
        sort {
            $a->[1] <=> $b->[1]
        }
        map [$_, substr $_, 2], @unordered
    ]};
}

sub map_cat_substr_expr {
    @{[
        map "a-$_",
        sort {
            $a <=> $b
        }
        map substr($_, 2), @unordered
    ]};
}

I saw ++swl's post. There wasn't any code there, so I guessed.

use Sort::Key 'ikeysort';
use Sort::Key::Natural 'natsort';
...
sub sort_key_integer {
    @{[
        ikeysort { substr $_, 2 } @unordered
    ]};
}

sub sort_key_natural {
    @{[
        natsort @unordered
    ]};
}

I ran the benchmark several times; there were no major differences between runs. Here's a sample output, in the spoiler; it's getting very wide (18 subroutines now) and this post is "Re^7", so probably best viewed via the "download" link.

Perl & OS:
    v5.34.0 on cygwin
Unordered data (for preamble tests):
    a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1
Preamble tests:
    grt_pack_expr:         a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    grt_pack_expr_q:       a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_key_integer:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_key_natural:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex:              a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anchored:     a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anch_expr_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anch_ni:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_expr_ni:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_no_index:     a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_sort_substr:        a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    map_cat_substr:        a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    map_cat_substr_expr:   a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    map_cat_substr_len:    a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_pack:             a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_regex:            a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_regex_anchored:   a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_substr:           a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
Legend:
    GRTpe:  grt_pack_expr
    GRTpeq: grt_pack_expr_q
    SKi:    sort_key_integer
    SKn:    sort_key_natural
    STr:    st_regex
    STra:   st_regex_anchored
    STraen: st_regex_anch_expr_ni
    STran:  st_regex_anch_ni
    STren:  st_regex_expr_ni
    STrn:   st_regex_no_index
    STss:   st_sort_substr
    mcs:    map_cat_substr
    mcse:   map_cat_substr_expr
    mcsl:   map_cat_substr_len
    sp:     sort_pack
    sr:     sort_regex
    sra:    sort_regex_anchored
    ss:     sort_substr
Benchmarks:
    Note: Unordered data extended with 'map "a-$_", shuffle 0..10000'
         Rate   sra    sr   sp  SKn   ss STrn STran  STr STra STren STraen STss GRTpe GRTpeq mcsl  mcs mcse  SKi
sra    12.0/s    --   -1% -59% -61% -76% -81%  -81% -81% -81%  -81%   -81% -84%  -89%   -89% -90% -90% -90% -94%
sr     12.0/s    1%    -- -59% -60% -76% -81%  -81% -81% -81%  -81%   -81% -84%  -88%   -89% -90% -90% -90% -94%
sp     29.4/s  145%  144%   --  -3% -42% -53%  -53% -53% -53%  -53%   -54% -60%  -72%   -72% -75% -76% -76% -86%
SKn    30.4/s  154%  153%   3%   -- -40% -51%  -52% -52% -52%  -52%   -52% -59%  -71%   -71% -74% -75% -75% -86%
ss     50.4/s  321%  319%  71%  66%   -- -19%  -20% -20% -20%  -20%   -20% -32%  -52%   -52% -57% -58% -58% -77%
STrn   62.4/s  421%  419% 112% 105%  24%   --   -0%  -0%  -0%   -0%    -1% -16%  -40%   -41% -47% -48% -48% -71%
STran  62.7/s  424%  421% 113% 106%  24%   0%    --  -0%  -0%   -0%    -1% -15%  -40%   -40% -47% -48% -48% -71%
STr    62.8/s  424%  421% 113% 106%  25%   0%    0%   --   0%   -0%    -1% -15%  -40%   -40% -47% -48% -48% -71%
STra   62.8/s  424%  421% 113% 106%  25%   0%    0%   0%   --   -0%    -1% -15%  -40%   -40% -47% -48% -48% -71%
STren  62.8/s  424%  421% 113% 106%  25%   0%    0%   0%   0%    --    -1% -15%  -40%   -40% -47% -48% -48% -71%
STraen 63.4/s  429%  426% 116% 108%  26%   1%    1%   1%   1%    1%     -- -15%  -39%   -40% -46% -47% -47% -71%
STss   74.2/s  519%  516% 152% 144%  47%  19%   18%  18%  18%   18%    17%   --  -29%   -30% -37% -38% -38% -66%
GRTpe   105/s  773%  768% 256% 244% 107%  67%   67%  67%  67%   67%    65%  41%    --    -1% -12% -13% -13% -52%
GRTpeq  105/s  779%  774% 258% 246% 109%  69%   68%  68%  68%   68%    66%  42%    1%     -- -11% -13% -13% -51%
mcsl    118/s  886%  881% 302% 289% 134%  89%   88%  88%  88%   88%    86%  59%   13%    12%   --  -2%  -2% -45%
mcs     120/s  905%  900% 310% 296% 139%  93%   92%  92%  92%   92%    90%  62%   15%    14%   2%   --  -0% -44%
mcse    120/s  906%  901% 310% 296% 139%  93%   92%  92%  92%   92%    90%  62%   15%    14%   2%   0%   -- -44%
SKi     217/s 1708% 1699% 637% 612% 330% 247%  245% 245% 245%  245%   242% 192%  107%   106%  83%  80%  80%   --

And here's the code:

#!/usr/bin/env perl

use strict;
use warnings;
use namespace::autoclean;

use Benchmark 'cmpthese';
use List::Util 'shuffle';
use Sort::Key 'ikeysort';
use Sort::Key::Natural 'natsort';

my @unordered = qw{a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1};
my %expanded_abbrev_for = (
    sr => 'sort_regex',
    STr => 'st_regex',
    STrn => 'st_regex_no_index',
    STren => 'st_regex_expr_ni',
    sra => 'sort_regex_anchored',
    STra => 'st_regex_anchored',
    STran => 'st_regex_anch_ni',
    STraen => 'st_regex_anch_expr_ni',
    ss => 'sort_substr',
    mcs => 'map_cat_substr',
    mcsl => 'map_cat_substr_len',
    sp => 'sort_pack',
    GRTpe => 'grt_pack_expr',
    GRTpeq => 'grt_pack_expr_q',
    STss => 'st_sort_substr',
    mcse => 'map_cat_substr_expr',
    SKi => 'sort_key_integer',
    SKn => 'sort_key_natural',
);
my %coderef_for = (
    sr => \&sort_regex,
    STr => \&st_regex,
    STrn => \&st_regex_no_index,
    STren => \&st_regex_expr_ni,
    sra => \&sort_regex_anchored,
    STra => \&st_regex_anchored,
    STran => \&st_regex_anch_ni,
    STraen => \&st_regex_anch_expr_ni,
    ss => \&sort_substr,
    mcs => \&map_cat_substr,
    mcsl => \&map_cat_substr_len,
    sp => \&sort_pack,
    GRTpe => \&grt_pack_expr,
    GRTpeq => \&grt_pack_expr_q,
    STss => \&st_sort_substr,
    mcse => \&map_cat_substr_expr,
    SKi => \&sort_key_integer,
    SKn => \&sort_key_natural,
);

print "Perl & OS:\n    $^V on $^O\n";
print "Unordered data (for preamble tests):\n    @unordered\n";
print "Preamble tests:\n";
my $tests_fmt = "    %-22s %s\n";

for my $name (sort keys %coderef_for) {
    printf $tests_fmt, "$expanded_abbrev_for{$name}:",
            "@{[$coderef_for{$name}->()]}";
}

exit if @ARGV && $ARGV[0] eq '--dry_run';

print "Legend:\n";
my $legend_fmt = "    %-7s %s\n";

for my $abbrev (sort keys %expanded_abbrev_for) {
    printf $legend_fmt, "$abbrev:",
            $expanded_abbrev_for{$abbrev};
}

# Extend @unordered for improved benchmarking
push @unordered, map "a-$_", shuffle 0..10000;

print "Benchmarks:\n";
print "    Note: Unordered data extended with 'map \"a-\$_\", shuffle 0..10000'\n";
my $count = 0;
cmpthese $count => \%coderef_for;

sub sort_regex {
    @{[
        sort {
            ($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0]
        } @unordered
    ]};
}

sub st_regex {
    @{[
        map {
            $_->[0]
        }
        sort {
            $a->[1] <=> $b->[1]
        }
        map {
            [$_, (/(\d+)/)[0]]
        } @unordered
    ]};
}

sub st_regex_no_index {
    @{[
        map {
            $_->[0]
        }
        sort {
            $a->[1] <=> $b->[1]
        }
        map {
            [$_, /(\d+)/]
        } @unordered
    ]};
}

sub st_regex_expr_ni {
    @{[
        map $_->[0],
        sort {
            $a->[1] <=> $b->[1]
        }
        map [$_, /(\d+)/], @unordered
    ]};
}

sub sort_regex_anchored {
    @{[
        sort {
            ($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0]
        } @unordered
    ]};
}

sub st_regex_anchored {
    @{[
        map {
            $_->[0]
        }
        sort {
            $a->[1] <=> $b->[1]
        }
        map {
            [$_, (/(\d+)$/)[0]]
        } @unordered
    ]};
}

sub st_regex_anch_ni {
    @{[
        map {
            $_->[0]
        }
        sort {
            $a->[1] <=> $b->[1]
        }
        map {
            [$_, /(\d+)$/]
        } @unordered
    ]};
}

sub st_regex_anch_expr_ni {
    @{[
        map $_->[0],
        sort {
            $a->[1] <=> $b->[1]
        }
        map [$_, /(\d+)$/], @unordered
    ]};
}

sub st_sort_substr {
    @{[
        map $_->[0],
        sort {
            $a->[1] <=> $b->[1]
        }
        map [$_, substr $_, 2], @unordered
    ]};
}

sub sort_substr {
    @{[
        sort {
            substr($a, 2) <=> substr($b, 2)
        } @unordered
    ]};
}

sub map_cat_substr {
    @{[
        map {
            'a-' . $_
        }
        sort {
            $a <=> $b
        }
        map {
            substr $_, 2
        } @unordered
    ]};
}

sub map_cat_substr_expr {
    @{[
        map "a-$_",
        sort {
            $a <=> $b
        }
        map substr($_, 2), @unordered
    ]};
}

sub map_cat_substr_len {
    @{[
        map {
            'a-' . $_
        }
        sort {
            $a <=> $b
        }
        map {
            substr $_, 2, length($_) - 2
        } @unordered
    ]};
}

sub sort_pack {
    @{[
        sort {
            pack(L => substr($a, 2)) cmp pack(L => substr($b, 2))
        } @unordered
    ]};
}

sub grt_pack_expr {
    @{[
        map substr($_, 4),
        sort
        map pack(L => substr($_, 2)) . $_, @unordered
    ]};
}

sub grt_pack_expr_q {
    @{[
        map substr($_, 8),
        sort
        map pack(Q => substr($_, 2)) . $_, @unordered
    ]};
}

sub sort_key_integer {
    @{[
        ikeysort { substr $_, 2 } @unordered
    ]};
}

sub sort_key_natural {
    @{[
        natsort @unordered
    ]};
}
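
One caveat about the pack-based routines (sp, GRTpe, GRTpeq), as far as I can tell: the L and Q templates use the machine's native byte order, so on a little-endian build (typical x86) the packed strings do not compare in numeric order once values reach 256. The ten preamble values happen to differ in their low byte in the right order, which would explain why the preamble tests still look correct. The sketch below uses the explicit V (little-endian) and N (big-endian) templates so that it behaves the same on any platform; the variable names are only for illustration.

```perl
use strict;
use warnings;

my @numbers = (10, 256, 2, 10000);

# 'V' is explicitly little-endian 32-bit, which is what 'L' means on x86.
my @le_keys = map { pack 'V', $_ } @numbers;
my @by_le   = map { unpack 'V', $_ } sort @le_keys;

# 'N' is explicitly big-endian 32-bit, so string order matches numeric order.
my @be_keys = map { pack 'N', $_ } @numbers;
my @by_be   = map { unpack 'N', $_ } sort @be_keys;

print "little-endian keys: @by_le\n";   # 256 2 10 10000
print "big-endian keys:    @by_be\n";   # 2 10 256 10000
```

If that reading is right, swapping L for N (and Q for Q>) in those three subs should give a correct sort on the extended data at much the same cost, though I have not re-run the benchmark that way.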

— Ken


Out of curiosity I added some subs using Sort::Key. sort_key_natural is the natsort function from Sort::Key::Natural while sort_key_integer uses the ikeysort function from Sort::Key in tandem with substr.

Perl & OS:
    v5.28.2 on MSWin32
Unordered data (for preamble tests):
    a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1
Preamble tests:
    grt_pack_expr:         a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    grt_pack_expr_q:       a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_key_integer:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_key_natural:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex:              a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anchored:     a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anch_expr_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_anch_ni:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_expr_ni:      a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    st_regex_no_index:     a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    map_cat_substr:        a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    map_cat_substr_len:    a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_pack:             a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_regex:            a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_regex_anchored:   a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
    sort_substr:           a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345
Legend:
    GRTpe:  grt_pack_expr
    GRTpeq: grt_pack_expr_q
    SKi:    sort_key_integer
    SKn:    sort_key_natural
    STr:    st_regex
    STra:   st_regex_anchored
    STraen: st_regex_anch_expr_ni
    STran:  st_regex_anch_ni
    STren:  st_regex_expr_ni
    STrn:   st_regex_no_index
    mcs:    map_cat_substr
    mcsl:   map_cat_substr_len
    sp:     sort_pack
    sr:     sort_regex
    sra:    sort_regex_anchored
    ss:     sort_substr
Benchmarks:
    Note: Unordered data extended with 'map "a-$_", shuffle 0..10000'
         Rate   sra    sr  SKn   sp STraen  STr STrn STra STran STren   ss GRTpe GRTpeq mcsl  mcs  SKi
sra    4.59/s    --   -0% -59% -68%   -80% -81% -82% -82%  -82%  -83% -84%  -91%   -91% -92% -93% -95%
sr     4.59/s    0%    -- -59% -68%   -80% -81% -82% -82%  -82%  -83% -84%  -91%   -91% -92% -93% -95%
SKn    11.3/s  147%  147%   -- -22%   -52% -53% -55% -55%  -56%  -57% -61%  -77%   -78% -81% -82% -87%
sp     14.5/s  217%  217%  28%   --   -38% -40% -42% -43%  -43%  -45% -49%  -71%   -71% -76% -77% -84%
STraen 23.5/s  411%  411% 107%  61%     --  -3%  -6%  -7%   -8%  -12% -18%  -53%   -54% -61% -63% -74%
STr    24.3/s  429%  429% 115%  67%     4%   --  -3%  -4%   -5%   -8% -15%  -52%   -52% -59% -62% -73%
STrn   25.0/s  445%  444% 121%  72%     7%   3%   --  -1%   -2%   -6% -13%  -50%   -51% -58% -61% -72%
STra   25.3/s  452%  451% 124%  74%     8%   4%   1%   --   -1%   -5% -12%  -50%   -50% -58% -60% -72%
STran  25.5/s  455%  455% 125%  75%     9%   5%   2%   1%    --   -4% -11%  -49%   -50% -57% -60% -72%
STren  26.6/s  478%  478% 134%  83%    13%   9%   6%   5%    4%    --  -8%  -47%   -47% -56% -59% -70%
ss     28.7/s  525%  525% 153%  97%    22%  18%  15%  13%   13%    8%   --  -43%   -43% -52% -55% -68%
GRTpe  50.2/s  993%  992% 343% 245%   114% 106% 101%  98%   97%   89%  75%    --    -1% -16% -22% -44%
GRTpeq 50.5/s 1000% 1000% 346% 248%   115% 108% 102% 100%   98%   90%  76%    1%     -- -15% -21% -44%
mcsl   59.8/s 1202% 1201% 428% 311%   155% 146% 139% 136%  135%  125% 108%   19%    18%   --  -7% -33%
mcs    64.0/s 1293% 1293% 465% 340%   173% 163% 156% 153%  151%  141% 123%   28%    27%   7%   -- -29%
SKi    89.5/s 1849% 1849% 690% 516%   281% 268% 258% 253%  251%  237% 212%   78%    77%  50%  40%   --

The natsort approach is not particularly fast, but this is perhaps to be expected given it is a general purpose function (as are the unanchored regex approaches). I guess the integer key approach is faster as it takes advantage of direct string operations when building the keys, and then whatever optimisations Sort::Key uses internally.

I assume the differences in the order of the other approaches compared with LanX's are due to the code being run on Strawberry Perl 5.28. It would be interesting to know how the Sort::Key approaches go under a more recent Perl.

Edit: Now that I look at the source code for Sort::Key::Natural, it uses a regex approach to divide the string and pad out the numeric sections, so it is not surprising that it is slower than the other regex-based approaches here. See https://metacpan.org/dist/Sort-Key/source/lib/Sort/Key/Natural.pm#L34
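
To give a feel for the general idea of padding the numeric sections, here is a rough sketch in plain Perl. It is not the module's code: padded_key is a hypothetical helper, and the fixed width of 10 digits is an arbitrary assumption that would break for longer numbers.

```perl
use strict;
use warnings;

# Hypothetical helper: zero-pad every run of digits to a fixed width
# (10 here, an arbitrary choice) so that a plain string comparison
# orders the padded keys numerically.
sub padded_key {
    my ($string) = @_;
    (my $key = $string) =~ s/(\d+)/sprintf '%010d', $1/ge;
    return $key;
}

my @unordered = qw(a-10 a-2 a-01 a-12345);

# Build each key once, sort on the keys with cmp, then drop the keys.
my @ordered =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [$_, padded_key($_)] } @unordered;

print "@ordered\n";   # a-01 a-2 a-10 a-12345
```

Every key costs a regex substitution plus an sprintf, which fits with this kind of approach landing near the regex-based routines rather than near the substr-based ones.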


Sort::Key::Natural

For instance, it can handle arbitrarily large numbers or Unicode.


G'day swl,

See "Re^7: How can I do a numeric sort on a substring? [Benchmark: reworked and extended]". I've added sort_key_integer and sort_key_natural (made some guesses about the code) as well as a couple of additions of my own.

"It would be interesting to know how the Sort::Key approaches go under a more recent Perl."

I'm not seeing a huge difference between your output and my latest. SKn is at the slow end of the spectrum; SKi is by far the fastest (substantially faster on my system with Perl 5.34.0).

— Ken
