in reply to Re^5: How can I do a numeric sort on a substring? (context matters)
in thread How can I do a numeric sort on a substring?

++ Many thanks for tracking down the problem. Much appreciated.

The results are now more in line with what I would have expected. I see that Perl's string handling function, substr, outstrips the regex solutions: I have been recommending, for a very long time, that string functions be chosen over regexes (where they provide equivalent functionality).

I should probably add some ST routines (e.g. STss, STmcs) to see how they fare; for instance, would GRTpe be faster than STss? I'm currently at $work, so I can't do that now; I'll look into it this evening (i.e. ~8-10 hrs hence).
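For anyone unfamiliar with the abbreviations, here is a minimal sketch of the two patterns being compared: a Schwartzian Transform (ST) keyed on substr, and a Guttman-Rosler Transform (GRT) keyed on a packed integer. The sample data and variable names are made up for illustration. Note one deliberate assumption: the sketch packs with 'N' (big-endian), so that the default string sort orders the keys numerically on any platform; the benchmark routines in this thread use 'L' instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustrative data only; keys are distinct so both sorts agree exactly.
my @data = qw{a-10 a-01 a-22 a-2 a-0 a-3};

# ST: pair each item with its numeric key, sort on the key with an
# explicit comparator, then unwrap the original item.
my @st = map  { $_->[0] }
         sort { $a->[1] <=> $b->[1] }
         map  { [ $_, substr($_, 2) ] }
         @data;

# GRT: prepend a fixed-width (4-byte, big-endian) packed key, use the
# default string sort with no comparator block, then strip the key.
my @grt = map  { substr $_, 4 }
          sort
          map  { pack('N', substr($_, 2)) . $_ }
          @data;

print "ST:  @st\n";
print "GRT: @grt\n";
```

Both lines should list the items in the same numeric order. The GRT avoids calling a Perl-level comparator for every comparison, which is why it may come out ahead; whether it does for this data is what the benchmark is meant to show.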

— Ken

Re^7: How can I do a numeric sort on a substring? [Benchmark: reworked and extended]
by kcott (Archbishop) on Jun 28, 2021 at 08:21 UTC

    I wrapped all of the routines in @{[...]} to force list context; that is the same construct I'd used in the preamble tests.
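    A minimal sketch of why the wrapper matters, assuming the benchmarked code is invoked in void context (as I understand Benchmark to do). The helper names here (note_context, bare, wrapped) are hypothetical and exist only for this demonstration:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @seen;

# Record the context this sub was called in.
sub note_context {
    push @seen, wantarray          ? 'list'
              : defined(wantarray) ? 'scalar'
              :                      'void';
    return;
}

# The last expression of a sub inherits the caller's context.
sub bare    { note_context() }

# The anonymous array constructor imposes list context on its contents,
# whatever context the sub itself is called in.
sub wrapped { @{[ note_context() ]} }

bare();       # called in void context
wrapped();    # also called in void context

print "@seen\n";    # prints "void list"
```

    Since perlfunc documents the behaviour of sort in scalar context as undefined, a bare sort as the last expression of a benchmarked sub may not be doing the work you think it is; the wrapper guarantees the sort actually runs in list context.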

    I added an STss as I had indicated this morning. I decided that STmcs was going to be pretty much the same as STss, so I skipped that one. I did add an mcse which was mcs with map BLOCK replaced by map EXPR.

    sub st_sort_substr {
        @{[
            map $_->[0],
            sort { $a->[1] <=> $b->[1] }
            map [$_, substr $_, 2],
            @unordered
        ]};
    }

    sub map_cat_substr_expr {
        @{[
            map "a-$_",
            sort { $a <=> $b }
            map substr($_, 2),
            @unordered
        ]};
    }

    I saw ++swl's post. There wasn't any code there, so I guessed.

    use Sort::Key 'ikeysort';
    use Sort::Key::Natural 'natsort';
    ...
    sub sort_key_integer {
        @{[ ikeysort { substr $_, 2 } @unordered ]};
    }

    sub sort_key_natural {
        @{[ natsort @unordered ]};
    }

    I ran the benchmark several times; there were no major differences between runs. Here's a sample output, in the spoiler; it's getting very wide (18 subroutines now) and this post is "Re^7", so probably best viewed via the "download" link.

    And here's the code:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use namespace::autoclean;

    use Benchmark 'cmpthese';
    use List::Util 'shuffle';
    use Sort::Key 'ikeysort';
    use Sort::Key::Natural 'natsort';

    my @unordered = qw{a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1};

    my %expanded_abbrev_for = (
        sr     => 'sort_regex',
        STr    => 'st_regex',
        STrn   => 'st_regex_no_index',
        STren  => 'st_regex_expr_ni',
        sra    => 'sort_regex_anchored',
        STra   => 'st_regex_anchored',
        STran  => 'st_regex_anch_ni',
        STraen => 'st_regex_anch_expr_ni',
        ss     => 'sort_substr',
        mcs    => 'map_cat_substr',
        mcsl   => 'map_cat_substr_len',
        sp     => 'sort_pack',
        GRTpe  => 'grt_pack_expr',
        GRTpeq => 'grt_pack_expr_q',
        STss   => 'st_sort_substr',
        mcse   => 'map_cat_substr_expr',
        SKi    => 'sort_key_integer',
        SKn    => 'sort_key_natural',
    );

    my %coderef_for = (
        sr     => \&sort_regex,
        STr    => \&st_regex,
        STrn   => \&st_regex_no_index,
        STren  => \&st_regex_expr_ni,
        sra    => \&sort_regex_anchored,
        STra   => \&st_regex_anchored,
        STran  => \&st_regex_anch_ni,
        STraen => \&st_regex_anch_expr_ni,
        ss     => \&sort_substr,
        mcs    => \&map_cat_substr,
        mcsl   => \&map_cat_substr_len,
        sp     => \&sort_pack,
        GRTpe  => \&grt_pack_expr,
        GRTpeq => \&grt_pack_expr_q,
        STss   => \&st_sort_substr,
        mcse   => \&map_cat_substr_expr,
        SKi    => \&sort_key_integer,
        SKn    => \&sort_key_natural,
    );

    print "Perl & OS:\n $^V on $^O\n";
    print "Unordered data (for preamble tests):\n @unordered\n";
    print "Preamble tests:\n";

    my $tests_fmt = " %-22s %s\n";

    for my $name (sort keys %coderef_for) {
        printf $tests_fmt,
            "$expanded_abbrev_for{$name}:",
            "@{[$coderef_for{$name}->()]}";
    }

    exit if @ARGV && $ARGV[0] eq '--dry_run';

    print "Legend:\n";

    my $legend_fmt = " %-7s %s\n";

    for my $abbrev (sort keys %expanded_abbrev_for) {
        printf $legend_fmt, "$abbrev:", $expanded_abbrev_for{$abbrev};
    }

    # Extend @unordered for improved benchmarking
    push @unordered, map "a-$_", shuffle 0..10000;

    print "Benchmarks:\n";
    print " Note: Unordered data extended with 'map \"a-\$_\", shuffle 0..10000'\n";

    my $count = 0;

    cmpthese $count => \%coderef_for;

    sub sort_regex {
        @{[
            sort { ($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0] }
            @unordered
        ]};
    }

    sub st_regex {
        @{[
            map { $_->[0] }
            sort { $a->[1] <=> $b->[1] }
            map { [$_, (/(\d+)/)[0]] }
            @unordered
        ]};
    }

    sub st_regex_no_index {
        @{[
            map { $_->[0] }
            sort { $a->[1] <=> $b->[1] }
            map { [$_, /(\d+)/] }
            @unordered
        ]};
    }

    sub st_regex_expr_ni {
        @{[
            map $_->[0],
            sort { $a->[1] <=> $b->[1] }
            map [$_, /(\d+)/],
            @unordered
        ]};
    }

    sub sort_regex_anchored {
        @{[
            sort { ($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0] }
            @unordered
        ]};
    }

    sub st_regex_anchored {
        @{[
            map { $_->[0] }
            sort { $a->[1] <=> $b->[1] }
            map { [$_, (/(\d+)$/)[0]] }
            @unordered
        ]};
    }

    sub st_regex_anch_ni {
        @{[
            map { $_->[0] }
            sort { $a->[1] <=> $b->[1] }
            map { [$_, /(\d+)$/] }
            @unordered
        ]};
    }

    sub st_regex_anch_expr_ni {
        @{[
            map $_->[0],
            sort { $a->[1] <=> $b->[1] }
            map [$_, /(\d+)$/],
            @unordered
        ]};
    }

    sub st_sort_substr {
        @{[
            map $_->[0],
            sort { $a->[1] <=> $b->[1] }
            map [$_, substr $_, 2],
            @unordered
        ]};
    }

    sub sort_substr {
        @{[
            sort { substr($a, 2) <=> substr($b, 2) }
            @unordered
        ]};
    }

    sub map_cat_substr {
        @{[
            map { 'a-' . $_ }
            sort { $a <=> $b }
            map { substr $_, 2 }
            @unordered
        ]};
    }

    sub map_cat_substr_expr {
        @{[
            map "a-$_",
            sort { $a <=> $b }
            map substr($_, 2),
            @unordered
        ]};
    }

    sub map_cat_substr_len {
        @{[
            map { 'a-' . $_ }
            sort { $a <=> $b }
            map { substr $_, 2, length($_) - 2 }
            @unordered
        ]};
    }

    sub sort_pack {
        @{[
            sort { pack(L => substr($a, 2)) cmp pack(L => substr($b, 2)) }
            @unordered
        ]};
    }

    sub grt_pack_expr {
        @{[
            map substr($_, 4),
            sort
            map pack(L => substr($_, 2)) . $_,
            @unordered
        ]};
    }

    sub grt_pack_expr_q {
        @{[
            map substr($_, 8),
            sort
            map pack(Q => substr($_, 2)) . $_,
            @unordered
        ]};
    }

    sub sort_key_integer {
        @{[ ikeysort { substr $_, 2 } @unordered ]};
    }

    sub sort_key_natural {
        @{[ natsort @unordered ]};
    }

    — Ken

      Not providing the code is a fairly egregious oversight on my part. Apologies for that.

      Your usage is the same as mine, but my code is in the spoiler for completeness.