in reply to How can I do a numeric sort on a substring?
G'day misterperl,
Other than a number of solutions involving modules, there were several pure Perl solutions. I was interested in how these might compare. The results I got were unexpected; I added a lot of variations (currently up to 14) but the same trend continued.
I had thought that the Schwartzian and Guttman-Rosler transformations would have been the fastest, but that wasn't the case. In fact, those that performed extraction of the number, prior to comparison, within the sort, were the quickest. Given these extractions might have been done multiple times for the same data, I suspect Perl (v5.34.0) is performing some optimisation and perhaps caching results; I would be interested in results others receive using different Perl versions. In case there's concern about the number of tests, the entire benchmark should run in under a minute: a COUNT of zero indicates 3 seconds for each of the 14 tests, plus a few seconds for other overhead.
Here's a rough breakdown of the results:
Here's a sample run. It might be worth using the "download" link to see this without wrapping.
Perl & OS: v5.34.0 on cygwin Unordered data (for preamble tests): a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1 Preamble tests: grt_pack_expr: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 grt_pack_expr_q: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_anchored: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_anch_expr_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_anch_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_expr_ni: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 st_regex_no_index: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 map_cat_substr: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 map_cat_substr_len: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_pack: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_regex: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_regex_anchored: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 sort_substr: a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a- +12345 Legend: GRTpe: grt_pack_expr GRTpeq: grt_pack_expr_q STr: st_regex STra: st_regex_anchored STraen: st_regex_anch_expr_ni STran: st_regex_anch_ni STren: st_regex_expr_ni STrn: st_regex_no_index mcs: map_cat_substr mcsl: map_cat_substr_len sp: sort_pack sr: sort_regex sra: sort_regex_anchored ss: sort_substr Benchmarks: Note: Unordered data extended with 'map "a-$_", shuffle 0..10000' Rate STraen STran STra STrn STren STr GRTp +e GRTpeq mcsl mcs sp sr ss sra STraen 68.2/s -- -0% -1% -2% -3% -5% -54 +% -54% -60% -61% -100% -100% -100% -100% STran 68.6/s 0% -- -0% -1% -3% -4% -54 +% -54% -60% -61% -100% -100% -100% -100% STra 68.6/s 1% 0% -- -1% -2% -4% -53 +% -54% -60% -61% -100% -100% -100% -100% STrn 69.3/s 2% 1% 1% -- -1% -3% -53 +% -54% -59% -60% -100% -100% -100% -100% STren 70.4/s 3% 3% 3% 2% -- -2% -52 +% -53% -59% -60% -100% -100% -100% -100% STr 71.5/s 5% 4% 4% 3% 2% -- -52 +% -52% -58% -59% -100% -100% -100% -100% GRTpe 147/s 116% 115% 115% 113% 109% 106% - +- -1% -13% -16% -100% -100% -100% -100% GRTpeq 149/s 119% 118% 118% 116% 112% 109% 1 +% -- -12% -15% -100% -100% -100% -100% mcsl 170/s 149% 148% 148% 145% 141% 138% 15 +% 14% -- -3% -100% -100% -100% -100% mcs 175/s 157% 156% 155% 153% 149% 145% 19 +% 17% 3% -- -100% -100% -100% -100% sp 263168/s 385629% 383801% 383272% 379410% 373718% 368096% 178350 +% 175994% 154699% 149996% -- -2% -2% -3% sr 268680/s 393708% 391841% 391301% 387358% 381547% 375807% 182088 +% 179682% 157941% 153139% 2% -- -0% -1% ss 269453/s 394842% 392970% 392428% 388474% 382646% 376890% 182612 +% 180200% 158396% 153581% 2% 0% -- -0% sra 270688/s 396651% 394771% 394227% 390254% 384400% 378617% 183450 +% 181026% 159122% 154285% 3% 1% 0% --
There was a typo in the code; see update below. The original Benchmarks: section is in the spoiler below.
Here's the code:
#!/usr/bin/env perl use strict; use warnings; use namespace::autoclean; use Benchmark 'cmpthese'; use List::Util 'shuffle'; my @unordered = qw{a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1}; my %expanded_abbrev_for = ( sr => 'sort_regex', STr => 'st_regex', STrn => 'st_regex_no_index', STren => 'st_regex_expr_ni', sra => 'sort_regex_anchored', STra => 'st_regex_anchored', STran => 'st_regex_anch_ni', STraen => 'st_regex_anch_expr_ni', ss => 'sort_substr', mcs => 'map_cat_substr', mcsl => 'map_cat_substr_len', sp => 'sort_pack', GRTpe => 'grt_pack_expr', GRTpeq => 'grt_pack_expr_q', ); my %coderef_for = ( sr => \&sort_regex, STr => \&st_regex, STrn => \&st_regex_no_index, STren => \&st_regex_expr_ni, sra => \&sort_regex_anchored, STra => \&st_regex_anchored, STran => \&st_regex_anch_ni, STraen => \&st_regex_anch_expr_ni, ss => \&sort_substr, mcs => \&map_cat_substr, mcsl => \&map_cat_substr_len, sp => \&sort_pack, GRTpe => \&grt_pack_expr, GRTpeq => \&grt_pack_expr_q, ); print "Perl & OS:\n $^V on $^O\n"; print "Unordered data (for preamble tests):\n @unordered\n"; print "Preamble tests:\n"; my $tests_fmt = " %-22s %s\n"; for my $name (sort keys %coderef_for) { printf $tests_fmt, "$expanded_abbrev_for{$name}:", "@{[$coderef_for{$name}->()]}"; } exit if @ARGV && $ARGV[0] eq '--dry_run'; print "Legend:\n"; my $legend_fmt = " %-7s %s\n"; for my $abbrev (sort keys %expanded_abbrev_for) { printf $legend_fmt, "$abbrev:", $expanded_abbrev_for{$abbrev}; } # Extend @unordered for improved benchmarking push @unordered, map "a-$_", shuffle 0..10000; print "Benchmarks:\n"; print " Note: Unordered data extended with 'map \"a-\$_\", shuffle +0..10000'\n"; my $count = 0; cmpthese $count => \%coderef_for; sub sort_regex { sort { ($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0] } @unordered; } sub st_regex { map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, (/(\d+)/)[0]] } @unordered; } sub st_regex_no_index { map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, /(\d+)/] } @unordered; } sub st_regex_expr_ni { map $_->[0], sort { $a->[1] <=> $b->[1] } map [$_, /(\d+)/], @unordered; } sub sort_regex_anchored { sort { ($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0] } @unordered; } sub st_regex_anchored { map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, (/(\d+)$/)[0]] } @unordered; } sub st_regex_anch_ni { map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [$_, /(\d+)$/] } @unordered; } sub st_regex_anch_expr_ni { map $_->[0], sort { $a->[1] <=> $b->[1] } map [$_, /(\d+)$/], @unordered; } sub sort_substr { sort { substr($a, 2) <=> substr($b, 2) } @unordered; } sub map_cat_substr { map { 'a-' . $_ } sort { $a <=> $b } map { substr $_, 2 } @unordered; } sub map_cat_substr_len { map { 'a-' . $_ } sort { $a <=> $b } map { substr $_, 2, length($_) - 2 } @unordered; } sub sort_pack { sort { pack(L => substr($a, 2)) cmp pack(L => substr($b, 2)) } @unordered; } sub grt_pack_expr { map substr($_, 4), sort map pack(L => substr($_, 2)) . $_, @unordered; } sub grt_pack_expr_q { map substr($_, 8), sort map pack(Q => substr($_, 2)) . $_, @unordered; }
Notes on pack:
Update: There was a typo in my original code: the first key/value pair in %coderef_for was sr => \&sort_substr but it should have been sr => \&sort_regex. I've fixed my code locally and edited the code above to reflect this fix.
I reran the code and have posted a new Benchmarks: section. All of the output before Benchmarks: is unchanged from the original posting. The old Benchmarks: section is now in a spoiler. The results are not particularly different — except sr and ss are no longer identical; although, still too close to call — and my original "rough breakdown of the results" has not changed.
— Ken
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^2: How can I do a numeric sort on a substring? [Benchmark]
by LanX (Saint) on Jun 26, 2021 at 13:28 UTC | |
by kcott (Archbishop) on Jun 27, 2021 at 02:41 UTC | |
by LanX (Saint) on Jun 27, 2021 at 11:56 UTC | |
by LanX (Saint) on Jun 27, 2021 at 13:46 UTC | |
by kcott (Archbishop) on Jun 27, 2021 at 23:22 UTC | |
| |
by swl (Prior) on Jun 28, 2021 at 04:00 UTC | |
| |
|
Re^2: How can I do a numeric sort on a substring? [Benchmark]
by Anonymous Monk on Jun 26, 2021 at 08:18 UTC |