++
Many thanks for tracking down the problem. Much appreciated.
The results are now more in line with what I would have expected.
I see that Perl's string-handling function, substr, outstrips the regex solutions.
I have been recommending, for a very long time, that string functions be chosen over regexes
(where they provide equivalent functionality).
I should probably add some ST routines (e.g. STss, STmcs) to see how they fare;
for instance, would GRTpe be faster than STss?
I'm currently at $work, so I can't do that now; I'll look into it this evening (i.e. ~8-10hrs hence).
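The "where they provide equivalent functionality" caveat matters here. For this data, substr and an anchored regex extract the same key, but only because every string has the same two-character "a-" prefix. The sketch below checks that on the preamble sample. The helper names and the 'ab-7' input are hypothetical, chosen only to show where the two approaches stop being equivalent.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample data from the benchmark's preamble tests.
my @unordered = qw{a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1};

# Hypothetical helpers: extract the numeric part of a string two ways.
sub key_by_substr { substr $_[0], 2 }          # assumes a fixed 2-char "a-" prefix
sub key_by_regex  { ($_[0] =~ /(\d+)$/)[0] }   # finds the trailing run of digits

# On the sample data both helpers return identical strings.
my @mismatches = grep { key_by_substr($_) ne key_by_regex($_) } @unordered;
print "mismatches on sample data: ", scalar(@mismatches), "\n";

# A hypothetical input with a longer prefix breaks the substr version.
my $odd = 'ab-7';
printf "substr: %s, regex: %s\n", key_by_substr($odd), key_by_regex($odd);
```

This should report 0 mismatches, then `substr: -7, regex: 7`. In other words, the substr version is only a drop-in replacement while the prefix length is fixed.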
| [reply] [d/l] [select] |
I wrapped all of the routines in @{[...]} to provide the list context;
that was what I'd used in the preamble tests.
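Here is why the wrapper matters. As far as I know, perl's sort returns immediately without sorting unless it is evaluated in list context. perldoc -f sort only says that scalar-context behaviour is undefined. Benchmark's timing loop appears to call each coderef as a bare statement, which is void context. The core-Perl sketch below uses hypothetical sub names and counts comparisons to show the effect. Putting the sort inside an anonymous array constructor forces list context, whatever context the sub itself is called in.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @data        = (5, 3, 9, 1);
my $comparisons = 0;

# Last statement is a bare sort: it inherits the caller's context.
sub bare_sort    { sort { $comparisons++; $a <=> $b } @data }

# Same sort inside [ ... ], which always evaluates its contents in list context.
sub wrapped_sort { @{[ sort { $comparisons++; $a <=> $b } @data ]} }

$comparisons = 0;
bare_sort();                 # void context, as in a benchmark loop
my $bare_void = $comparisons;

$comparisons = 0;
wrapped_sort();              # void context again
my $wrapped_void = $comparisons;

print "bare sort in void context:    $bare_void comparisons\n";
print "wrapped sort in void context: $wrapped_void comparisons\n";
```

The bare sort should report 0 comparisons and the wrapped one a non-zero count. Assigning the result to an array has the same effect, as in the second script further down.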
I added an STss as I had indicated this morning.
I decided that STmcs was going to be pretty much the same as STss, so I skipped that one.
I did add an mcse which was mcs with map BLOCK replaced by map EXPR.
sub st_sort_substr {
@{[
map $_->[0],
sort {
$a->[1] <=> $b->[1]
}
map [$_, substr $_, 2], @unordered
]};
}
sub map_cat_substr_expr {
@{[
map "a-$_",
sort {
$a <=> $b
}
map substr($_, 2), @unordered
]};
}
I saw ++swl's post.
There wasn't any code there, so I guessed.
use Sort::Key 'ikeysort';
use Sort::Key::Natural 'natsort';
...
sub sort_key_integer {
@{[
ikeysort { substr $_, 2 } @unordered
]};
}
sub sort_key_natural {
@{[
natsort @unordered
]};
}
I ran the benchmark several times; there were no major differences between runs.
Here's a sample output, in the spoiler;
it's getting very wide (18 subroutines now) and this post is "Re^7", so probably best viewed via the "download" link.
And here's the code:
#!/usr/bin/env perl
use strict;
use warnings;
use namespace::autoclean;
use Benchmark 'cmpthese';
use List::Util 'shuffle';
use Sort::Key 'ikeysort';
use Sort::Key::Natural 'natsort';
my @unordered = qw{a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1};
my %expanded_abbrev_for = (
sr => 'sort_regex',
STr => 'st_regex',
STrn => 'st_regex_no_index',
STren => 'st_regex_expr_ni',
sra => 'sort_regex_anchored',
STra => 'st_regex_anchored',
STran => 'st_regex_anch_ni',
STraen => 'st_regex_anch_expr_ni',
ss => 'sort_substr',
mcs => 'map_cat_substr',
mcsl => 'map_cat_substr_len',
sp => 'sort_pack',
GRTpe => 'grt_pack_expr',
GRTpeq => 'grt_pack_expr_q',
STss => 'st_sort_substr',
mcse => 'map_cat_substr_expr',
SKi => 'sort_key_integer',
SKn => 'sort_key_natural',
);
my %coderef_for = (
sr => \&sort_regex,
STr => \&st_regex,
STrn => \&st_regex_no_index,
STren => \&st_regex_expr_ni,
sra => \&sort_regex_anchored,
STra => \&st_regex_anchored,
STran => \&st_regex_anch_ni,
STraen => \&st_regex_anch_expr_ni,
ss => \&sort_substr,
mcs => \&map_cat_substr,
mcsl => \&map_cat_substr_len,
sp => \&sort_pack,
GRTpe => \&grt_pack_expr,
GRTpeq => \&grt_pack_expr_q,
STss => \&st_sort_substr,
mcse => \&map_cat_substr_expr,
SKi => \&sort_key_integer,
SKn => \&sort_key_natural,
);
print "Perl & OS:\n $^V on $^O\n";
print "Unordered data (for preamble tests):\n @unordered\n";
print "Preamble tests:\n";
my $tests_fmt = " %-22s %s\n";
for my $name (sort keys %coderef_for) {
printf $tests_fmt, "$expanded_abbrev_for{$name}:",
"@{[$coderef_for{$name}->()]}";
}
exit if @ARGV && $ARGV[0] eq '--dry_run';
print "Legend:\n";
my $legend_fmt = " %-7s %s\n";
for my $abbrev (sort keys %expanded_abbrev_for) {
printf $legend_fmt, "$abbrev:",
$expanded_abbrev_for{$abbrev};
}
# Extend @unordered for improved benchmarking
push @unordered, map "a-$_", shuffle 0..10000;
print "Benchmarks:\n";
print " Note: Unordered data extended with 'map \"a-\$_\", shuffle 0..10000'\n";
my $count = 0;
cmpthese $count => \%coderef_for;
sub sort_regex {
@{[
sort {
($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0]
} @unordered
]};
}
sub st_regex {
@{[
map {
$_->[0]
}
sort {
$a->[1] <=> $b->[1]
}
map {
[$_, (/(\d+)/)[0]]
} @unordered
]};
}
sub st_regex_no_index {
@{[
map {
$_->[0]
}
sort {
$a->[1] <=> $b->[1]
}
map {
[$_, /(\d+)/]
} @unordered
]};
}
sub st_regex_expr_ni {
@{[
map $_->[0],
sort {
$a->[1] <=> $b->[1]
}
map [$_, /(\d+)/], @unordered
]};
}
sub sort_regex_anchored {
@{[
sort {
($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0]
} @unordered
]};
}
sub st_regex_anchored {
@{[
map {
$_->[0]
}
sort {
$a->[1] <=> $b->[1]
}
map {
[$_, (/(\d+)$/)[0]]
} @unordered
]};
}
sub st_regex_anch_ni {
@{[
map {
$_->[0]
}
sort {
$a->[1] <=> $b->[1]
}
map {
[$_, /(\d+)$/]
} @unordered
]};
}
sub st_regex_anch_expr_ni {
@{[
map $_->[0],
sort {
$a->[1] <=> $b->[1]
}
map [$_, /(\d+)$/], @unordered
]};
}
sub st_sort_substr {
@{[
map $_->[0],
sort {
$a->[1] <=> $b->[1]
}
map [$_, substr $_, 2], @unordered
]};
}
sub sort_substr {
@{[
sort {
substr($a, 2) <=> substr($b, 2)
} @unordered
]};
}
sub map_cat_substr {
@{[
map {
'a-' . $_
}
sort {
$a <=> $b
}
map {
substr $_, 2
} @unordered
]};
}
sub map_cat_substr_expr {
@{[
map "a-$_",
sort {
$a <=> $b
}
map substr($_, 2), @unordered
]};
}
sub map_cat_substr_len {
@{[
map {
'a-' . $_
}
sort {
$a <=> $b
}
map {
substr $_, 2, length($_) - 2
} @unordered
]};
}
sub sort_pack {
@{[
sort {
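# N (32-bit big-endian): bytewise cmp then matches numeric order;
# the native-endian L does not on little-endian machines
pack(N => substr($a, 2)) cmp pack(N => substr($b, 2))
} @unordered
]};
}
sub grt_pack_expr {
@{[
map substr($_, 4),
sort
map pack(N => substr($_, 2)) . $_, @unordered
]};
}
sub grt_pack_expr_q {
@{[
map substr($_, 8),
sort
# Q> (64-bit big-endian, Perl 5.10+) for the same reason
map pack('Q>', substr($_, 2)) . $_, @unordered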
]};
}
sub sort_key_integer {
@{[
ikeysort { substr $_, 2 } @unordered
]};
}
sub sort_key_natural {
@{[
natsort @unordered
]};
}
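The pack-based routines (sp, GRTpe, GRTpeq) sort the packed keys as byte strings. That only matches numeric order when the template is big-endian: N for 32 bits, or Q> for 64 bits. On little-endian hardware such as x86, the native templates L and Q put the low byte first, so the order goes wrong once values exceed 255. The ten preamble values happen to come out in the right order, but the 0..10000 benchmark data would not. This self-contained sketch compares the two templates. The sub name and the numbers are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @nums = (1, 255, 256, 1000, 65536);

# Hypothetical helper: sort numbers by the raw bytes of their packed form
# (the GRT idea), then unpack to show the resulting order.
sub order_by_pack {
    my ($template, @n) = @_;
    return map { unpack($template, $_) }
           sort
           map { pack($template, $_) } @n;
}

my @big    = order_by_pack('N', @nums);   # 32-bit big-endian
my @native = order_by_pack('L', @nums);   # 32-bit native byte order

print "N: @big\n";
print "L: @native\n";
```

The N line should always be `1 255 256 1000 65536`. On a little-endian machine, the L line comes out as `65536 256 1 1000 255`.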
| [reply] [d/l] [select] |
Not providing the code is a fairly egregious oversight on my part. Apologies for that.
Your usage is the same as mine, but my code is in the spoiler for completeness.
#!/usr/bin/env perl
use strict;
use warnings;
use namespace::autoclean;
use Benchmark 'cmpthese';
use List::Util 'shuffle';
use Sort::Key qw /ikeysort/;
use Sort::Key::Natural qw /natsort/;
my @ordered;
my @unordered = qw{a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1};
my %expanded_abbrev_for = (
sr => 'sort_regex',
STr => 'st_regex',
STrn => 'st_regex_no_index',
STren => 'st_regex_expr_ni',
sra => 'sort_regex_anchored',
STra => 'st_regex_anchored',
STran => 'st_regex_anch_ni',
STraen => 'st_regex_anch_expr_ni',
ss => 'sort_substr',
mcs => 'map_cat_substr',
mcsl => 'map_cat_substr_len',
sp => 'sort_pack',
GRTpe => 'grt_pack_expr',
GRTpeq => 'grt_pack_expr_q',
SKn => 'sort_key_natural',
SKi => 'sort_key_integer',
);
my %coderef_for = (
sr => \&sort_regex,
STr => \&st_regex,
STrn => \&st_regex_no_index,
STren => \&st_regex_expr_ni,
sra => \&sort_regex_anchored,
STra => \&st_regex_anchored,
STran => \&st_regex_anch_ni,
STraen => \&st_regex_anch_expr_ni,
ss => \&sort_substr,
mcs => \&map_cat_substr,
mcsl => \&map_cat_substr_len,
sp => \&sort_pack,
GRTpe => \&grt_pack_expr,
GRTpeq => \&grt_pack_expr_q,
SKn => \&sort_key_natural,
SKi => \&sort_key_integer,
);
print "Perl & OS:\n $^V on $^O\n";
print "Unordered data (for preamble tests):\n @unordered\n";
print "Preamble tests:\n";
my $tests_fmt = " %-22s %s\n";
for my $name (sort keys %coderef_for) {
printf $tests_fmt, "$expanded_abbrev_for{$name}:",
"@{[$coderef_for{$name}->()]}";
}
exit if @ARGV && $ARGV[0] eq '--dry_run';
print "Legend:\n";
my $legend_fmt = " %-7s %s\n";
for my $abbrev (sort keys %expanded_abbrev_for) {
printf $legend_fmt, "$abbrev:",
$expanded_abbrev_for{$abbrev};
}
# Extend @unordered for improved benchmarking
my $max = 10000;
push @unordered, map "a-$_", shuffle 0..$max;
print "Benchmarks:\n";
print " Note: Unordered data extended with 'map \"a-\$_\", shuffle 0..$max'\n";
my $count = 0;
cmpthese $count => \%coderef_for;
sub sort_regex {
@ordered =
sort {
($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0]
} @unordered;
}
sub st_regex {
@ordered =
map {
$_->[0]
}
sort {
$a->[1] <=> $b->[1]
}
map {
[$_, (/(\d+)/)[0]]
} @unordered;
}
sub st_regex_no_index {
@ordered =
map {
$_->[0]
}
sort {
$a->[1] <=> $b->[1]
}
map {
[$_, /(\d+)/]
} @unordered;
}
sub st_regex_expr_ni {
@ordered =
map $_->[0],
sort {
$a->[1] <=> $b->[1]
}
map [$_, /(\d+)/], @unordered;
}
sub sort_regex_anchored {
@ordered =
sort {
($a =~ /(\d+)$/)[0] <=> ($b =~ /(\d+)$/)[0]
} @unordered;
}
sub st_regex_anchored {
@ordered =
map {
$_->[0]
}
sort {
$a->[1] <=> $b->[1]
}
map {
[$_, (/(\d+)$/)[0]]
} @unordered;
}
sub st_regex_anch_ni {
@ordered =
map {
$_->[0]
}
sort {
$a->[1] <=> $b->[1]
}
map {
[$_, /(\d+)$/]
} @unordered;
}
sub st_regex_anch_expr_ni {
@ordered =
map $_->[0],
sort {
$a->[1] <=> $b->[1]
}
map [$_, /(\d+)$/], @unordered;
}
sub sort_substr {
@ordered =
sort {
substr($a, 2) <=> substr($b, 2)
} @unordered;
}
sub map_cat_substr {
@ordered =
map {
'a-' . $_
}
sort {
$a <=> $b
}
map {
substr $_, 2
} @unordered;
}
sub map_cat_substr_len {
@ordered =
map {
'a-' . $_
}
sort {
$a <=> $b
}
map {
substr $_, 2, length($_) - 2
} @unordered;
}
sub sort_pack {
@ordered =
sort {
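# N (32-bit big-endian): bytewise cmp then matches numeric order;
# the native-endian L does not on little-endian machines
pack(N => substr($a, 2)) cmp pack(N => substr($b, 2))
} @unordered;
}
sub grt_pack_expr {
@ordered =
map substr($_, 4),
sort
map pack(N => substr($_, 2)) . $_, @unordered;
}
sub grt_pack_expr_q {
@ordered =
map substr($_, 8),
sort
# Q> (64-bit big-endian, Perl 5.10+) for the same reason
map pack('Q>', substr($_, 2)) . $_, @unordered;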
}
sub sort_key_natural {
@ordered =
natsort @unordered;
}
sub sort_key_integer {
@ordered =
ikeysort {substr $_, 2} @unordered;
}
| [reply] [d/l] |
Out of curiosity I added some subs using Sort::Key. sort_key_natural is the natsort function from Sort::Key::Natural while sort_key_integer uses the ikeysort function from Sort::Key in tandem with substr.
The natsort approach is not particularly fast, but this is perhaps to be expected given it is a general purpose function (as are the unanchored regex approaches). I guess the integer key approach is faster as it takes advantage of direct string operations when building the keys, and then whatever optimisations Sort::Key uses internally.
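Here is a pure-Perl sketch of the "build the keys once" idea behind `ikeysort { substr $_, 2 }`. Each integer key is extracted once into a parallel array, and the indices are then sorted by that array. By contrast, sort_substr (ss) calls substr twice per comparison. This only illustrates the semantics. It is an assumption-free core-Perl index sort, not how Sort::Key works internally.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @unordered = qw{a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1};

# Extract each integer key exactly once, roughly what the key block
# of ikeysort { substr $_, 2 } computes.
my @keys = map { substr($_, 2) + 0 } @unordered;

# Sort the indices by key; the index tie-break keeps equal keys
# (e.g. a-0 and a-000) in their original input order.
my @idx = sort { $keys[$a] <=> $keys[$b] or $a <=> $b } 0 .. $#keys;

my @sorted = @unordered[@idx];
print "@sorted\n";
```

This prints `a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345`.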
I assume the differences in the relative ranking of the other approaches, compared with Lanx's results, are due to the code being run on Strawberry Perl 5.28. It would be interesting to know how the Sort::Key approaches go under a more recent Perl.
Edit: Now that I look at the source code for Sort::Key::Natural, I see it uses a regex approach to divide the string and pad out the numeric sections, so it is not surprising that it is slower than the other regex-based approaches here. https://metacpan.org/dist/Sort-Key/source/lib/Sort/Key/Natural.pm#L34.
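For anyone curious what "divide the string and pad out the numeric sections" amounts to, here is a rough pure-Perl illustration of the general idea. It is not the Sort::Key::Natural implementation. The helper natural_key and the fixed 10-digit padding width are assumptions for this sketch, so digit runs longer than 10 digits, or too large for a native integer, would need a different key. Running a substitution with /e on every element is also why this kind of key is costlier than a single substr.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @unordered = qw{a-10 a-01 a-22 a-2 a-0 a-3 a-000 a-1 a-12345 a-1};

# Hypothetical helper: zero-pad every run of digits to a fixed width
# (10, an assumption) so plain string comparison orders numbers naturally.
sub natural_key {
    my ($s) = @_;
    $s =~ s/(\d+)/sprintf '%010d', $1/ge;
    return $s;
}

# Schwartzian Transform: compute each key once, sort on it, and fall
# back to the original string so ties (e.g. a-0 vs a-000) are deterministic.
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0] }
    map  { [ $_, natural_key($_) ] } @unordered;

print "@sorted\n";
```

This prints `a-0 a-000 a-01 a-1 a-1 a-2 a-3 a-10 a-22 a-12345`.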
| [reply] [d/l] [select] |
| [reply] |