A minor flamewar erupted on the CB this morning (well, my time anyway) when a user asked for help understanding the following code:
Apparently, someone told this user1 that this would solve the problem. Apparently, it did, but, as a good student, he was attempting to really understand it. Specifically, the [0] parts. My suggestion was to learn the Schwarzian Transform, and then we could simplify this. And, as an added bonus, I suggested that it'd probably be faster in that there'd be far fewer regex extractions. Another experienced user took umbrage at this, warning that I shouldn't underestimate the overhead of creating anonymous arrays, nor the smallness of ln N when comparing N regexes in the ST to 2NlnN regexes in the original compare routine. Then this user pointed out the pathological case of pre-sorted lists, where there were only 99 comparisons in a 100-element array - though that's still 198 regexes vs ST's 100. So, I had to look into this more.($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0]
Please note, that this came up for two reasons: a) the original user was confused, and I thought that splitting this up would make things easier to understand (and the ST is a useful idiom to learn in general), and b) I thought regex's were relatively expensive (I wouldn't have bothered if it were unpack or substr) for running in a tight loop like this.
So, first the code:
#!/usr/bin/perl use strict; use warnings; sub compare; my @l; for (10, 100, 1000) { @l = map { "a$_" } 1..$_; compare "Pre-sorted"; @l = reverse @l; compare "Reverse-sorted"; use List::Util qw(shuffle); @l = shuffle @l; compare "Shuffled"; } my %regexes; sub brute_force { my $c = 0; my @s = sort { $c += 2; ($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0] } @l; push @{$regexes{force}}, $c; } sub ST { my $c = 0; my @s = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { $c += 1; /(\d+)/; [ $1, $_ ] } @l; push @{$regexes{ST}}, $c; } sub compare { use Benchmark qw(cmpthese); my $title = shift; my $len = @l; $title = "$title ($len)."; print $title, "\n"; print '-' x length $title, "\n"; %regexes = (); cmpthese(-1, { force => \&brute_force, ST => \&ST, } ); for (qw(force ST)) { use List::Util qw(sum); my $avg = sum(@{$regexes{$_}}) / @{$regexes{$_}}; printf "%10s %d regexes\n", $_, $avg; } print "\n"; }
Of course, the number of regexes for ST is obvious - it's the brute-force method where it's not always obvious.Pre-sorted (10). ---------------- Rate force ST force 34461/s -- -10% ST 38399/s 11% -- force 38 regexes ST 10 regexes Reverse-sorted (10). -------------------- Rate ST force ST 39097/s -- -5% force 41353/s 6% -- force 30 regexes ST 10 regexes Shuffled (10). -------------- Rate force ST force 26795/s -- -25% ST 35544/s 33% -- force 50 regexes ST 10 regexes Pre-sorted (100). ----------------- Rate ST force ST 4696/s -- -30% force 6698/s 43% -- force 198 regexes ST 100 regexes Reverse-sorted (100). --------------------- Rate ST force ST 4745/s -- -30% force 6761/s 43% -- force 198 regexes ST 100 regexes Shuffled (100). --------------- Rate force ST force 1347/s -- -59% ST 3257/s 142% -- force 1096 regexes ST 100 regexes Pre-sorted (1000). ------------------ Rate ST force ST 439/s -- -35% force 673/s 53% -- force 1998 regexes ST 1000 regexes Reverse-sorted (1000). ---------------------- Rate ST force ST 427/s -- -37% force 673/s 58% -- force 1998 regexes ST 1000 regexes Shuffled (1000). ---------------- Rate force ST force 86.3/s -- -64% ST 243/s 181% -- force 17440 regexes ST 1000 regexes
1 For privacy reasons, I'm not naming any user involved in the discussion other than myself. If they want to out themselves, that's their decision, not mine, to make.
In reply to Underestimage (ST vs brute force) by Tanktalus
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |