A minor flamewar erupted on the CB this morning (well, my time anyway) when a user asked for help understanding the following code:
Apparently, someone told this user1 that this would solve the problem. Apparently, it did, but, as a good student, he was attempting to really understand it. Specifically, the [0] parts. My suggestion was to learn the Schwarzian Transform, and then we could simplify this. And, as an added bonus, I suggested that it'd probably be faster in that there'd be far fewer regex extractions. Another experienced user took umbrage at this, warning that I shouldn't underestimate the overhead of creating anonymous arrays, nor the smallness of ln N when comparing N regexes in the ST to 2NlnN regexes in the original compare routine. Then this user pointed out the pathological case of pre-sorted lists, where there were only 99 comparisons in a 100-element array - though that's still 198 regexes vs ST's 100. So, I had to look into this more.($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0]
Please note, that this came up for two reasons: a) the original user was confused, and I thought that splitting this up would make things easier to understand (and the ST is a useful idiom to learn in general), and b) I thought regex's were relatively expensive (I wouldn't have bothered if it were unpack or substr) for running in a tight loop like this.
So, first the code:
#!/usr/bin/perl use strict; use warnings; sub compare; my @l; for (10, 100, 1000) { @l = map { "a$_" } 1..$_; compare "Pre-sorted"; @l = reverse @l; compare "Reverse-sorted"; use List::Util qw(shuffle); @l = shuffle @l; compare "Shuffled"; } my %regexes; sub brute_force { my $c = 0; my @s = sort { $c += 2; ($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0] } @l; push @{$regexes{force}}, $c; } sub ST { my $c = 0; my @s = map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { $c += 1; /(\d+)/; [ $1, $_ ] } @l; push @{$regexes{ST}}, $c; } sub compare { use Benchmark qw(cmpthese); my $title = shift; my $len = @l; $title = "$title ($len)."; print $title, "\n"; print '-' x length $title, "\n"; %regexes = (); cmpthese(-1, { force => \&brute_force, ST => \&ST, } ); for (qw(force ST)) { use List::Util qw(sum); my $avg = sum(@{$regexes{$_}}) / @{$regexes{$_}}; printf "%10s %d regexes\n", $_, $avg; } print "\n"; }
Of course, the number of regexes for ST is obvious - it's the brute-force method where it's not always obvious.Pre-sorted (10). ---------------- Rate force ST force 34461/s -- -10% ST 38399/s 11% -- force 38 regexes ST 10 regexes Reverse-sorted (10). -------------------- Rate ST force ST 39097/s -- -5% force 41353/s 6% -- force 30 regexes ST 10 regexes Shuffled (10). -------------- Rate force ST force 26795/s -- -25% ST 35544/s 33% -- force 50 regexes ST 10 regexes Pre-sorted (100). ----------------- Rate ST force ST 4696/s -- -30% force 6698/s 43% -- force 198 regexes ST 100 regexes Reverse-sorted (100). --------------------- Rate ST force ST 4745/s -- -30% force 6761/s 43% -- force 198 regexes ST 100 regexes Shuffled (100). --------------- Rate force ST force 1347/s -- -59% ST 3257/s 142% -- force 1096 regexes ST 100 regexes Pre-sorted (1000). ------------------ Rate ST force ST 439/s -- -35% force 673/s 53% -- force 1998 regexes ST 1000 regexes Reverse-sorted (1000). ---------------------- Rate ST force ST 427/s -- -37% force 673/s 58% -- force 1998 regexes ST 1000 regexes Shuffled (1000). ---------------- Rate force ST force 86.3/s -- -64% ST 243/s 181% -- force 17440 regexes ST 1000 regexes
1 For privacy reasons, I'm not naming any user involved in the discussion other than myself. If they want to out themselves, that's their decision, not mine, to make.
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Underestimage (ST vs brute force)
by ikegami (Patriarch) on Dec 09, 2008 at 17:57 UTC | |
by Tanktalus (Canon) on Dec 09, 2008 at 19:17 UTC | |
Re: Underestimage (ST vs brute force)
by eric256 (Parson) on Dec 09, 2008 at 20:38 UTC | |
Re: Underestimage (ST vs brute force)
by Porculus (Hermit) on Dec 10, 2008 at 23:31 UTC | |
Re: Underestimage (ST vs brute force)
by brian_d_foy (Abbot) on Dec 11, 2008 at 20:06 UTC |