in reply to Re^2: Simple regex question. Grouping with a negative lookahead assertion.
Sorry pal. Most of your posts -- especially those regarding regex -- get an upvote from me, but this one got --. It's a crock.
Apparently it was so bad, you tried to -- it three times!
I was curious about the locus of crockitudinousness and decided to do some benchmarking, since performance is usually at the root of these squabbles. (Update: Benchmarked variations include some of those used by kcott here.) I must admit I was shocked, shocked by the results. There were no big surprises until I looked at the effect of the //p regex modifier. Simply adding this modifier to
m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg
in the push @ra, $1 variation ($push_cg below, which otherwise performs roughly comparably to the other variations) slows its performance by orders of magnitude, so much so that I didn't have the patience to run the benchmark to completion.
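The modifier change should affect only how fast the loop runs, not what it finds. The following sketch is an illustration on a hypothetical short string, not part of the original benchmark. It runs the $push_cg loop as-is, the same loop with only //p added, and the \K plus ${^MATCH} form used in $push_KM, then prints the three result lists so they can be compared.

```perl
use strict;
use warnings;

# Hypothetical short test string built from the same pieces as the
# benchmark: two 'atg' starts with payloads 'gga' and 'gcgccccggc',
# ended by the stop codons 'taa' and 'tag'.
my $s = join '', 'aaa', 'atg', 'gga', 'taa', 'aaa', 'atg', 'gcgccccggc', 'tag', 'aaa';

# 1. The $push_cg form: the payload is captured in $1.
my @plain;
push @plain, $1 while $s =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg;

# 2. The same loop with only //p added; $1 is still what gets pushed.
my @with_p;
push @with_p, $1 while $s =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsgp;

# 3. The $push_KM form: \K drops 'atg' from the reported match,
#    and //p makes ${^MATCH} available after each match.
my @via_match;
push @via_match, ${^MATCH} while $s =~ m{ atg \K [acgt]+? (?= taa|tag|tga) }xmsgp;

print "plain:    @plain\n";
print "with //p: @with_p\n";
print "KM form:  @via_match\n";
```

All three lines should print `gga gcgccccggc`.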
Am I doing this right? (Update: I.e., is the effect of using //p as in the $push_KM sub below, which I don't even have the patience to benchmark, really so egregious?) Is this all down to the //p modifier? And if so, have the proper authorities been notified? If you've touched on this in other threads, I have not been following these discussions as carefully as I ought. Anyway, here's my benchmark code. As always, I would be interested in any comments you might have.
```perl
use warnings;
use strict;

use Benchmark qw(cmpthese);

use constant N => 100_000;  # 10_000

# Build a long test string: each repeat unit holds two 'atg' ... 'taa'
# runs whose payloads are $ss1 and $ss2, separated by padding.
my ($ss1, $ss2) = qw(gga gcgccccggc);
my $atg  = 'atg';
my $stop = 'taa';
my $pad  = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
my $repeat = join '', $pad, $atg, $ss1, $stop, $pad, $atg, $ss2, $stop, $pad;
my $s = $repeat x N;

my @ra;

# List-context //g variations: capture + lookahead, \K + lookahead,
# capture + non-capturing group, capture + atomic group.
my $cg_la     = sub { @ra = $s =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg };
my $K_la      = sub { @ra = $s =~ m{ atg \K [acgt]+? (?= taa|tag|tga) }xmsg };
my $cg_ncg    = sub { @ra = $s =~ m{ atg ([acgt]+?) (?: taa|tag|tga) }xmsg };
my $cg_atomic = sub { @ra = $s =~ m{ atg ([acgt]+?) (?> taa|tag|tga) }xmsg };

# Scalar-context //g loops pushing one match at a time.
my $push_cg = sub {
    @ra = ();
    push @ra, $1 while $s =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg;
};
my $push_KM = sub {
    @ra = ();
    push @ra, ${^MATCH} while $s =~ m{ atg \K [acgt]+? (?= taa|tag|tga) }xmsgp;
};

print "validation... \n";
for my $rx_sub_test (
    $cg_la, $K_la, $cg_ncg, $cg_atomic, $push_cg,
    # $push_KM,
) {
    $rx_sub_test->();
    # Expect exactly two payloads per repeat unit...
    @ra == 2 * N or die "wrong N matches";
    # ...alternating $ss1, $ss2, so the list pairs up as $ss1 => $ss2.
    my %h = @ra;
    1 == keys %h or die "straight hash: wrong N keys";
    $h{$ss1} eq $ss2 or die "straight hash: wrong $ss1 => $ss2";
    %h = reverse @ra;
    1 == keys %h or die "reverse hash: wrong N keys";
    $h{$ss2} eq $ss1 or die "reverse hash: wrong $ss2 => $ss1";
}

print "benchmarking... \n";
cmpthese(20, {
    'cg_la'     => $cg_la,
    'K_la'      => $K_la,
    'cg_ncg'    => $cg_ncg,
    'cg_atomic' => $cg_atomic,
    'push_cg'   => $push_cg,
    # 'push_KM' => $push_KM,
});
```
Output:
```text
c:\@Work\Perl\monks\Anonymous Monk\1044183>perl timing_rx_1.pl
validation...
benchmarking...
            Rate cg_atomic      K_la    cg_ncg     cg_la   push_cg
cg_atomic 2.27/s        --       -2%       -3%       -4%       -7%
K_la      2.31/s        2%        --       -2%       -2%       -6%
cg_ncg    2.35/s        3%        2%        --       -0%       -4%
cg_la     2.35/s        4%        2%        0%        --       -4%
push_cg   2.45/s        8%        6%        4%        4%        --
```
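One way to check whether the slowdown really is down to //p, and how it scales, is a rough timing sketch like the one below. It is not the original benchmark: it times the $push_cg loop with and without //p on strings of doubling size using Time::HiRes, and the sizes are deliberately small so the //p runs finish. If each //p match costs time proportional to the string length, the //p timings should roughly quadruple from row to row while the plain timings roughly double; that is a hypothesis to test, not a result. Note also that perlre documents //p as ignored from Perl 5.20 onward (copy-on-write makes ${^MATCH} and friends available regardless), so the outcome is likely to depend on the perl version. The helper name time_push_cg is hypothetical.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Same repeat unit as in the benchmark script above.
my ($ss1, $ss2) = qw(gga gcgccccggc);
my $pad    = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
my $repeat = join '', $pad, 'atg', $ss1, 'taa', $pad, 'atg', $ss2, 'taa', $pad;

# Hypothetical helper: run the $push_cg-style loop over $s, with or
# without //p, and return (seconds elapsed, number of matches found).
sub time_push_cg {
    my ($s, $with_p) = @_;
    my @ra;
    my $t0 = time;
    if ($with_p) {
        push @ra, $1 while $s =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsgp;
    }
    else {
        push @ra, $1 while $s =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg;
    }
    return (time - $t0, scalar @ra);
}

# Double the string size each round and compare the two timings.
for my $n (500, 1_000, 2_000) {
    my $s = $repeat x $n;
    my ($t_plain, $m_plain) = time_push_cg($s, 0);
    my ($t_p,     $m_p)     = time_push_cg($s, 1);
    die "match count mismatch" unless $m_plain == $m_p && $m_p == 2 * $n;
    printf "N=%5d  plain %.4fs  //p %.4fs\n", $n, $t_plain, $t_p;
}
```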
Re^4: Simple regex question. Grouping with a negative lookahead assertion.
by kcott (Archbishop) on Jul 15, 2013 at 08:43 UTC