Sorry pal. Most of your posts -- especially those regarding regex -- get an upvote from me, but this one got --. It's a crock.

Apparently it was so bad, you tried to -- it three times!

I was curious about the locus of crockitudinousness and decided to do some benchmarking, since performance is usually at the root of these squabbles. (Update: Benchmarked variations include some of those used by kcott here.) I must admit I was shocked, shocked by the results. There were no big surprises until I looked at the effect of the //p regex modifier. Simply adding this modifier to
    m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg
in the push @ra, $1 variation ($push_cg below, which otherwise performs roughly comparably to the other variations) slows it down by orders of magnitude, so much so that I didn't have the patience to run the benchmark to completion.
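
For readers unfamiliar with the two styles being compared, here is a minimal sketch on a tiny stand-in string (the padding length and variable names are made up for illustration, not taken from the benchmark). It only shows that the capture-group form and the \K plus ${^MATCH} form (which needs //p) collect the same substrings; it says nothing about speed.

```perl
use strict;
use warnings;

# Tiny stand-in for the benchmark string: two atg...stop runs in 'a' padding.
# (Padding length is arbitrary here.)
my $s = join '', 'aaaa', 'atg', 'gga', 'taa',
                 'aaaa', 'atg', 'gcgccccggc', 'taa', 'aaaa';

# Variation 1: capture group, read each result from $1.
my @by_capture;
push @by_capture, $1
    while $s =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg;

# Variation 2: \K drops the leading 'atg' from the reported match,
# and //p makes the matched text available in ${^MATCH}.
my @by_match;
push @by_match, ${^MATCH}
    while $s =~ m{ atg \K [acgt]+? (?= taa|tag|tga) }xmsgp;

print "capture: @by_capture\n";
print "match:   @by_match\n";
```

Both lines should list gga followed by gcgccccggc, so any difference between the two forms is one of cost, not of result.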

Am I doing this right? (Update: I.e., is the effect of using //p as in the $push_KM sub below, which I don't even have the patience to benchmark, really so egregious?) Is this all down to the //p modifier? And if so, have the proper authorities been notified? If you've touched on this in other threads, I have not been following these discussions as carefully as I ought. Anyway, here's my benchmark code. As always, I would be interested in any comments you might have.

    use warnings;
    use strict;

    use Benchmark qw(cmpthese);

    use constant N => 100_000;  # 10_000

    my ($ss1, $ss2) = qw(gga gcgccccggc);

    my $atg  = 'atg';
    my $stop = 'taa';
    my $pad  = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';
    my $repeat = join '', $pad, $atg, $ss1, $stop, $pad, $atg, $ss2, $stop, $pad;
    my $s = $repeat x N;

    my @ra;

    my $cg_la     = sub { @ra = $s =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg };
    my $K_la      = sub { @ra = $s =~ m{ atg \K [acgt]+? (?= taa|tag|tga) }xmsg };
    my $cg_ncg    = sub { @ra = $s =~ m{ atg ([acgt]+?) (?: taa|tag|tga) }xmsg };
    my $cg_atomic = sub { @ra = $s =~ m{ atg ([acgt]+?) (?> taa|tag|tga) }xmsg };
    my $push_cg   = sub { @ra = (); push @ra, $1
        while $s =~ m{ atg ([acgt]+?) (?= taa|tag|tga) }xmsg };
    my $push_KM   = sub { @ra = (); push @ra, ${^MATCH}
        while $s =~ m{ atg \K [acgt]+? (?= taa|tag|tga) }xmsgp };

    print "validation... \n";
    for my $rx_sub_test (
        $cg_la, $K_la, $cg_ncg, $cg_atomic, $push_cg,
        # $push_KM,
        ) {
        $rx_sub_test->();
        @ra == 2 * N or die "wrong N matches";
        my %h = @ra;
        1 == keys %h or die "straight hash: wrong N keys";
        $h{$ss1} eq $ss2 or die "straight hash: wrong $ss1 => $ss2";
        %h = reverse @ra;
        1 == keys %h or die "reverse hash: wrong N keys";
        $h{$ss2} eq $ss1 or die "reverse hash: wrong $ss2 => $ss1";
        }

    print "benchmarking... \n";
    cmpthese(20, {
        'cg_la'     => $cg_la,
        'K_la'      => $K_la,
        'cg_ncg'    => $cg_ncg,
        'cg_atomic' => $cg_atomic,
        'push_cg'   => $push_cg,
        # 'push_KM' => $push_KM,
        });

Output:

    c:\@Work\Perl\monks\Anonymous Monk\1044183>perl timing_rx_1.pl
    validation...
    benchmarking...
                Rate cg_atomic      K_la    cg_ncg     cg_la   push_cg
    cg_atomic 2.27/s        --       -2%       -3%       -4%       -7%
    K_la      2.31/s        2%        --       -2%       -2%       -6%
    cg_ncg    2.35/s        3%        2%        --       -0%       -4%
    cg_la     2.35/s        4%        2%        0%        --       -4%
    push_cg   2.45/s        8%        6%        4%        4%        --
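
Since the //p variation is too slow to run at full size, one way to probe it is to time it on much smaller inputs and watch how the time grows as the input doubles. The sketch below is an assumption-laden illustration, not part of the benchmark above: time_push_KM is a hypothetical helper, the sizes and the padding length are arbitrary, and Time::HiRes stands in for Benchmark to keep it short. As far as the perlre documentation goes, perls before 5.20 make the match variables available under //p by copying the target string on each successful match, which would make a //g loop over a long string roughly quadratic; from 5.20 on the modifier is documented as ignored. Treat that as a likely explanation to test, not an established diagnosis of the numbers here.

```perl
use strict;
use warnings;
use Time::HiRes ();

# One repeat unit, built like the benchmark's (padding length is arbitrary).
my $pad    = 'a' x 40;
my $repeat = join '', $pad, 'atg', 'gga', 'taa',
                      $pad, 'atg', 'gcgccccggc', 'taa', $pad;

# Hypothetical helper: run the //p + ${^MATCH} loop over $n repeats and
# return (number of matches, elapsed wall-clock seconds).
sub time_push_KM {
    my ($n) = @_;
    my $s = $repeat x $n;
    my @ra;
    my $t0 = Time::HiRes::time();
    push @ra, ${^MATCH}
        while $s =~ m{ atg \K [acgt]+? (?= taa|tag|tga) }xmsgp;
    return (scalar @ra, Time::HiRes::time() - $t0);
}

# Doubling sizes, kept small so that even a slow perl should finish.
for my $n (500, 1_000, 2_000, 4_000) {
    my ($count, $secs) = time_push_KM($n);
    printf "n=%5d  matches=%5d  %.4f s\n", $n, $count, $secs;
}
```

If the elapsed time roughly quadruples each time n doubles, the cost is growing with string length times match count, which would be consistent with a per-match copy; if it merely doubles, the slowdown has some other cause.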

In reply to Re^3: Simple regex question. Grouping with a negative lookahead assertion. by AnomalousMonk
in thread Simple regex question. Grouping with a negative lookahead assertion. by Anonymous Monk
