All excellent comments from JavaFan, BrowserUk, and eyepopslikeamosquito.

I hadn't considered (*COMMIT)/(*FAIL), partly because this came up when writing tests for some perl 5.8 code, and those verbs only arrived in perl 5.10. It won't find overlapping matches, unlike the other tests, and I have to think more about how it behaves in the more general case. Still, I'd like to try it out, so I'm adding it to the test suite.
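For anyone who hasn't met the backtracking verbs, here is a rough sketch of how I read the commit pattern when it is spread out under /x. It assumes perl 5.10 or later; the sub name commit_match is made up for this illustration, and the sample terms are the same ones the benchmark uses.

```perl
use strict;
use warnings;
use 5.010;   # the (*COMMIT) and (*FAIL) verbs are not available before perl 5.10

# Same sample terms as the benchmark: two that must appear, one that must not.
my ($t1, $f, $t2) = (qr/people/, qr/babi?es/, qr/health(?:y|ful)/);

# Hypothetical helper: returns 1 when both wanted terms are present
# and the unwanted one is absent, 0 otherwise.
sub commit_match {
    my ($s) = @_;
    return $s =~ /
          $f (*COMMIT) (*FAIL)          # unwanted term at this position: abandon the whole match
        | ^ (?= .* $t1 ) (?= .* $t2 )
          .* $f (*COMMIT) (*FAIL)       # at the start, both wanted terms seen but unwanted term too: abandon
        | (?= .* $t1 ) (?= .* $t2 )     # otherwise succeed if both wanted terms lie ahead
    /sx ? 1 : 0;
}

for my $s ('healthy people', 'healthy people, including babies') {
    printf "%-35s %s\n", $s, commit_match($s) ? 'match' : 'no match';
}
```

The point of (*COMMIT) is that once the unwanted term has been found, backtracking into it makes the whole match fail outright instead of letting the engine retry at later positions.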

As for BrowserUk's observation: yes, character classes are preferable when looking for single characters, but this was meant for less trivial "atoms". I will rewrite the tests to be more realistic.

I'm glad that one of my solutions is exactly what The Perl Cookbook recommends, and the cookbook explains it better and talks about overlapping vs non-overlapping cases. Still, we have come up with some variations it does not cover.
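To make the overlapping vs non-overlapping distinction concrete, here is a small sketch. The two sub names and the words "bell" and "lab" are my own choices for illustration, not something taken from the tests in this thread.

```perl
use strict;
use warnings;

# Overlap allowed: each lookahead scans the string independently from the
# start, so the two words may share characters.
sub has_both_overlapping {
    my ($s) = @_;
    return $s =~ /^(?=.*bell)(?=.*lab)/s ? 1 : 0;
}

# No overlap: one word has to end before the other begins.
sub has_both_disjoint {
    my ($s) = @_;
    return $s =~ /bell.*lab|lab.*bell/s ? 1 : 0;
}

for my $s ('labelled', 'lab bell') {
    printf "%-10s overlapping=%d disjoint=%d\n",
        $s, has_both_overlapping($s), has_both_disjoint($s);
}
```

In "labelled" the two words share the "b", so only the lookahead version accepts it; "lab bell" satisfies both.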

New benchmark, taking the above into account, making the things to match/not match a little easier to modify:

#!/usr/bin/env perl
use Benchmark 'cmpthese';

# $t1 and $t2 must match, $f must not
my ($t1,$f,$t2)=(qr/people/,qr/babi?es/,qr/health(?:y|ful)/);

my %ways = (
  '3calls' => sub {/$t1/o && !/$f/o && /$t2/o},
  code     => sub {m:^(?(?{/$t1/o && !/$f/o && /$t2/o})|(?!)):},
  pos_look => sub {/(?=.*$t1)(?!.*$f)(?=.*$t2)/so},
  pos_anch => sub {/^(?=.*$t1)(?!.*$f)(?=.*$t2)/so},
  neg_look => sub {/(?!(?!.*$t1)|(?=.*$f)|(?!.*$t2))/so},
  neg_anch => sub {/^(?!(?!.*$t1)|(?=.*$f)|(?!.*$t2))/so},
  commit   => sub {/$f(*COMMIT)(*FAIL)|^(?=.*$t1)(?=.*$t2).*$f(*COMMIT)(*FAIL)|(?=.*$t1)(?=.*$t2)/so}
);

# Sanity check: every variant must accept the first string and reject the second
while (my ($way, $sub)=each %ways) {
  die "$way failed to match\n" unless ($_ = 'healthy people') && &$sub;
  die "$way had a false positive\n" if ($_= 'healthy people, including babies') && &$sub;
}

print "For matching:\n";
# NB: spelled "heathful" as in the run reported below, so $t2 does not actually match this string
$_="Our people eat heathful meals.";
cmpthese(-1, \%ways);

print "For non-matching:\n";
$_="Healthy people have more babes in arms";
cmpthese(-1, \%ways);
__END__

For matching:

               Rate   commit     code neg_look neg_anch pos_look pos_anch   3calls
commit     406208/s       --     -40%     -64%     -66%     -69%     -70%     -91%
code       675893/s      66%       --     -40%     -44%     -49%     -50%     -85%
neg_look  1128170/s     178%      67%       --      -7%     -14%     -16%     -75%
neg_anch  1212346/s     198%      79%       7%       --      -8%     -10%     -73%
pos_look  1312820/s     223%      94%      16%       8%       --      -2%     -71%
pos_anch  1344229/s     231%      99%      19%      11%       2%       --     -71%
3calls    4567476/s    1024%     576%     305%     277%     248%     240%       --

For non-matching:

               Rate   commit     code neg_anch neg_look pos_look pos_anch   3calls
commit     418623/s       --     -36%     -65%     -68%     -68%     -70%     -90%
code       654350/s      56%       --     -45%     -49%     -50%     -54%     -85%
neg_anch  1180978/s     182%      80%       --      -9%     -10%     -17%     -73%
neg_look  1294154/s     209%      98%      10%       --      -1%      -9%     -70%
pos_look  1308383/s     213%     100%      11%       1%       --      -8%     -70%
pos_anch  1418108/s     239%     117%      20%      10%       8%       --     -68%
3calls    4365151/s     943%     567%     270%     237%     234%     208%       --

In reply to Re: Matching and nonmatching multiple regexps at once by Yary
in thread Matching and nonmatching multiple regexps at once by Yary
