All excellent comments from JavaFan, BrowserUk, and eyepopslikeamosquito.

I hadn't considered the (*COMMIT)(*FAIL) approach, partly because this came up while I was writing tests for some Perl 5.8 code, and those verbs only arrived in Perl 5.10. It won't find overlapping matches, unlike the other tests, and I need to think more about how it behaves in the general case. Still, I'd like to try it out, so I'm adding it to the test suite.
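For readers who have not met these verbs, here is a minimal sketch of the idea, assuming Perl 5.10 or later. The sub name `wanted`, the variable names, and the sample strings are made up for illustration, and the pattern is a simplified cousin of the benchmarked one, not a copy of it. Once the forbidden pattern is found, (*COMMIT) followed by (*FAIL) makes the whole match fail on the spot instead of letting the engine try other alternatives or start positions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;    # (*COMMIT) and (*FAIL) need Perl 5.10 or later

# Hypothetical patterns, chosen only for illustration.
my ($want1, $want2, $forbid) = (qr/people/, qr/health(?:y|ful)/, qr/babi?es/);

# Returns 1 when $str contains both wanted patterns and not the forbidden one.
sub wanted {
    my ($str) = @_;
    # First alternative: if the forbidden pattern occurs anywhere, commit and fail.
    # Second alternative: otherwise require both wanted patterns via lookaheads.
    return $str =~ /^.*?$forbid(*COMMIT)(*FAIL)|^(?=.*$want1)(?=.*$want2)/s ? 1 : 0;
}

for my $str ('healthy people', 'healthy people, including babies', 'people only') {
    printf "%-35s %s\n", $str, wanted($str) ? 'match' : 'no match';
}
```

Only the first sample string has both wanted words and no forbidden one, so it is the only one expected to be reported as a match.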

As for BrowserUk's observation: yes, character classes are preferable when looking for single characters, but this was meant for less trivial "atoms". I will rewrite the tests to be more realistic.

I'm glad that one of my solutions is exactly what the Perl Cookbook recommends. The Cookbook explains it better and also discusses the overlapping and non-overlapping cases. Still, we have come up with some variations it does not cover.
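To make the overlapping versus non-overlapping distinction concrete, here is a small sketch. The sub names and test words are my own choices for illustration. Lookaheads test each pattern independently from the same starting point, so the two matches may share characters, while an ordered alternation requires one match to end before the other begins.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Both patterns must be present; their matches are allowed to overlap.
sub both_overlapping {
    my ($s) = @_;
    return $s =~ /^(?=.*bell)(?=.*lab)/s ? 1 : 0;
}

# Both patterns must be present and must not share any characters.
sub both_separate {
    my ($s) = @_;
    return $s =~ /bell.*lab|lab.*bell/s ? 1 : 0;
}

# In "labelled", "lab" and "bell" share the letter "b".
for my $word ('labelled', 'bell laboratory') {
    printf "%-16s overlapping=%d separate=%d\n",
        $word, both_overlapping($word), both_separate($word);
}
```

For "labelled" the lookahead version succeeds and the alternation does not, because the two words share the "b". For "bell laboratory" both succeed.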

Here is a new benchmark that takes the above into account and makes the things to match/not match a little easier to modify:

#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Two patterns that must match ($t1, $t2) and one that must not ($f).
my ($t1,$f,$t2)=(qr/people/,qr/babi?es/,qr/health(?:y|ful)/);

my %ways =
  (
   '3calls' => sub {/$t1/o && !/$f/o && /$t2/o},
   code     => sub {m:^(?(?{/$t1/o && !/$f/o && /$t2/o})|(?!)):},
   pos_look => sub {/(?=.*$t1)(?!.*$f)(?=.*$t2)/so},
   pos_anch => sub {/^(?=.*$t1)(?!.*$f)(?=.*$t2)/so},
   neg_look => sub {/(?!(?!.*$t1)|(?=.*$f)|(?!.*$t2))/so},
   neg_anch => sub {/^(?!(?!.*$t1)|(?=.*$f)|(?!.*$t2))/so},
   commit   => sub {/$f(*COMMIT)(*FAIL)|^(?=.*$t1)(?=.*$t2).*$f(*COMMIT)(*FAIL)|(?=.*$t1)(?=.*$t2)/so}
  );

# Sanity check: every variant must accept the first string and reject the second.
while (my ($way, $sub)=each %ways) {
  die "$way failed to match\n"
    unless ($_ = 'healthy people') && &$sub;
  die "$way had a false positive\n"
    if ($_ = 'healthy people, including babies') && &$sub;
}

print "For matching:\n";
$_="Our people eat healthful meals.";
cmpthese(-1, \%ways);


print "For non-matching:\n";
$_="Healthy people have more babes in arms";
cmpthese(-1, \%ways);
__END__

For matching:

	Rate	commit	code	neg_look	neg_anch	pos_look	pos_anch	3calls
commit	406208/s	--	-40%	-64%	-66%	-69%	-70%	-91%
code	675893/s	66%	--	-40%	-44%	-49%	-50%	-85%
neg_look	1128170/s	178%	67%	--	-7%	-14%	-16%	-75%
neg_anch	1212346/s	198%	79%	7%	--	-8%	-10%	-73%
pos_look	1312820/s	223%	94%	16%	8%	--	-2%	-71%
pos_anch	1344229/s	231%	99%	19%	11%	2%	--	-71%
3calls	4567476/s	1024%	576%	305%	277%	248%	240%	--

For non-matching:

	Rate	commit	code	neg_anch	neg_look	pos_look	pos_anch	3calls
commit	418623/s	--	-36%	-65%	-68%	-68%	-70%	-90%
code	654350/s	56%	--	-45%	-49%	-50%	-54%	-85%
neg_anch	1180978/s	182%	80%	--	-9%	-10%	-17%	-73%
neg_look	1294154/s	209%	98%	10%	--	-1%	-9%	-70%
pos_look	1308383/s	213%	100%	11%	1%	--	-8%	-70%
pos_anch	1418108/s	239%	117%	20%	10%	8%	--	-68%
3calls	4365151/s	943%	567%	270%	237%	234%	208%	--

In reply to Re: Matching and nonmatching multiple regexps at once by Yary
in thread Matching and nonmatching multiple regexps at once by Yary
