in reply to Matching and nonmatching multiple regexps at once
All excellent comments from JavaFan, BrowserUk, and eyepopslikeamosquito.
I hadn't considered the (*COMMIT)(*FAIL) approach, partly because this came up while I was writing tests for some Perl 5.8 code, and those verbs only arrived in 5.10. It won't find overlapping matches, unlike the other tests, and I still have to think about how it behaves in the more general case. Still, I'd like to try it out, so I'm adding it to the test suite.
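For anyone who hasn't used the verbs, here is a minimal sketch of how the (*COMMIT)(*FAIL) pair can veto a match. The helper name `wanted_without_veto` and the simplified pattern are made up for illustration and are not the regex used in the benchmark; it assumes perl 5.10 or later.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;    # backtracking control verbs need perl 5.10 or later

# Hypothetical helper: returns 1 if $str contains $want and does not
# contain $veto anywhere, otherwise 0.
sub wanted_without_veto {
    my ($str, $want, $veto) = @_;
    # Branch 1 scans from the start of the string for $veto. Once it is
    # found, (*FAIL) forces a failure and backtracking into (*COMMIT)
    # makes the whole match fail with no retry at later positions.
    # Branch 2 can only succeed when $veto occurs nowhere in the string.
    return $str =~ /^.*?$veto(*COMMIT)(*FAIL)|$want/s ? 1 : 0;
}

for my $s ('healthy people', 'healthy people, including babies') {
    printf "%-35s => %d\n", $s, wanted_without_veto($s, qr/people/, qr/babi?es/);
}
```

The anchored `^.*?` in the first branch is what lets the veto see the whole string before any positive branch gets a chance.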
As for BrowserUk's observation: yes, character classes are preferable when looking for single characters, but this was meant for less trivial "atoms". I will rewrite the tests to be more realistic.
I'm glad that one of my solutions is exactly what The Perl Cookbook recommends, and the cookbook explains it better and talks about overlapping vs non-overlapping cases. Still, we have come up with some variations it does not cover.
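To make the overlapping vs non-overlapping distinction concrete, here is a small sketch. The patterns `abc` and `cde` and both sub names are hypothetical, picked only because the two patterns can share a character; this is my own illustration, not the cookbook's code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical patterns that can overlap on the letter "c".
my ($p1, $p2) = (qr/abc/, qr/cde/);

# Lookaheads each scan the string independently, so the two matches
# are allowed to share characters.
sub both_overlap_ok {
    my ($s) = @_;
    return $s =~ /^(?=.*$p1)(?=.*$p2)/s ? 1 : 0;
}

# Consuming one pattern and then looking for the other (in either order)
# requires the two matches to be disjoint.
sub both_disjoint {
    my ($s) = @_;
    return $s =~ /$p1.*$p2|$p2.*$p1/s ? 1 : 0;
}

for my $s ('abcde', 'abc cde') {
    printf "%-8s overlap_ok=%d disjoint=%d\n",
        $s, both_overlap_ok($s), both_disjoint($s);
}
# 'abcde'   gives overlap_ok=1 disjoint=0
# 'abc cde' gives overlap_ok=1 disjoint=1
```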
Here is a new benchmark that takes the above into account and makes the patterns to match and not match a little easier to modify:
    #!/usr/bin/env perl
    use Benchmark 'cmpthese';

    # Patterns that must match ($t1, $t2) and the one that must not ($f).
    my ($t1,$f,$t2)=(qr/people/,qr/babi?es/,qr/health(?:y|ful)/);

    my %ways = (
        '3calls' => sub {/$t1/o && !/$f/o && /$t2/o},
        code     => sub {m:^(?(?{/$t1/o && !/$f/o && /$t2/o})|(?!)):},
        pos_look => sub {/(?=.*$t1)(?!.*$f)(?=.*$t2)/so},
        pos_anch => sub {/^(?=.*$t1)(?!.*$f)(?=.*$t2)/so},
        neg_look => sub {/(?!(?!.*$t1)|(?=.*$f)|(?!.*$t2))/so},
        neg_anch => sub {/^(?!(?!.*$t1)|(?=.*$f)|(?!.*$t2))/so},
        commit   => sub {/$f(*COMMIT)(*FAIL)|^(?=.*$t1)(?=.*$t2).*$f(*COMMIT)(*FAIL)|(?=.*$t1)(?=.*$t2)/so}
    );

    # Sanity-check every variant against $_ before timing it.
    while (my ($way, $sub)=each %ways) {
        die "$way failed to match\n" unless ($_ = 'healthy people') && &$sub;
        die "$way had a false positive\n" if ($_= 'healthy people, including babies') && &$sub;
    }

    print "For matching:\n";
    # sic: "heathful" does not match $t2, so this input is really a non-match
    $_="Our people eat heathful meals.";
    cmpthese(-1, \%ways);

    print "For non-matching:\n";
    $_="Healthy people have more babes in arms";
    cmpthese(-1, \%ways);
    __END__
For matching:
| | Rate | commit | code | neg_look | neg_anch | pos_look | pos_anch | 3calls |
|---|---|---|---|---|---|---|---|---|
| commit | 406208/s | -- | -40% | -64% | -66% | -69% | -70% | -91% |
| code | 675893/s | 66% | -- | -40% | -44% | -49% | -50% | -85% |
| neg_look | 1128170/s | 178% | 67% | -- | -7% | -14% | -16% | -75% |
| neg_anch | 1212346/s | 198% | 79% | 7% | -- | -8% | -10% | -73% |
| pos_look | 1312820/s | 223% | 94% | 16% | 8% | -- | -2% | -71% |
| pos_anch | 1344229/s | 231% | 99% | 19% | 11% | 2% | -- | -71% |
| 3calls | 4567476/s | 1024% | 576% | 305% | 277% | 248% | 240% | -- |
For non-matching:
| | Rate | commit | code | neg_anch | neg_look | pos_look | pos_anch | 3calls |
|---|---|---|---|---|---|---|---|---|
| commit | 418623/s | -- | -36% | -65% | -68% | -68% | -70% | -90% |
| code | 654350/s | 56% | -- | -45% | -49% | -50% | -54% | -85% |
| neg_anch | 1180978/s | 182% | 80% | -- | -9% | -10% | -17% | -73% |
| neg_look | 1294154/s | 209% | 98% | 10% | -- | -1% | -9% | -70% |
| pos_look | 1308383/s | 213% | 100% | 11% | 1% | -- | -8% | -70% |
| pos_anch | 1418108/s | 239% | 117% | 20% | 10% | 8% | -- | -68% |
| 3calls | 4365151/s | 943% | 567% | 270% | 237% | 234% | 208% | -- |
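
If the lists of patterns keep changing, the anchored lookahead variant (`pos_anch`) could also be generated rather than written by hand. This is only a sketch: `build_filter` is a hypothetical helper that was not benchmarked above, and it assumes each pattern is a `qr//` object that is safe to embed in a larger regex.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build one anchored regex that succeeds only when
# every pattern in @$must matches somewhere and none in @$must_not does.
sub build_filter {
    my ($must, $must_not) = @_;
    my $src = '';
    $src .= "(?=.*$_)" for @$must;        # each required pattern, anywhere
    $src .= "(?!.*$_)" for @$must_not;    # each forbidden pattern, nowhere
    return qr/^$src/s;                    # /s lets .* cross newlines
}

my $filter = build_filter(
    [ qr/people/, qr/health(?:y|ful)/ ],
    [ qr/babi?es/ ],
);

for my $s ('healthy people', 'healthy people, including babies') {
    printf "%-35s => %s\n", $s, ($s =~ $filter ? 'match' : 'no match');
}
```

Like the hand-written lookahead versions, this permits overlapping matches between the required patterns.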