Re^5: Regex match at the beginning or end of string

Yeah, but instead of writing complicated lookaheads,

There is nothing complicated about `/(?=.*term)/`.

one may as well write: /pattern1/ && /pattern2/;

For two terms maybe. But how about for a variable number of terms?
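
For a variable number of terms, the lookahead pattern can be generated from the list. A minimal sketch, assuming the terms are literal strings (hence `quotemeta`) and that substring hits count as matches; `all_terms_re` is a name made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: build one anchored lookahead per term, so the
# terms may appear in any order. Terms are treated as literal text.
sub all_terms_re {
    my @terms = @_;
    my $re = join '', map { '(?=.*' . quotemeta($_) . ')' } @terms;
    return qr/^$re/s;    # /s lets '.' cross embedded newlines
}

my $re = all_terms_re( qw[ the quick brown fox ] );

for my $str ( 'fox quick the brown', 'the quick brown dog' ) {
    printf "%-22s %s\n", $str, $str =~ $re ? 'match' : 'no match';
}
```

The first string contains all four terms and matches; the second lacks "fox" and does not.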

which compared to the lookahead variant, has a better chance of being handled by the optimizer.

The optimiser?

Care to show some evidence of this "optimiser" in operation? Documentation? Any indication that it actually exists?

However, neither this, nor the suggested lookahead variant actually trigger only on "pattern1" followed by "pattern2", or "pattern2" followed by "pattern1"

That is only one of the constraints specified, which can be verified using @+ & @- after the match.

Have you any solution that satisfies the other two constraints? Namely:

without the need to use of multiple comparisons or duplicate usage of either patterns

For instance: ... and ... but ...

Bad choices do not make for good examples.

I offered a solution. Others have offered other solutions. The OP chooses.

But, I challenge you to construct the regex(es) required to match the terms: the, quick, brown, fox, jumps, over, the, lazy, dog, in any ordering, using some other technique.

Assuming you'll have a solution, we can then compare how the "optimiser" fares with the two variants.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.


Replies are listed 'Best First'.
Re^6: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 11:14 UTC
But, I challenge you to construct the regex(es) required to match the terms: the, quick, brown, fox, jumps, over, the, lazy, dog, in any ordering, using some other technique.

    my @pats = qw[the quick brown fox jumps over the lazy dog];
    my $str = "....";
    my $matched = 1;
    foreach my $pat (@pats) {
        last unless $matched &&= $str =~ /$pat/;
    }

Or

    my @pats = qw[the quick brown fox jumps over the lazy dog];
    my $str = "....";
    my $matched = !grep {$str !~ /$_/} @pats;
Re^7: Regex match at the beginning or end of string by BrowserUk (Patriarch) on Feb 20, 2011 at 12:49 UTC
Okay. So the optimiser favours the lookaheads by a factor of 16x:

    #! perl -slw
    use strict;
    use Benchmark qw[ cmpthese ];
    use List::Util qw[ shuffle ];

    our @terms = qw[ the quick brown fox jumps over the lazy dog ];
    our $re = join '', map "(?=^.*$_)", @terms;
    $re = qr/$re/;

    our @lines = map join( ' ', shuffle @terms ), 1 .. 100;
    push @lines, ( 'every good boy deserves food' ) x 100;

    our( $a, $b ) = (0) x 2;

    cmpthese -1, {
        a => q[ /$re/ and ++$a for @lines; ],
        b => q[
            for my $str ( @lines ) {
                !grep( $str !~ /$_/, @terms ) and ++$b;
            }
        ],
    };

    print "$a:$b";

    __END__
    C:\test>junk48
        Rate     b    a
    b 81.5/s    -- -94%
    a 1399/s 1616%   --
    208200:12000

Or, if I don't pre-compile the regex, 30x faster:

    #! perl -slw
    use strict;
    use Benchmark qw[ cmpthese ];
    use List::Util qw[ shuffle ];

    our @terms = qw[ the quick brown fox jumps over the lazy dog ];
    our $re = join '', map "(?=^.*$_)", @terms;
    #$re = qr/$re/;

    our @lines = map join( ' ', shuffle @terms ), 1 .. 100;
    push @lines, ( 'every good boy deserves food' ) x 100;

    our( $a, $b ) = (0) x 2;

    cmpthese -1, {
        a => q[ /$re/ && ++$a for @lines; ],
        b => q[
            for my $str ( @lines ) {
                !grep( $str !~ /$_/, @terms ) && ++$b;
            }
        ],
    };

    print "$a:$b";

    __END__
    C:\test>junk48
        Rate     b    a
    b 82.7/s    -- -97%
    a 2632/s 3082%   --
    389700:12000
Re^8: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 15:46 UTC
It's faster, but not because the pattern itself is faster. You're getting the benefit because your regexp isn't changing, and you're running the same pattern tens of thousands of times. The loop I suggested doesn't have that benefit. But it can.

    use strict;
    use warnings;
    use 5.010;
    use Benchmark qw[ cmpthese ];
    use List::Util qw[ shuffle ];

    our @terms = qw[ the quick brown fox jumps over the lazy dog ];
    our $re = join '', map "(?=^.*$_)", @terms;
    $re = qr/$re/;

    our @lines = map join( ' ', shuffle @terms ), 1 .. 100;
    push @lines, ( 'every good boy deserves food' ) x 100;

    my $line = join '&&', map {"/$_/"} @terms;

    our( $a, $b, $c ) = (0) x 3;

    cmpthese -1, {
        a => q[ /$re/ and ++$a for @lines; ],
        b => q[
            for my $str ( @lines ) {
                !grep( $str !~ /$_/, @terms ) and ++$b;
            }
        ],
        c => qq [$line && ++\$c for \@lines;],
    };

    say "$a:$b:$c";

    __END__
        Rate     b    a    c
    b 39.8/s    -- -95% -97%
    a  807/s 1927%   -- -36%
    c 1254/s 3051%  55%   --
    109400:4800:243000
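
Variant c is fast because `$line` is interpolated into the benchmark source, so every pattern is a literal that gets compiled once. Here is a minimal sketch of the same idea outside Benchmark, assuming the terms are plain words. The names `$src` and `$all` are made up here, and the `\Q...\E` quoting is an addition that the benchmark above does not use:

```perl
use strict;
use warnings;

my @terms = qw[ the quick brown fox jumps over the lazy dog ];

# Source text of the test: "/the/ && /quick/ && ... && /dog/".
my $src = join ' && ', map { "/\Q$_\E/" } @terms;

# Compile that source once into a closure that tests $_.
my $all = eval "sub { $src }" or die $@;

my @lines = (
    'lazy dog jumps over the quick brown fox the',
    'every good boy deserves food',
);

# grep aliases $_ to each line, which is what the closure matches against.
my $hits = grep { $all->() } @lines;
print "$hits of ", scalar @lines, " lines contain every term\n";
```

Only the first line contains all of the terms, so this reports 1 of 2.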
Re^9: Regex match at the beginning or end of string by BrowserUk (Patriarch) on Feb 20, 2011 at 18:45 UTC
Re^10: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 22:07 UTC
Re^7: Regex match at the beginning or end of string by BrowserUk (Patriarch) on Feb 20, 2011 at 11:21 UTC
Touché :)++
Re^6: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 11:22 UTC
That is only one of the constraints specified, which can be verified using @+ & @- after the match.

After the match? How? Consider the patterns `\w` and `\d`, against the strings `&8` and `A8`. After both

    "&8" =~ /^(?=.*(\w))(?=.*(\d))/;

and

    "A8" =~ /^(?=.*(\w))(?=.*(\d))/;

`@-` will be `(0, 1, 1)` and `@+` will be `(0, 2, 2)`, yet one of them fits the criteria from the OP, and the other doesn't.
Re^7: Regex match at the beginning or end of string by BrowserUk (Patriarch) on Feb 20, 2011 at 12:58 UTC
Like I said, bad choices don't make good examples: \w also matches '8'. Correct that and it is easy to see how to determine whether the matches were consecutive from `@-` and `@+`:

    "&8" =~ /^(?=.*([A-Z]))(?=.*(\d))/ and print "[@-][@+]";;

    "A8" =~ /^(?=.*([A-Z]))(?=.*(\d))/ and print "[@-][@+]";;
    [0 0 1][0 1 2]

    "A+8" =~ /^(?=.*([A-Z]))(?=.*(\d))/ and print "[@-][@+]";;
    [0 0 2][0 1 3]
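
The offsets above can be turned into an adjacency test. This is a rough sketch, not a general solution: it assumes the two patterns can never match the same character, and because `.*` is greedy each capture holds the last occurrence in the string, so a string with several candidates may be reported as not adjacent even when an earlier pair is. `adjacent_in_either_order` is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: true if both captures matched and one ends exactly
# where the other begins, in either order.
sub adjacent_in_either_order {
    my ($str) = @_;
    return 0 unless $str =~ /^(?=.*([A-Z]))(?=.*(\d))/;
    # $-[n] / $+[n] are the start / end offsets of capture group n.
    return $+[1] == $-[2] || $+[2] == $-[1] ? 1 : 0;
}

for my $str ( 'A8', '8A', 'A+8', '&8' ) {
    printf "%-4s %s\n", $str,
        adjacent_in_either_order($str) ? 'adjacent' : 'not adjacent';
}
```

`A8` and `8A` are reported as adjacent; `A+8` (a character in between) and `&8` (no uppercase letter, so no match at all) are not.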
Re^8: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 15:25 UTC
Well, duh. If the patterns cannot overlap, it's easy, and your suggestion, or my splitting into different regexes, works fine. No checking of @- and @+ is needed.

Of course I know \w matches 8. That was the point of the example. Both `/^(?=.*\w)(?=.*\d)/` and `/\w/ && /\d/` break down on this.
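
To make the overlap concrete, here is a small sketch comparing the two approaches on the strings from this subthread. The sub names `via_lookahead` and `via_two_matches` are invented for the example:

```perl
use strict;
use warnings;

# Both tests only ask "does each pattern match somewhere?", so a single
# character that satisfies both patterns is enough to pass.
sub via_lookahead   { $_[0] =~ /^(?=.*\w)(?=.*\d)/      ? 1 : 0 }
sub via_two_matches { $_[0] =~ /\w/ && $_[0] =~ /\d/    ? 1 : 0 }

for my $str ( '&8', 'A8' ) {
    printf "%-3s lookahead=%d two_matches=%d\n",
        $str, via_lookahead($str), via_two_matches($str);
}
```

Both approaches accept `&8` as well as `A8`, although in `&8` the lone `8` is the only character either pattern can match.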
Re^9: Regex match at the beginning or end of string by BrowserUk (Patriarch) on Feb 20, 2011 at 15:42 UTC
Re^10: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 15:55 UTC
Re^6: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 10:46 UTC
Any indication that it actually exists?

`man perlre | grep optimiser`.
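
Beyond grepping the manual, the `re` pragma (Perl 5.10 and later) exports `regmust`, which reports the fixed strings the optimiser extracted from a compiled pattern. A hedged sketch follows; the two patterns and the `show` helper are chosen here for illustration, and no particular output is claimed because it depends on the perl version. It says nothing by itself about which variant runs faster:

```perl
use strict;
use warnings;
use re qw(regmust);

# Hypothetical helper: render undef visibly.
sub show { defined $_[0] ? "'$_[0]'" : 'undef' }

for my $pat ( 'here .* there', '^(?=.*quick)(?=.*brown)' ) {
    my $qr = qr/$pat/x;
    # regmust returns the longest anchored and longest floating fixed
    # strings the optimiser found; either may be undef.
    my ( $anchored, $floating ) = regmust($qr);
    printf "%-26s anchored=%s floating=%s\n",
        $pat, show($anchored), show($floating);
}
```

Running the script with `use re 'debug';` added prints the full compilation and optimisation trace to STDERR.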