Re^3: Regex match at the beginning or end of string

This is not quite what BrowserUk's regex (or /fred/ && /bill/ for that matter) matches.

The main advantage of the lookaheads over multiple regexes is that the pattern grows linearly with the number of terms, rather than compounding as the number of possible orderings does.

#! perl -slw
use strict;
use List::Util qw[ shuffle ];

my @terms = qw[
    the quick brown fox jumps over the lazy dog
];

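# One zero-width lookahead per term: each asserts, from the start of the
# string, that the term occurs somewhere in it, regardless of order.
my $re = join '', map "(?=^.*$_)", @terms;
$re = qr/$re/;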

for( 1 .. 10) {
    my $input = join ' ', shuffle @terms;
    $input =~ $re and print "$input matched";
}

__END__
C:\test>junk48
quick brown the dog over fox lazy jumps the matched
the dog over jumps the quick lazy brown fox matched
jumps brown fox lazy quick the over dog the matched
jumps brown dog over the fox lazy the quick matched
over dog jumps fox the brown the lazy quick matched
dog fox lazy the the quick over brown jumps matched
lazy over brown dog quick the fox jumps the matched
jumps fox quick brown the over lazy dog the matched
over dog the lazy jumps quick brown fox the matched
dog over lazy quick the the jumps brown fox matched

That is considerably easier than constructing and trying all 350,000+ regexes, one for each possible ordering of the nine terms (9! = 362,880).
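To put numbers on the linear-versus-compounding point, here is a small sketch (not part of the original post) that builds the lookahead pattern for the nine terms and counts the orderings that a one-regex-per-ordering approach would have to cover. The sample input is made up for illustration.

```perl
use strict;
use warnings;

my @terms = qw[ the quick brown fox jumps over the lazy dog ];

# Lookahead approach: one (?=^.*TERM) group per term, so the pattern
# grows linearly with the number of terms.
my $lookahead = join '', map "(?=^.*$_)", @terms;
my $re        = qr/$lookahead/;

# One-regex-per-ordering approach: n! orderings for n terms
# (fewer distinct ones here, because 'the' appears twice).
my $orderings = 1;
$orderings *= $_ for 1 .. @terms;

printf "%d terms: %d lookahead groups, %d orderings\n",
    scalar @terms, scalar @terms, $orderings;

# Any ordering of all nine terms satisfies the single lookahead regex.
my $input = 'dog lazy the over jumps fox brown quick the';
print "matched: $input\n" if $input =~ $re;
```

Adding a tenth term adds one more lookahead group, but multiplies the number of orderings by ten.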

Another advantage is that only a small tweak is needed to handle the situation where not just the ordering is uncertain, but some terms may also be omitted. A nice side-effect is that you can use capturing to find out what was matched, because the captures are returned in a consistent order (the order of @terms):

#! perl -slw
use strict;
use List::Util qw[ shuffle ];

my @terms = qw[
    the quick brown fox jumps over the lazy dog
];

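# Making each lookahead optional (?) lets terms be absent; the capture
# inside each one yields the term, or undef, in @terms order.
my $re = join '', map "(?=^.*($_))?", @terms;
$re = qr/$re/;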

for( 1 .. 10) {
    my $input = join ' ', (shuffle @terms)[ 1 .. 5 ];
    my @found = $input =~ $re;
    $_ //= 'n/a' for @found;
    print "Found [ @found ]\nin:'$input'";
}

__END__
C:\test>junk48
Found [ the n/a n/a fox n/a over the lazy dog ]
in:'fox over lazy the dog'
Found [ n/a n/a brown fox n/a over n/a lazy dog ]
in:'fox lazy dog brown over'
Found [ the quick brown fox n/a n/a the n/a dog ]
in:'the quick dog fox brown'
Found [ n/a n/a brown fox jumps over n/a lazy n/a ]
in:'brown lazy over jumps fox'
Found [ the quick brown n/a jumps n/a the lazy n/a ]
in:'lazy the quick brown jumps'
Found [ the quick n/a n/a jumps n/a the n/a dog ]
in:'dog quick the the jumps'
Found [ n/a quick brown fox jumps n/a n/a n/a dog ]
in:'fox jumps quick brown dog'
Found [ the n/a brown n/a n/a over the lazy dog ]
in:'over lazy the dog brown'
Found [ the n/a brown fox jumps n/a the n/a dog ]
in:'the brown dog fox jumps'
Found [ the quick n/a fox n/a over the lazy n/a ]
in:'fox lazy over quick the'

The double matching of 'the' (both 'the' lookaheads capture the same occurrence) can be a good or bad thing, depending upon your purpose.
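Because the shuffled input makes the output above differ from run to run, here is a deterministic sketch (not from the original post) of the same optional-lookahead pattern applied to one fixed input, the first one shown above. It also makes the double matching of 'the' visible: the start offsets in `@-` for both 'the' groups are the same.

```perl
use strict;
use warnings;

my @terms = qw[ the quick brown fox jumps over the lazy dog ];

# Each optional lookahead captures its term if present; the trailing ?
# lets the overall match succeed even when a term is missing.
my $pattern = join '', map "(?=^.*($_))?", @terms;
my $re      = qr/$pattern/;

my $input = 'fox over lazy the dog';
my @found = map { defined $_ ? $_ : 'n/a' } $input =~ $re;

# Start offsets of the two 'the' captures (groups 1 and 7).
my ( $first_the, $second_the ) = ( $-[1], $-[7] );

print "Found [ @found ]\n";
print "'the' groups start at offsets $first_the and $second_the\n";
```

Both offsets are 14, i.e. the single 'the' in the input satisfies both 'the' terms.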

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.


Re^4: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 19, 2011 at 23:18 UTC
Yeah, but instead of writing complicated lookaheads, one may as well write:

`{ local $_ = $str; /pattern1/ && /pattern2/; }`

which, compared to the lookahead variant, has a better chance of being handled by the optimizer. However, neither this nor the suggested lookahead variant actually triggers only on "pattern1" followed by "pattern2", or "pattern2" followed by "pattern1", not even if one doesn't insist that they immediately follow each other. For instance, both of these match:

`"8" =~ /^(?=.*\w)(?=.*\d)/;`

`"8" =~ /\w/ && "8" =~ /\d/;`

yet neither ordering does:

`"8" !~ /\w\d/ && "8" !~ /\d\w/;`
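The point about the string "8" can be checked by running the tests side by side. This sketch (not from the original post) assumes the lookahead is meant to read `(?=.*\w)(?=.*\d)`, with `.*` in front of each pattern as in the approach being criticized, and it uses `.*` between the patterns in the ordered check, to cover the case where they need not be adjacent.

```perl
use strict;
use warnings;

my $str = '8';

# Both lookaheads succeed: each inspects the string independently,
# so the single character '8' satisfies \w and \d at the same time.
my $lookahead = $str =~ /^(?=.*\w)(?=.*\d)/ ? 1 : 0;

# Two separate matches: same result, for the same reason.
my $separate = ( $str =~ /\w/ && $str =~ /\d/ ) ? 1 : 0;

# Neither ordering of the two patterns matches, because either one
# would need at least two characters.
my $ordered = ( $str =~ /\w.*\d/ || $str =~ /\d.*\w/ ) ? 1 : 0;

print "lookahead=$lookahead separate=$separate ordered=$ordered\n";
```

This prints `lookahead=1 separate=1 ordered=0`.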
Re^5: Regex match at the beginning or end of string by BrowserUk (Patriarch) on Feb 20, 2011 at 01:03 UTC
> Yeah, but instead of writing complicated lookaheads,

There is nothing complicated about `/(?=.*term)/`.

> one may as well write: `/pattern1/ && /pattern2/;`

For two terms, maybe. But how about for a variable number of terms?

> which compared to the lookahead variant, has a better chance of being handled by the optimizer.

The optimiser? Care to show some evidence of this "optimiser" in operation? Documentation? Any indication that it actually exists?

> However, neither this, nor the suggested lookahead variant actually trigger only on "pattern1" followed by "pattern2", or "pattern2" followed by "pattern1"

That is only one of the constraints specified, and it can be verified using `@+` and `@-` after the match. Have you any solution that satisfies the other two constraints? Namely: "without the need to use multiple comparisons or duplicate usage of either pattern".

> For instance: ... and ... but ...

Bad choices do not make for good examples.

I offered a solution. Others have offered other solutions. The OP chooses.

But I challenge you to construct the regex(es) required to match the terms `the`, `quick`, `brown`, `fox`, `jumps`, `over`, `the`, `lazy`, `dog`, in any ordering, using some other technique. Assuming you have a solution, we can then compare how the "optimiser" fares with the two variants.
Re^6: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 11:14 UTC
> But, I challenge you to construct the regex(es) required to match the terms: the, quick, brown, fox, jumps, over, the, lazy, dog, in any ordering, using some other technique.

my @pats = qw[the quick brown fox jumps over the lazy dog];
my $str = "....";
my $matched = 1;
foreach my $pat (@pats) {
    last unless $matched &&= $str =~ /$pat/;
}

Or

my @pats = qw[the quick brown fox jumps over the lazy dog];
my $str = "....";
my $matched = !grep {$str !~ /$_/} @pats;
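Here is a runnable sketch (not from the original posts) that wraps the loop and `grep` suggestions, plus the lookahead regex from earlier in the thread, in subroutines so all three can be compared on the same inputs. The subroutine names and sample strings are hypothetical, chosen for illustration. In all three, a repeated term such as 'the' is satisfied by a single occurrence.

```perl
use strict;
use warnings;

my @pats = qw[ the quick brown fox jumps over the lazy dog ];

# Loop: stop at the first pattern that does not match.
sub all_match_loop {
    my ( $str, @wanted ) = @_;
    my $matched = 1;
    foreach my $pat (@wanted) {
        last unless $matched &&= $str =~ /$pat/;
    }
    return $matched ? 1 : 0;
}

# grep: true when no pattern fails to match.
sub all_match_grep {
    my ( $str, @wanted ) = @_;
    my $ok = !grep { $str !~ /$_/ } @wanted;
    return $ok ? 1 : 0;
}

# Single regex built from one lookahead per pattern.
sub all_match_lookahead {
    my ( $str, @wanted ) = @_;
    my $re = join '', map "(?=^.*$_)", @wanted;
    return $str =~ /$re/ ? 1 : 0;
}

my @results;
for my $str ( 'lazy dog over the fox jumps brown quick', 'the quick brown fox' ) {
    my @r = map { $_->( $str, @pats ) }
        \&all_match_loop, \&all_match_grep, \&all_match_lookahead;
    push @results, "@r";
    print "@r : $str\n";
}
```

All three agree: `1 1 1` for the string containing every term, `0 0 0` for the partial one.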
Re^7: Regex match at the beginning or end of string by BrowserUk (Patriarch) on Feb 20, 2011 at 12:49 UTC
Re^8: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 15:46 UTC
Re^7: Regex match at the beginning or end of string by BrowserUk (Patriarch) on Feb 20, 2011 at 11:21 UTC
Re^6: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 11:22 UTC
> That is only one of the constraints specified, which can be verified using @+ & @- after the match.

After the match? How? Consider the patterns `/\w/` and `/\d/`, against the strings `&8` and `A8`. After both

`"&8" =~ /^(?=.*(\w))(?=.*(\d))/;`

and

`"A8" =~ /^(?=.*(\w))(?=.*(\d))/;`

`@-` will be `(0, 1, 1)` and `@+` will be `(0, 2, 2)`, yet one of them fits the criteria from the OP, and the other doesn't.
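The offsets claimed above can be checked directly. This sketch (not from the original posts) assumes the lookaheads are written with `.*`, as in the rest of the thread; with a bare `.` the offsets come out the same for these two-character strings. It also tests which string actually has a `\w` followed by a `\d`.

```perl
use strict;
use warnings;

my %offsets;
for my $str ( '&8', 'A8' ) {
    $str =~ /^(?=.*(\w))(?=.*(\d))/ or die "no match for $str";
    # Copy @- and @+ immediately, before another match overwrites them.
    my $starts = join ', ', @-;
    my $ends   = join ', ', @+;
    $offsets{$str} = "($starts) ($ends)";
    print "$str: \@- = ($starts), \@+ = ($ends)\n";
}

# Only 'A8' has a \w character followed by a \d character.
my %ordered;
$ordered{$_} = /\w.*\d/ ? 1 : 0 for '&8', 'A8';
print "$_ ordered: $ordered{$_}\n" for sort keys %ordered;
```

The greedy `.*` makes both groups land on the '8' in each string, so the offsets cannot tell the two strings apart.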
Re^7: Regex match at the beginning or end of string by BrowserUk (Patriarch) on Feb 20, 2011 at 12:58 UTC
Re^8: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 15:25 UTC
Re^6: Regex match at the beginning or end of string by JavaFan (Canon) on Feb 20, 2011 at 10:46 UTC
> Any indication that it actually exists?

`man perlre | grep optimiser`