Zero-width look-ahead regexp question

rovf has asked for the wisdom of the Perl Monks concerning the following question:

Re: Zero-width look-ahead regexp question by linuxer (Curate) on Jul 01, 2008 at 10:14 UTC
Did you try adding a word boundary to your regex?

    # untested quickshot
    $name =~ /^FOO(?!BA[RZ])\b.*$/

update1: text corrections
update2: please see Hue-Bond's reply to my post for a corrected version
Re^2: Zero-width look-ahead regexp question by Hue-Bond (Priest) on Jul 01, 2008 at 11:05 UTC
I would use `^FOO(?!BA[RZ]\b).*$`, i.e. put the `\b` inside* the look-ahead. Otherwise, only 'FOO' is matched.

    $ perl -le '/^FOO(?!BA[RZ]\b).*$/ and print for qw/FOOBAR FOOBARS FOOBAZ FOOBAZS FOOQUX FOO/'
    FOOBARS
    FOOBAZS
    FOOQUX
    FOO

Update*: added example

-- David Serrano
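To make the effect of moving the `\b` concrete, here is a minimal sketch that runs both variants over the same word list. It assumes the second pattern ends in `.*$`, as the printed output in the post implies; the helper `matching` is a hypothetical name introduced only for this illustration.

```perl
use strict;
use warnings;

# The two variants discussed above: \b outside vs. inside the look-ahead.
my $outside = qr/^FOO(?!BA[RZ])\b.*$/;
my $inside  = qr/^FOO(?!BA[RZ]\b).*$/;

# Hypothetical helper: return the words that match a given pattern.
sub matching {
    my ($re, @words) = @_;
    return grep { $_ =~ $re } @words;
}

my @words = qw(FOOBAR FOOBARS FOOBAZ FOOBAZS FOOQUX FOO);

print "outside: @{[ matching($outside, @words) ]}\n";  # outside: FOO
print "inside:  @{[ matching($inside,  @words) ]}\n";  # inside:  FOOBARS FOOBAZS FOOQUX FOO
```

With the `\b` outside the look-ahead, a word character directly after `FOO` can never satisfy the boundary, which is why only the bare `FOO` survives.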
Re^3: Zero-width look-ahead regexp question by rovf (Priest) on Jul 01, 2008 at 12:10 UTC
The word boundary does not help, because for instance `'FOOBAR X'` and `'FOOBARX'` are both strings which should be accepted (because both `BAR X` and `BARX` are considered different from `BAR`), while `'FOOBAR'` should be rejected.

Actually, I believe I have meanwhile found a solution to my problem over lunch (it is surprising how much good a glass of Italian Chardonnay can do the brain cells):

    $name =~ /^FOO(?!(BAR|BAZ)$)/

So unless someone can find a flaw in this (it has worked on every example I have tried so far), I think I will stick with it. It is not a beautiful solution, though, because if one day I want to pick up what is to the right of FOO, I can't do it. So I'm still interested in hearing alternative proposals.

Just to make it clear, here are concrete examples. The following strings should be accepted:

    FOOABC
    FOO_BAR
    FOOBA
    FOOBAZZZZ
    FOOBAR BAZ

while the following should be rejected:

    FOOBAR
    FOOBAZ
    BARBAZ

-- Ronald Fischer <ynnor@mm.st>
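The accept/reject lists above can be turned into a small self-check. This is only a sketch: it assumes the alternation in the pattern is a plain `|`, and the helper `mismatches` is a hypothetical name, not something from the thread.

```perl
use strict;
use warnings;

# The pattern from the post above.
my $re = qr/^FOO(?!(BAR|BAZ)$)/;

my @accept = ('FOOABC', 'FOO_BAR', 'FOOBA', 'FOOBAZZZZ', 'FOOBAR BAZ');
my @reject = ('FOOBAR', 'FOOBAZ', 'BARBAZ');

# Hypothetical helper: list the strings whose match result differs from
# the expectation (1 = should match, 0 = should not match).
sub mismatches {
    my ($pattern, $expected, @strings) = @_;
    return grep { ($_ =~ $pattern ? 1 : 0) != $expected } @strings;
}

my @bad = (mismatches($re, 1, @accept), mismatches($re, 0, @reject));
print @bad ? "unexpected: @bad\n" : "all examples behave as specified\n";
```

Keeping the examples in two arrays makes it cheap to rerun the same check against any alternative pattern proposed later.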
Re^4: Zero-width look-ahead regexp question by ww (Archbishop) on Jul 02, 2008 at 03:25 UTC
Re^5: Zero-width look-ahead regexp question by rovf (Priest) on Jul 02, 2008 at 07:02 UTC
Re: Zero-width look-ahead regexp question by massa (Hermit) on Jul 01, 2008 at 11:27 UTC
`/^FOO((?!BA[RZ]$).*)/` will do what you want. Reading out loud:

    m{
      ^              # start of the string
      FOO
      (              # this will become $1
        (?!          # must not be
          BA[RZ]$    # BAR or BAZ followed by end-of-string
        )
        .*           # now anything else (except "\n")
                     # no need for "$", because ".*" will happily swallow it all
      )
    }x

Checking it out:

    $ perl -MData::Dumper -le '%h = map { $_ => /^FOO((?!BA[RZ]$).*)/ ? $1 : q(does NOT match) } qw/FOO FOOZIM FOOBAR FOOA FOOBAZ FOOBARX/; $Data::Dumper::Terse = $Data::Dumper::Indent = 1; print Dumper \%h'
    {
      'FOOBAR' => 'does NOT match',
      'FOOZIM' => 'ZIM',
      'FOOA' => 'A',
      'FOOBARX' => 'BARX',
      'FOOBAZ' => 'does NOT match',
      'FOO' => ''
    }
Re^2: Zero-width look-ahead regexp question by rovf (Priest) on Jul 01, 2008 at 12:19 UTC
Thanks a lot, too!!!! -- Ronald Fischer <ynnor@mm.st>
Re: Zero-width look-ahead regexp question by jwkrahn (Abbot) on Jul 01, 2008 at 11:48 UTC
    $ perl -le'
    for ( qw/ ABCXYZ FOOXXXYZ FOOXXXBAR FOOXXXBAZ / ) {
        print if /^FOO.*(?<!BAR|BAZ)$/;
    }
    '
    FOOXXXYZ
Re^2: Zero-width look-ahead regexp question by massa (Hermit) on Jul 01, 2008 at 12:30 UTC
Your answer is closer to what rovf says he wants (`FOO ... things things ... must not be BA[RZ] ... end`), but it differs from the regexes he presented initially: `/^FOO(.*)$/ and $1 !~ /^BA[RZ]$/` means `FOO ... things things ... end AND "things things" must not be BA[RZ]`. The difference is that your answer does not match FOOMABAR, while his expression does.
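The difference described here can be seen by running a look-ahead pattern and the look-behind pattern side by side. This is a sketch under the assumption that both alternations use a plain `|`; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

# Look-ahead: rejects only when everything after FOO is exactly BAR or BAZ.
my $lookahead  = qr/^FOO(?!BA[RZ]$)/;
# Look-behind: rejects any string that merely ends in BAR or BAZ.
my $lookbehind = qr/^FOO.*(?<!BAR|BAZ)$/;

for my $s (qw(FOOBAR FOOMABAR FOOXXXYZ)) {
    my $ahead  = $s =~ $lookahead  ? 'yes' : 'no';
    my $behind = $s =~ $lookbehind ? 'yes' : 'no';
    print "$s: look-ahead $ahead, look-behind $behind\n";
}
# FOOBAR: look-ahead no, look-behind no
# FOOMABAR: look-ahead yes, look-behind no
# FOOXXXYZ: look-ahead yes, look-behind yes
```

Which of the two is right depends on whether a string such as `FOOMABAR` should be accepted. By the examples given earlier in the thread (`FOO_BAR` is accepted), the look-ahead reading matches the stated requirement.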
Re^2: Zero-width look-ahead regexp question by rovf (Priest) on Jul 01, 2008 at 12:16 UTC
This is a cute idea!!! -- Ronald Fischer <ynnor@mm.st>
Re: Zero-width look-ahead regexp question by Anonymous Monk on Jul 01, 2008 at 14:57 UTC
Decomposing a regex into component parts makes conceptualization and maintenance easier.

    perl -wMstrict -le
    "my $beginning  = qr{ \A FOO }xms;
     my $not_ending = qr{ (?! (?: BAR | BAZ) \z ) }xms;
     my $ending     = qr{ .* \z }xms;
     for my $string (@ARGV) {
       if ($string =~ m{ $beginning $not_ending ($ending) }xms) {
         print qq($string accepted ('$1' to right))
       }
       else {
         print qq($string rejected)
       }
     }
    " FOOABC FOO_BAR FOOBA FOOBAZZZZ "FOOBAR BAZ" FOOBAR FOOBAZ BARBAZ
    FOOABC accepted ('ABC' to right)
    FOO_BAR accepted ('_BAR' to right)
    FOOBA accepted ('BA' to right)
    FOOBAZZZZ accepted ('BAZZZZ' to right)
    FOOBAR BAZ accepted ('BAR BAZ' to right)
    FOOBAR rejected
    FOOBAZ rejected
    BARBAZ rejected

To make your life simpler, avoid like the plague the use of capturing groups in the decomposed regexes (the `qr{ ... }` regexes); use them only in the matching regexes (e.g., the `m{ ... }`), where it's a lot easier to keep track of them.
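The advice about capturing groups can be illustrated with a small sketch. `$beginning_capturing` is a hypothetical variant of the component that is not part of the post; it exists only to show how a capture hidden inside a `qr{ ... }` component silently renumbers the groups in the matching regex.

```perl
use strict;
use warnings;

my $beginning_plain     = qr{ \A FOO }xms;
my $beginning_capturing = qr{ \A (FOO) }xms;   # hypothetical: hides a capture group
my $not_ending          = qr{ (?! (?: BAR | BAZ ) \z ) }xms;
my $ending              = qr{ .* \z }xms;

my $string = 'FOOABC';

# Only the matching regex captures, so the tail is in $1.
if ($string =~ m{ $beginning_plain $not_ending ($ending) }xms) {
    print "plain component:     \$1 = '$1'\n";               # $1 = 'ABC'
}

# The component's own group now takes $1 and pushes the tail to $2.
if ($string =~ m{ $beginning_capturing $not_ending ($ending) }xms) {
    print "capturing component: \$1 = '$1', \$2 = '$2'\n";   # $1 = 'FOO', $2 = 'ABC'
}
```

Nothing in the second `m{ ... }` reveals that the tail has moved to `$2`, which is the bookkeeping problem the advice is meant to avoid.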
Re^2: Zero-width look-ahead regexp question by rovf (Priest) on Jul 02, 2008 at 06:54 UTC
This approach to writing a regexp is new to me, but I like it a lot. I have already spent too much time deciphering complex regexps that I wrote some time ago (despite my, probably too lazy, use of comments). -- Ronald Fischer <ynnor@mm.st>