rovf has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I thought this would be easy, but somehow I'm stuck: I'm looking for a regexp that matches names whose left part is, say, FOO and whose right part is *not* BAR or BAZ. I guess this is one way to do it, though I'm not sure whether $1 will be messed up by the parentheses in the second regexp:

$name =~ /^FOO(.*)$/ && $1 !~ /^(BAR|BAZ)$/
Anyway, I would prefer doing it with a single regexp only. I started with:
$name =~ /^FOO(?!(BAR|BAZ)).*$/
but of course this is not quite right, since it would also reject FOOBARX. I wonder whether there is a simple way to solve this with Perl regexp?
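
On the $1 worry: a failed match leaves the capture variables alone, so whenever the two-step test is true, $1 still holds the first capture. A minimal sketch that avoids the question entirely copies the capture into a lexical and uses a non-capturing group in the second test. The helper name tail_ok is made up for illustration; the loop also shows the un-anchored look-ahead turning down FOOBARX, which the two-step test accepts.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $name starts with FOO and the remainder
# is anything other than exactly BAR or BAZ.
sub tail_ok {
    my ($name) = @_;
    # A list-context match copies the capture into a lexical right away.
    my ($rest) = $name =~ /^FOO(.*)$/;
    return 0 unless defined $rest;
    # (?:...) groups without capturing, so no capture variable is touched.
    return $rest !~ /^(?:BAR|BAZ)$/ ? 1 : 0;
}

for my $name (qw(FOOBAR FOOBARX FOOQUX)) {
    printf "%-8s two-step: %-6s  un-anchored look-ahead: %s\n",
        $name,
        tail_ok($name) ? 'accept' : 'reject',
        $name =~ /^FOO(?!(?:BAR|BAZ)).*$/ ? 'accept' : 'reject';
}
```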

-- 
Ronald Fischer <ynnor@mm.st>

Re: Zero-width look-ahead regexp question
by linuxer (Curate) on Jul 01, 2008 at 10:14 UTC

    Did you try to add a word boundary in your regex?

    # untested quickshot
    $name =~ /^FOO(?!BA[RZ])\b.*$/

    update1: text corrections

    update2: please see Hue-Bond's reply to my post for a corrected version

      I would use ^FOO(?!BA[RZ]\b).*$, i.e. put the \b inside the look-ahead. Otherwise, only 'FOO' is matched.

      $ perl -le '/^FOO(?!BA[RZ]\b).*$/ and print for qw/FOOBAR FOOBARS FOOBAZ FOOBAZS FOOQUX FOO/'
      FOOBARS
      FOOBAZS
      FOOQUX
      FOO

      Update: added example

      --
      David Serrano

        The word boundary does not help, because for instance 'FOOBAR X' and 'FOOBARX' are both strings which should be accepted (because both BAR X and BARX are considered different from <BAR>), while 'FOOBAR' should be rejected.

        Actually, I believe I have meanwhile found a solution to my problem over lunch (it is surprising how much good a glass of Italian Chardonnay can do for the brain cells):

        $name =~ /^FOO(?!(BAR|BAZ)$)/
        so unless someone can find a flaw in this (it has worked on the examples I have tried so far), I think I will stick with it. It is not a beautiful solution, though, because if I one day want to pick up what is to the right of FOO, I can't. So I'm still interested in hearing alternative proposals.

        Maybe just to make it clear, here are concrete examples: The following strings should be accepted:

        FOOABC
        FOO_BAR
        FOOBA
        FOOBAZZZZ
        FOOBAR BAZ
        while the following should be rejected:
        FOOBAR
        FOOBAZ
        BARBAZ
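
A quick way to check a candidate against the accept/reject examples above is a small harness like the following sketch. The labels and the helper name failures are made up for illustration; the two patterns are the word-boundary variant and the end-anchored look-ahead from this thread (written with a non-capturing group). On these examples the word-boundary variant gets only 'FOOBAR BAZ' wrong, while the end-anchored one classifies all of them as requested.

```perl
use strict;
use warnings;

my @accept = ('FOOABC', 'FOO_BAR', 'FOOBA', 'FOOBAZZZZ', 'FOOBAR BAZ');
my @reject = ('FOOBAR', 'FOOBAZ', 'BARBAZ');

my %candidate = (
    'boundary inside look-ahead'   => qr/^FOO(?!BA[RZ]\b).*$/,
    'end anchor inside look-ahead' => qr/^FOO(?!(?:BAR|BAZ)$)/,
);

# Hypothetical helper: returns the strings a pattern classifies wrongly,
# i.e. accept-list entries it misses plus reject-list entries it matches.
sub failures {
    my ($re) = @_;
    my @missed  = grep { $_ !~ $re } @accept;
    my @leaked  = grep { $_ =~ $re } @reject;
    return (@missed, @leaked);
}

for my $label (sort keys %candidate) {
    my @wrong = failures($candidate{$label});
    my $verdict = @wrong ? "wrong on: @wrong" : 'all examples pass';
    print "$label: $verdict\n";
}
```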
        -- 
        Ronald Fischer <ynnor@mm.st>
Re: Zero-width look-ahead regexp question
by massa (Hermit) on Jul 01, 2008 at 11:27 UTC
    /^FOO((?!BA[RZ]$).*)/ will do what you want. Reading out loud:
    m{
      ^              # start of the string
      FOO
      (              # this will become $1
        (?!          # must not be
          BA[RZ]$    # BAR or BAZ followed by end-of-string
        )
        .*           # now anything else (except "\n")
                     # no need to "$", because ".*" will happily swallow it all
      )
    }x
    Checking it out:
    $ perl -MData::Dumper -le '%h = map { $_ => /^FOO((?!BA[RZ]$).*)/ ? $1 : q(does NOT match) } qw/FOO FOOZIM FOOBAR FOOA FOOBAZ FOOBARX/; $Data::Dumper::Terse = $Data::Dumper::Indent = 1; print Dumper \%h'
    {
      'FOOBAR' => 'does NOT match',
      'FOOZIM' => 'ZIM',
      'FOOA' => 'A',
      'FOOBARX' => 'BARX',
      'FOOBAZ' => 'does NOT match',
      'FOO' => ''
    }
      Thanks a lot, too!!!!
      -- 
      Ronald Fischer <ynnor@mm.st>
Re: Zero-width look-ahead regexp question
by jwkrahn (Abbot) on Jul 01, 2008 at 11:48 UTC
    $ perl -le'
    for ( qw/ ABCXYZ FOOXXXYZ FOOXXXBAR FOOXXXBAZ / ) {
        print if /^FOO.*(?<!BAR|BAZ)$/;
        }
    '
    FOOXXXYZ
      Your answer is more like what rovf says he wants (FOO ... things things ... must not be BA[RZ] ... end), but it differs from the regexen he presented initially:
      /^FOO(.*)$/ and $1 !~ /^BA[RZ]$/ mean (FOO ... things things ... end AND "things things" must not be BA[RZ]). The difference is that your answer does not match FOOMABAR, while his expression does.
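
      To make that difference concrete, here is a small sketch comparing the look-behind pattern with the original two-step test; the variable names are made up for illustration. The look-behind rejects any tail that merely ends in BAR or BAZ, whereas the two-step test rejects only a tail that is exactly BAR or BAZ, so the two disagree on FOOMABAR.

```perl
use strict;
use warnings;

# Tail must not END in BAR or BAZ.
my $lookbehind = qr/^FOO.*(?<!BAR|BAZ)$/;

# Tail must not BE exactly BAR or BAZ (the original two-step test).
my $two_step = sub { $_[0] =~ /^FOO(.*)$/ && $1 !~ /^BA[RZ]$/ };

for my $name (qw(FOOBAR FOOMABAR FOOXXXYZ)) {
    printf "%-9s look-behind: %-6s  two-step: %s\n",
        $name,
        $name =~ $lookbehind ? 'accept' : 'reject',
        $two_step->($name)   ? 'accept' : 'reject';
}
```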
      This is a cute idea!!!
      -- 
      Ronald Fischer <ynnor@mm.st>
Re: Zero-width look-ahead regexp question
by Anonymous Monk on Jul 01, 2008 at 14:57 UTC
    Decomposing a regex into component parts makes conceptualization and maintenance easier.

    perl -wMstrict -le
    "my $beginning  = qr{ \A FOO }xms;
     my $not_ending = qr{ (?! (?: BAR | BAZ) \z ) }xms;
     my $ending     = qr{ .* \z }xms;
     for my $string (@ARGV) {
       if ($string =~ m{ $beginning $not_ending ($ending) }xms) {
         print qq($string accepted ('$1' to right))
         }
       else {
         print qq($string rejected)
         }
       }
    " FOOABC FOO_BAR FOOBA FOOBAZZZZ "FOOBAR BAZ" FOOBAR FOOBAZ BARBAZ
    FOOABC accepted ('ABC' to right)
    FOO_BAR accepted ('_BAR' to right)
    FOOBA accepted ('BA' to right)
    FOOBAZZZZ accepted ('BAZZZZ' to right)
    FOOBAR BAZ accepted ('BAR BAZ' to right)
    FOOBAR rejected
    FOOBAZ rejected
    BARBAZ rejected

    To make your life simpler, avoid like the plague the use of capturing groups in the decomposed regexes (the  qr{ ... } regexes); only use them in the matching regexes (e.g., the  m{ ... }), where it's a lot easier to keep track of them.
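
    As an illustration of why, here is a sketch (the fragment names are made up) in which one building block carries its own capturing group. The group in the matching regex is then no longer $1, because groups are numbered across the whole assembled pattern.

```perl
use strict;
use warnings;

my $prefix_capturing = qr{ \A (FOO) }xms;   # fragment with its own capture
my $prefix_plain     = qr{ \A FOO }xms;     # fragment without captures
my $not_ending       = qr{ (?! (?: BAR | BAZ ) \z ) }xms;

my $string = 'FOOBARX';

# With the capturing fragment, the tail ends up in $2 instead of $1.
if ($string =~ m{ $prefix_capturing $not_ending (.*) \z }xms) {
    print "capturing fragment: \$1='$1' \$2='$2'\n";
}

# With the plain fragment, the only group is the one written in m{ ... }.
if ($string =~ m{ $prefix_plain $not_ending (.*) \z }xms) {
    print "plain fragment:     \$1='$1'\n";
}
```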

      This approach to writing a regexp is a novel idea to me, but I like it a lot. I've already spent too much time deciphering complex regexps of my own that I wrote some time ago (despite my, probably too lazy, use of comments).

      -- 
      Ronald Fischer <ynnor@mm.st>