yonatan has asked for the wisdom of the Perl Monks concerning the following question:

hello monks,

i would like to have your advice on the following regex issue. i wish to match every "nor" that is not preceded by a "neither", as in the following.

"Neither Jack nor Peter wants to go to the party next week." - not matched

"this is not a pretty novel, nor is it gentle" - matched

how should i form my regex to match such a scenario? thank you.

Replies are listed 'Best First'.
Re: regex, use of lookaround for whole word filtering
by ikegami (Patriarch) on Aug 10, 2010 at 21:35 UTC

    Simple:

    !/neither.*nor/ && /nor/

    Complicated (and surely slower):

    /^(?:(?!neither).)*nor/

    Update: Oops, they're not quite equivalent.
    The first will match when no "neither" is found before the last "nor".
    The second will match when no "neither" is found before the first "nor".
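
The difference described in the update can be checked with a small script. This is only a sketch: the sub names `no_neither_before_last_nor` and `no_neither_before_first_nor` are made up for illustration, and the test strings are assumptions chosen to separate the two cases.

```perl
use strict;
use warnings;

# "Simple" form: true when a "nor" exists and no "neither"
# appears before the LAST "nor" on the line.
sub no_neither_before_last_nor {
    my ($s) = @_;
    return ( $s !~ /neither.*nor/ && $s =~ /nor/ ) ? 1 : 0;
}

# "Complicated" form: true when no "neither" appears
# before the FIRST "nor".
sub no_neither_before_first_nor {
    my ($s) = @_;
    return ( $s =~ /^(?:(?!neither).)*nor/ ) ? 1 : 0;
}

for my $s ( 'a nor b', 'neither a nor b', 'nor neither nor' ) {
    printf "%-20s last=%d first=%d\n", "'$s'",
        no_neither_before_last_nor($s),
        no_neither_before_first_nor($s);
}
```

Only the third string separates the two: the first form rejects it because a "neither" precedes the last "nor", while the second accepts it because nothing precedes the first "nor".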

      You could probably make the first regex use a non-greedy .*
      I think that's simpler than negative lookbehind and whatnot. The same idea applies to many other programming logic problems. It is sometimes useful to think about chunks of code logic or regex patterns as sets in a finite space.
      the hardest line to type correctly is: stty erase ^H

        You could probably make the first regex use a non-greedy .*

        Wouldn't help.

        >perl -E"$_='nor neither nor'; say !/neither.*nor/ && /nor/ ?1:0"
        0
        >perl -E"$_='nor neither nor'; say !/neither.*?nor/ && /nor/ ?1:0"
        0

        Non-greediness is very fragile and frequently misused.
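
One way to see why the non-greedy quantifier makes no difference here: it changes how much text the match covers, not whether a match exists. The sketch below assumes a made-up helper `matched_text` and a made-up sample string.

```perl
use strict;
use warnings;

# Return the text matched by $re in $s, or undef if there is no match.
sub matched_text {
    my ( $s, $re ) = @_;
    my ($m) = $s =~ /($re)/;
    return $m;
}

my $s = 'neither a nor b nor c';

# Greedy .* extends to the last "nor" it can reach.
my $greedy = matched_text( $s, qr/neither.*nor/ );

# Non-greedy .*? stops at the first "nor" after "neither".
my $lazy = matched_text( $s, qr/neither.*?nor/ );

print "greedy: $greedy\n";    # greedy: neither a nor b nor
print "lazy:   $lazy\n";      # lazy:   neither a nor
```

Both patterns succeed whenever some "neither" is followed by some "nor", so negating either one gives the same answer.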

Re: regex, use of lookaround for whole word filtering
by dasgar (Priest) on Aug 10, 2010 at 21:27 UTC

    I believe that the following regex should do the job.

    /^(?!.*neither).*nor.*$/mi
      No, it disallows following instances of "neither".
      $ perl -E'say "a nor b neither" =~ /^(?!.*neither).*nor.*$/mi ?1:0'
      0

        Nice catch. Didn't even test that kind of scenario.

        Took a few minutes after reading your post to realize why my suggested regex was incorrect. Another example of Perl doing what I told it to do instead of what I wanted it to do. :D
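
The failure comes from the lookahead scanning the whole line from the start, so a "neither" anywhere on the line blocks the match. A possible variant, shown here purely as an assumption and not something proposed in the thread, limits the lookahead to a "neither" that has a "nor" after it. The sub names are hypothetical.

```perl
use strict;
use warnings;

# Regex as originally suggested: any "neither" on the line blocks the match.
sub whole_line_ok {
    my ($s) = @_;
    return ( $s =~ /^(?!.*neither).*nor.*$/mi ) ? 1 : 0;
}

# Hypothetical variant: only a "neither" followed later by a "nor" blocks it.
sub scoped_ok {
    my ($s) = @_;
    return ( $s =~ /^(?!.*neither.*nor).*nor/mi ) ? 1 : 0;
}

for my $s (
    'a nor b neither',
    'Neither Jack nor Peter wants to go to the party next week.',
    'this is not a pretty novel, nor is it gentle',
  )
{
    printf "whole=%d scoped=%d  %s\n", whole_line_ok($s), scoped_ok($s), $s;
}
```

The variant behaves like the "no neither before the last nor" reading, so it still rejects a string such as 'nor neither nor'.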

Re: regex, use of lookaround for whole word filtering
by ww (Archbishop) on Aug 11, 2010 at 02:05 UTC
    You have an answer... but had you read On asking for help and How do I post a question effectively?, you would have known that your question might be ill-received:
    1. The Monastery is NOT a code writing service. Rather, it seeks to be a site to help you learn (cf, "feed a man a fish...." and similar). To that end, "how should I (write some particular code)" is a question that tends to receive disparaging replies... or guides like this on some of the most basic protocols here.
    2. If you pose your question -- with code (and data if relevant) -- to seek assistance on a particular issue that has you stumped, you will have learned something in the course of writing that code and exploring the related documentation and you'll usually find that the Monks will be generous with their time and expertise.

      Thanks to ikegami (!), aquarium & dasgar for politely pointing me in the right direction. Thank you ww (assuming, that is, it is not some automatically generated reply which is triggered by the pattern /how should i/ig ) for warning me of the undesirable consequences my message may inflict (the worst of which is re-reading the "how to pose a question" tutorial).

      though not a one-liner, the following is simple to understand and does the trick

      $_ = 'nor neither nor';
      my $cnt = 0;
      while (/(.*?)nor/ig) {    # =================== coarse match
          printf("\n");
          printf(STDOUT qq/\$\`:%s\n/, $`);
          printf(STDOUT qq/\$\&:%s\n/, $&);
          printf(STDOUT qq/\$\':%s\n/, $');
          if ($1 !~ /\bneither\b/i) {    # =========== finer match
              printf("match no. %d: ", $cnt++);
              printf("matched ok\n");
          }
          else {
              printf("match no. %d: ", $cnt++);
              printf("no match\n");
          }
      }
        You probably want \b around "nor" too, lest you get caught snoring on the job.
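
A minimal sketch of the word-boundary suggestion, applied to the anchored single-regex form from earlier in the thread rather than to the loop above. The sub name `has_free_nor` and the "snoring" test sentence are assumptions for illustration.

```perl
use strict;
use warnings;

# True when a whole-word "nor" occurs with no whole-word "neither"
# before the first such "nor" (case-insensitive).
sub has_free_nor {
    my ($s) = @_;
    return ( $s =~ /^(?:(?!\bneither\b).)*\bnor\b/i ) ? 1 : 0;
}

for my $s (
    'Neither Jack nor Peter wants to go to the party next week.',
    'this is not a pretty novel, nor is it gentle',
    'caught snoring on the job',
  )
{
    printf "%d  %s\n", has_free_nor($s), $s;
}
```

Without the `\b` anchors the third sentence would count as a match, because "snoring" contains the letters "nor".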