Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, wise monks;

I have a simple (stupid, if you prefer) question to which an answer cannot be found in the perlre and perlrequick man pages... I'd like to match a certain sentence, except for one word. I shall give an example to show what I mean:

echo abc nevermind | perl -ne 'print if /abc nevermind/'

What I want is to match every word _instead_ of 'abc'. /[^(abc)] nevermind/ doesn't group these letters as a word (as I expected), and rather seems to have no such effect at all, checking only whether the last letter before ' nevermind' is any of 'abc'. How do I make it clear to Perl that I want to match "not 'abc'" (the word)?

Thanks for any comments.

   wouter

edited: Tue Jul 16 00:28:41 2002 by jeffa - added code tags

Re: Negative Look-behind Assertion
by tadman (Prior) on Jul 15, 2002 at 20:22 UTC
    Option A: Remove the word from your sentence:
    s/\babc\s+//;
    Option B: Print only lines without "abc" in them:
    print unless /\babc\b/;
    Note that the use of \b indicates that there should be a "word-boundary" there. This prevents accidental matches of things like "abcd" or "cbabc".

    Option C: Check for something that isn't "abc":
    print if /(\w+) nevermind/ && $1 ne "abc";
    You could get fancy with look-behind assertions or embedded code, if you desire. See: perlre
    print if /(?<!abc) nevermind/;
    I think this last one is what you were trying for.
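
A minimal sketch of how the look-behind in that last pattern behaves on a few sample lines. The sub names and the sample strings are made up for illustration, and the `\b` variant is an assumption added here, not part of the reply.

```perl
use strict;
use warnings;

# Pattern from the reply: " nevermind" must not be directly preceded by "abc".
sub not_after_abc { return $_[0] =~ /(?<!abc) nevermind/ ? 1 : 0 }

# Assumed variant (not in the reply): \b requires "abc" to start at a word
# boundary, so only the whole word "abc" is rejected.
sub not_after_word_abc { return $_[0] =~ /(?<!\babc) nevermind/ ? 1 : 0 }

for my $line ('abc nevermind', 'xyz nevermind', 'xabc nevermind') {
    printf "%-16s plain=%d word=%d\n",
        $line, not_after_abc($line), not_after_word_abc($line);
}
```

The plain pattern rejects both "abc nevermind" and "xabc nevermind", because it only looks at the three characters before the space. The `\b` variant rejects only the first.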

      I know about look-behind assertions, but I was wondering if there would be a way to do it without... it looks like a common, simple thing, _not_ matching a word...

      But if look-behind assertions are the only solution, then I'll use them.

      I'm not interested in the other solutions, since they require a code change, and... well, it's a bit working around the problem that I wanted to know.

      Thanks!

Re: stupid question about regexp - not matching a word
by VSarkiss (Monsignor) on Jul 15, 2002 at 20:27 UTC

    I'm not sure I'm reading your question right, but you're wondering why [^(abc)] doesn't match the string abc?

    The square brackets enclose an expression that will match one character. The construct you show will match anything other than (, ), a, b, or c. I think that's what your description is trying to say.
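
To make the single-character behaviour concrete, here is a small sketch; the sub name and the test strings are hypothetical. Inside square brackets the parentheses are ordinary characters, so the class only inspects the one character in front of " nevermind".

```perl
use strict;
use warnings;

# [^(abc)] matches exactly one character that is not '(', ')', 'a', 'b' or 'c'.
sub class_matches { return $_[0] =~ /[^(abc)] nevermind/ ? 1 : 0 }

for my $line ('abc nevermind', 'abd nevermind', 'xyc nevermind') {
    printf "%-15s %s\n", $line, class_matches($line) ? 'matches' : 'no match';
}
```

Only "abd nevermind" matches: its word ends in "d". "xyc nevermind" is rejected just like "abc nevermind", because both end in "c", which shows the class never considers the word as a whole.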

    You can get the effect you want in Perl newer than 5.6.1 with a lookbehind assertion:
    print if /(?<!abc) nevermind/
    There are other ways to do it, using $PREMATCH, but this will work for fixed strings. More information at perlre, the section about lookbehind.

    That is, if I read your question correctly....

      Yes, that's the solution. Is there no way without look-behind assertions? It seems like a common thing to do, 'not match' an expression of grouped letters (i.e. a word).

      Thanks for your answer :)

        Yes, there's the negative look-ahead assertion. ;-)
        /^(?!abc)\w+ something/
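
A rough sketch of the look-ahead approach, assuming "something" stands in for "nevermind" from the original question. The sub names are hypothetical, and the second pattern with `\b` is an assumed refinement rather than anything posted in the thread.

```perl
use strict;
use warnings;

# Look-ahead pattern from the reply, with "nevermind" substituted (assumption).
# The ^ anchor means the checked word must be the first word of the line.
sub first_word_ok { return $_[0] =~ /^(?!abc)\w+ nevermind/ ? 1 : 0 }

# Assumed variant: \b inside the look-ahead rejects only the whole word "abc",
# not every word that merely starts with "abc".
sub first_word_not_abc { return $_[0] =~ /^(?!abc\b)\w+ nevermind/ ? 1 : 0 }

for my $line ('abc nevermind', 'xyz nevermind', 'abcd nevermind') {
    printf "%-16s plain=%d word=%d\n",
        $line, first_word_ok($line), first_word_not_abc($line);
}
```

Both patterns reject "abc nevermind" and accept "xyz nevermind". They differ on "abcd nevermind", which the plain look-ahead rejects because the word starts with "abc".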

        But why are you afraid of look-behind assertions? A request for a solution to X without using Y, when Y is a reasonable way of solving X, should come with an explanation of why Y shouldn't be used.

        Abigail

Re: stupid question about regexp - not matching a word
by thelenm (Vicar) on Jul 15, 2002 at 20:28 UTC
    I'm not exactly sure what you're trying to do... are you trying to match only strings that do not contain "abc"? If so, you may want something like this:
    print unless /\babc\b/;
    Are you trying to match "nevermind" only if not preceded by "abc"? If so, you may want something like this, which uses negative lookbehind:
    print if /(?<!abc )nevermind/;
    Are you trying to match every word except "abc"? If so, you may want something like this:
    my @words = grep !/^abc\z/, split; print "@words";
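
The third interpretation can be tried on a fixed string instead of `$_`. This is only a sketch: the helper name and the sample input are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split $line on whitespace and drop every word
# that is exactly "abc" (words such as "abcd" are kept).
sub words_without_abc {
    my ($line) = @_;
    return grep { !/^abc\z/ } split ' ', $line;
}

my @words = words_without_abc('abc nevermind abcd abc');
print "@words\n";
```

For this input the remaining words are "nevermind" and "abcd".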

    -- Mike

    --
    just,my${.02}

      Actually, I was trying to match every sentence where 'abc' is not preceding 'nevermind' - every other word is ok.

      Negative look-behind works, but I expected a more common combination to work too - something like grouping letters with () to a word, and then ^ to negate the sense. Is there no way to group these letters as a word instead of individual characters?

      It's more a matter of curiosity if it can be done with just a simple regexp rather than making it work, otherwise I'd easily solve it with more code.

      Thanks! ;)

        Do you not want to use lookbehind because of efficiency concerns, or some other reason? It seems to be just the right tool for the job... It's doing what you were trying to do in your original code: making sure "abc" does not match before "nevermind".

        As far as using [^abc] to do the job, [] represents a character class within regular expressions, and can only match a single character at a time. So unfortunately, you can't use [] to match anything more than a single character.

        -- Mike

        --
        just,my${.02}

        The Cookbook has the following solution for your problem:
        /^(?:(?!abc).)*$/
        This matches a string that does not contain the text abc.
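
A short sketch of how that pattern behaves; the sub name and the sample strings are assumptions for illustration. At every position the look-ahead checks that "abc" does not start there before `.` consumes one character, so the whole string is tested rather than only the word before "nevermind".

```perl
use strict;
use warnings;

# True when "abc" appears nowhere in the string.
sub lacks_abc { return $_[0] =~ /^(?:(?!abc).)*$/ ? 1 : 0 }

for my $line ('xyz nevermind', 'abc nevermind', 'never abc mind', 'ab c') {
    printf "%-15s %s\n", $line, lacks_abc($line) ? 'matches' : 'no match';
}
```

"xyz nevermind" and "ab c" match; the two strings containing "abc" do not, wherever it occurs.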

        Abigail