Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I've seen some similar questions on this site recently but can't seem to get this to work. I need to match a pattern which looks like \w+\s\w+\s\w+ . However I only want to match if the third word is not to or from e.g.
train times to xyz
train times from xyz
train times including xyz
So in the above example I would only want to match the last line. The words (train times in my example) are not guaranteed to be at the beginning of a line, however the to/from if they appear in the text would always follow these two words. I tried doing this
if ($line =~ /train times ([^to ]|[^from ])+/) {
    # do something with line
}
This however still seems to match every line. How do I ignore lines that contain \w+\s\w+\s followed by to or from?

Replies are listed 'Best First'.
Re: negate pattern match
by GrandFather (Saint) on Jan 31, 2006 at 09:50 UTC

    You need a look ahead assertion. In this case a negative look ahead assertion (?!to\b|from\b):

    use strict;
    use warnings;

    while (<DATA>) {
        print "Matched $_" if /train times (?!to\b|from\b)\w/;
    }

    __DATA__
    train times to xyz
    train times from xyz
    train times including xyz
    train times tomorrow xyz

    Prints:

    Matched train times including xyz
    Matched train times tomorrow xyz

    Update: fixed missing \b bug
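
To see what the \b changes, here is a minimal sketch comparing the lookahead with and without the word boundary. The variable names are made up for illustration; the sample lines are the ones from this thread.

```perl
use strict;
use warnings;

# Sample records taken from the examples in this thread.
my @lines = (
    'train times to xyz',
    'train times from xyz',
    'train times including xyz',
    'train times tomorrow xyz',
);

# Without \b the lookahead rejects any word that merely starts with to/from.
my $re_no_boundary = qr/train times (?!to|from)\w/;
# With \b it rejects only the whole words "to" and "from".
my $re_boundary    = qr/train times (?!to\b|from\b)\w/;

# Collect the lines each pattern accepts.
my @no_boundary = grep { $_ =~ $re_no_boundary } @lines;
my @boundary    = grep { $_ =~ $re_boundary } @lines;

print "without \\b: $_\n" for @no_boundary;
print "with \\b:    $_\n" for @boundary;
```

Without the boundary, "train times tomorrow xyz" is rejected along with the to/from lines; with it, only the whole words are excluded.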


    DWIM is Perl's answer to Gödel
      Actually this brings me to another question along the same lines: what if I want (as I do) to check for records where a keyword before "train times" means the record should be rejected? I assume that calls for a negative lookbehind assertion,
      e.g.
      help train times from xyz
      load train times including xyz
      help train times including xyz
      book train times at 1234
      I need to reject records where the word immediately before "train times" is help. If help is used in the text, it is always the first word in the record. So only records two and four in my example should be printed. I tried doing this
      print if /(?<!help).*train times/;  # doesn't work
      print if /^(!help).*train times/;   # doesn't work either
      Based on other comments I've read, it seems that the use of negative lookbehind assertions is discouraged, but I'm not sure how to handle this without using an assertion?

        A negative look ahead assertion at the start of the line (not a look back) is what you want:

        use strict;
        use warnings;

        while (<DATA>) {
            print "Matched: $_" if /^(?!help).*?train times (?!to\b|from\b)\w/;
        }

        __DATA__
        help train times from xyz
        load train times including xyz
        help train times including xyz
        book train times at 1234

        Prints:

        Matched: load train times including xyz
        Matched: book train times at 1234
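
For comparison, a sketch of why the first attempt in the question accepts every record, and of what a lookbehind placed directly before "train times" would do. The `(?<!help )` variant and the variable names are illustrative assumptions, not part of the reply above.

```perl
use strict;
use warnings;

# The four sample records from the question.
my @records = (
    'help train times from xyz',
    'load train times including xyz',
    'help train times including xyz',
    'book train times at 1234',
);

# Attempt from the question: at position 0 nothing precedes the match,
# so the lookbehind succeeds trivially and .* then skips over "help".
my @attempt = grep { /(?<!help).*train times/ } @records;

# Lookbehind placed immediately before the phrase (fixed width: "help ").
my @behind = grep { /(?<!help )train times/ } @records;

print "attempt accepts ", scalar(@attempt), " of ", scalar(@records), " records\n";
print "lookbehind accepts: $_\n" for @behind;
```

The lookbehind variant only works because "help " has a fixed width and sits directly before the phrase; the anchored lookahead shown above does not depend on that.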

        DWIM is Perl's answer to Gödel
      One happy customer, thank you
Re: negate pattern match
by tirwhan (Abbot) on Jan 31, 2006 at 09:56 UTC

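    The square brackets ([]) delimit a character class, which essentially means "match any one of these characters"; a leading ^ negates the class. So in your example, [^to ] matches any single character that is not t, o or a space, and [^from ] matches any single character that is not f, r, o, m or a space. Because the two classes are joined with |, a character only needs to pass one of them, so ([^to ]|[^from ]) matches any character except o or a space. The t of "to" passes [^from ] and the f of "from" passes [^to ], which is why every line still matches.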
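
A small sketch to check which single characters the alternation from the question accepts. Note that a character only has to pass one of the two classes. The anchors and the variable names are added here for illustration.

```perl
use strict;
use warnings;

# The alternation from the question, anchored so it must match exactly one character.
my $class = qr/\A(?:[^to ]|[^from ])\z/;

# Try each character that appears in either class, plus one that appears in neither.
my %accepted;
for my $ch ('t', 'o', ' ', 'f', 'r', 'm', 'x') {
    $accepted{$ch} = ($ch =~ $class) ? 1 : 0;
    printf "'%s' => %s\n", $ch, $accepted{$ch} ? 'matches' : 'no match';
}
```

Only o and the space are rejected, because they are the only characters listed in both classes.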

    You can use simple alternation to match either of these words:

    if ($line !~ /train times (?:to|from)\b/) {
        # do something with line
    }

    Update: added word boundary \b, thanks to GrandFather for pointing this out :-)
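
One difference from a negative lookahead is worth keeping in mind: a negated match with !~ also succeeds for lines that do not contain "train times" at all. A sketch, where the "bus times" line is a made-up example:

```perl
use strict;
use warnings;

my @lines = (
    'train times to xyz',
    'train times including xyz',
    'bus times to xyz',          # hypothetical line without "train times"
);

# Negated match: true for every line that lacks "train times to/from".
my @negated = grep { $_ !~ /train times (?:to|from)\b/ } @lines;

# Negative lookahead: true only if "train times" is present
# and the next word is not "to" or "from".
my @lookahead = grep { /train times (?!to\b|from\b)\w/ } @lines;

print "negated:   $_\n" for @negated;
print "lookahead: $_\n" for @lookahead;
```

Which behaviour is wanted depends on whether lines without the phrase should be processed or skipped.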


    There are ten types of people: those that understand binary and those that don't.

      Try it with the string "train times tomorrow xyz" :)


      DWIM is Perl's answer to Gödel
Re: negate pattern match
by olus (Curate) on Jan 31, 2006 at 18:13 UTC
    Take a look at the following code sample.
    From what I understood, only str2, str3 and str6 should match.
    About the regexp:
    - Note I am not using =~. I am saying 'if it does not match'
    - Search for two words followed by either 'to' or 'from', and make sure that word is followed by a non-word character or by the end of the string.
    #!/usr/bin/perl -w

    my $str1 = "train times tomorrow xyz";
    my $str2 = "train times to xyz";
    my $str3 = "train times from xyz";
    my $str4 = "train times defrom xyz";
    my $str5 = "train times fromar xyz";
    my $str6 = "train times from";
    my $str7 = "train from somewhere";

    if ($str1 !~ /^(\w+)?\s+(\w+)?\s+(to|from)([^\w]|\z)/) {
        print "does not match. Now we can do something\n";
    }
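
The sample above tests only $str1. A sketch that runs the same pattern over all seven strings, to check the claim about str2, str3 and str6, could look like this (the array and loop are additions for illustration):

```perl
use strict;
use warnings;

# The seven test strings from the sample above, in order (str1 .. str7).
my @strings = (
    'train times tomorrow xyz',
    'train times to xyz',
    'train times from xyz',
    'train times defrom xyz',
    'train times fromar xyz',
    'train times from',
    'train from somewhere',
);

# Same pattern as in the sample above.
my $re = qr/^(\w+)?\s+(\w+)?\s+(to|from)([^\w]|\z)/;

my @matched;   # 1-based numbers of the strings that match the pattern
for my $i (0 .. $#strings) {
    my $n = $i + 1;
    if ($strings[$i] =~ $re) {
        push @matched, $n;
        print "str$n matches (record would be rejected)\n";
    }
    else {
        print "str$n does not match. Now we can do something\n";
    }
}
```

Because the pattern is anchored with ^, it only applies when the two words start the line, which the original question did not guarantee.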