Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I expect this has been asked before, but I was not able to find what I was looking for. I am reading a data file line by line and processing each line.

my $Line = "1243 That will efficiently match a nonempty group with matching parentheses two levels deep or less.";
if ( $Line =~ /^(\d{4}) ([^\.]+?[\.\!\?])$/ ) {
    print "$1\n$2\n";
    # Do something
}
elsif ( ... ) {
    # do some other check
}

In the above example I would like to skip the line if it contains the word “group”. This word can occur anywhere in the line. How do I achieve this in a single regex? Currently I save $2 in a variable and check it for the word “group”. This makes my life difficult, because I have to continue down the elsif chain if “group” is present. I appreciate your help.

Replies are listed 'Best First'.
Re: Excluding Words in RegEx
by ikegami (Patriarch) on Nov 06, 2011 at 05:03 UTC

    How do I achieve this in a single regex?

    Why?

    if ($Line !~ /\bgroup\b/ && $Line =~ /^(\d{4}) ([^\.]+?[\.\!\?])$/)

    But it can be done.

    if ($Line =~ /^(?!.*\bgroup\b)(\d{4}) ([^\.]+?[\.\!\?])$/)
      Do we need to escape the . ?

      /^(?!.*\bgroup\b)(\d{4}) (^.+?.\!\?)$/

      Is this good enough? Do \. and . do the same thing here? Any takers?

        Good point that within a character class '.' does not need to be escaped with a backslash. However, you need to use <code></code> tags around it in your post so the [] aren't rendered as links.

        I just copied the OP's pattern. But yeah,
        /^(\d{4}) ([^\.]+?[\.\!\?])$/

        can be shortened to

        /^(\d{4}) ([^.]+?[.!?])$/
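A minimal sketch to check the points above, not a definitive implementation: it runs the two-step test (with the OP's escaped character classes) and the single lookahead regex (with the shortened classes) over a few lines and counts any disagreement. Only the first sample line comes from the thread; the other two lines and the variable names are assumptions made up for illustration.

```perl
use strict;
use warnings;

# First line is the OP's example; the other two are hypothetical.
my @lines = (
    "1243 That will efficiently match a nonempty group with matching parentheses two levels deep or less.",
    "1244 This sentence has grouped content but not the bare word.",
    "1245 No terminator here",
);

my @kept;                 # numbers of the lines that pass the single regex
my $disagreements = 0;    # times the two approaches give different answers

for my $line (@lines) {
    # Two separate tests, escaped character classes as in the OP.
    my $two_step = ( $line !~ /\bgroup\b/
                  && $line =~ /^(\d{4}) ([^\.]+?[\.\!\?])$/ ) ? 1 : 0;

    # One regex with a negative lookahead, unescaped character classes.
    my $single = 0;
    if ( $line =~ /^(?!.*\bgroup\b)(\d{4}) ([^.]+?[.!?])$/ ) {
        $single = 1;
        push @kept, $1;
    }

    $disagreements++ if $two_step != $single;
}

print "kept: @kept\n";
print "disagreements: $disagreements\n";
```

The lookahead is anchored at the start of the line, so it rejects the line wherever the word appears, while \b keeps "grouped" from counting as "group".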
Re: Excluding Words in RegEx
by CountZero (Bishop) on Nov 06, 2011 at 08:15 UTC
    I would like to skip the line if it contains the word “group”. This word can occur anywhere in the line. How do I achieve this in a single regex?
    use Modern::Perl;

    while (<DATA>) {
        print $_ unless /\bgroup\b/;
    }

    __DATA__
    This line is OK
    This line should be skipped: 'group'
    Drop this group too
    But keep this grouped content
    This is so simple, I think there must be more than your simple requirement.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Well, This is so simple, I think there must be more than your simple requirement.
      I think so too!
      I don't see any need to use Modern::Perl. I figure old fashioned Perl will work just fine.
      #!/usr/bin/perl -w
      use strict;

      while (<DATA>) {
          print unless /\bgroup\b/;
      }

      =prints:
      This line is OK
      But keep this grouped content
      =cut

      __DATA__
      This line is OK
      This line should be skipped: 'group'
      Drop this group too
      But keep this grouped content
        I always start all my scripts with use Modern::Perl. It is less typing than use strict; use warnings; and it switches on all the "modern" features too.

        CountZero

      In the OP an example line starting with a four digit number is given. The OP gives a regex for capturing the number and the rest of the line separately but is asking how to skip lines containing 'group' at the same time. ikegami gave a good response.

        That is exactly why I said that the requirement could not have been that simple. Another example of an "XY-problem"!

        CountZero

        It is not that simple, I suppose. But ikegami's code works great.

        my $Line = "1243 That will efficiently match a nonempty group with matching parentheses two levels deep or less.";
        if ($Line =~ /^(?!.*\bgroup\b)(\d{4}) ([^.]+?[.\!\?])$/) {
            print "$1\n$2\n";   # Does not print
        }
        elsif ($Line =~ /^(?!.*\bgXroup\b)(\d{4}) ([^.]+?[.\!\?])$/) {
            print "$1\n$2\n";   # Prints
        }

        Though it solves my original query, I am just wondering whether it is possible to check for the word only in the second group of (). Thanks, ramprasad27, for the "\." point. I didn't know that earlier.

        -Dominic
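
On the question of checking only the second group of (): one possible approach, offered as a sketch rather than a confirmed answer from the thread, is to move the negative lookahead so that it sits just before the second group. It then scans only from that position onward. Because \d{4} can never contain "group", the example below assumes a hypothetical first field of (\w+) to make the difference visible; the variable names and sample line are also made up for illustration.

```perl
use strict;
use warnings;

# Lookahead at the start of the line: rejects "group" anywhere.
my $anywhere    = qr/^(?!.*\bgroup\b)(\w+) ([^.]+?[.!?])$/;

# Lookahead just before the second group: only the sentence part is scanned.
my $second_only = qr/^(\w+) (?!.*\bgroup\b)([^.]+?[.!?])$/;

# Hypothetical line where the word occurs only in the first field.
my $line = "group This sentence is fine.";

my $hit_anywhere = ( $line =~ $anywhere )    ? 1 : 0;
my $hit_second   = ( $line =~ $second_only ) ? 1 : 0;

print "anywhere: $hit_anywhere, second group only: $hit_second\n";
```

With the original pattern, where the first group is four digits, both placements should accept and reject the same lines, so the choice only matters when the first field can itself contain the word.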