Help Me Understand This Regex

Replies are listed 'Best First'.
Re: Help Me Understand This Regex by wwe (Friar) on Mar 31, 2012 at 09:54 UTC
There is a module, YAPE::Regex::Explain, which can help you understand regular expressions. Here is the output for your expression:

The regular expression:

(?-imsx:(?:(?<=\.|\!|\?)(?<!Mr\.|Dr\.)(?<!U\.S\.A\.)\s+(?=[A-Z])))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    (?<=                     look behind to see if there is:
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      \!                       '!'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      \?                       '?'
----------------------------------------------------------------------
    )                        end of look-behind
----------------------------------------------------------------------
    (?<!                     look behind to see if there is not:
----------------------------------------------------------------------
      Mr                       'Mr'
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
     |                        OR
----------------------------------------------------------------------
      Dr                       'Dr'
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
    )                        end of look-behind
----------------------------------------------------------------------
    (?<!                     look behind to see if there is not:
----------------------------------------------------------------------
      U                        'U'
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
      S                        'S'
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
      A                        'A'
----------------------------------------------------------------------
      \.                       '.'
----------------------------------------------------------------------
    )                        end of look-behind
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    (?=                      look ahead to see if there is:
----------------------------------------------------------------------
      [A-Z]                    any character of: 'A' to 'Z'
----------------------------------------------------------------------
    )                        end of look-ahead
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
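For anyone who wants to reproduce an explanation like this locally, here is a minimal sketch. It assumes the pattern was passed to the module roughly as written in the `$pattern` string below (the original question is not quoted in this thread, so that string is reconstructed from the output), and that YAPE::Regex::Explain has been installed from CPAN. The exact text printed may vary with the module version.

```perl
use strict;
use warnings;

# Assumed input pattern, reconstructed from the explanation output.
# Single quotes keep the backslashes literal.
my $pattern = '(?:(?<=\.|\!|\?)(?<!Mr\.|Dr\.)(?<!U\.S\.A\.)\s+(?=[A-Z]))';

# YAPE::Regex::Explain is a CPAN module, not part of core Perl, so load it
# at run time and print a hint instead of dying if it is missing.
if (eval { require YAPE::Regex::Explain; 1 }) {
    print YAPE::Regex::Explain->new($pattern)->explain;
}
else {
    print "YAPE::Regex::Explain is not installed; install it from CPAN first.\n";
}
```

Loading the module inside `eval` is only a convenience for this sketch; in a script you control, a plain `use YAPE::Regex::Explain;` at the top does the same job.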
Re^2: Help Me Understand This Regex by sanju87 (Initiate) on Apr 07, 2012 at 08:12 UTC
Thank you... it cannot be explained better. Thanks!!
Re: Help Me Understand This Regex by Anonymous Monk on Mar 31, 2012 at 08:09 UTC
It splits a string on whitespace, subject to some lookbehind and lookahead rules.

my $string = "Hello Mr. Jack! How are you? Everything is OK. That's fine!";

my @sentences = split(/
(?<=\.|\!|\?)   # true if the left side is any of [.!?]
                # - end of a sentence

(?<!Mr\.|Dr\.)  # true if the left side is NOT any of ("Mr." | "Dr.")
(?<!U\.S\.A\.)  # true if the left side is NOT "U.S.A."
                # - ends with a dot, but is not the end of a sentence

\s+             # true if the current position is whitespace
                # - space between sentences

(?=[A-Z])       # true if the right side is a capital A-Z
                # - start of a new sentence
/x, $string);

print $_, "\n" for @sentences;
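To see each assertion doing its job, here is a small sketch that runs the same pattern over a few extra strings. The sample sentences and the `$boundary` and `@samples` names are made up for illustration and do not come from the original question.

```perl
use strict;
use warnings;

# The pattern from the thread, compiled once with /x.
my $boundary = qr/
    (?<=\.|\!|\?)     # preceded by sentence-ending punctuation
    (?<!Mr\.|Dr\.)    # but not by the titles "Mr." or "Dr."
    (?<!U\.S\.A\.)    # and not by "U.S.A."
    \s+               # the whitespace that split removes
    (?=[A-Z])         # followed by an uppercase ASCII letter
/x;

# Hypothetical inputs, one per rule.
my @samples = (
    "Ask Dr. Smith. He knows.",          # "Dr." must not end a sentence
    "The U.S.A. Is large. Very large!",  # "U.S.A." must not end a sentence
    "It works. but not here.",           # lowercase after the dot: no split
);

for my $text (@samples) {
    my @parts = split $boundary, $text;
    print scalar(@parts), ": ", join(" | ", @parts), "\n";
}
```

The first two strings each split into two pieces, and the third stays whole. That last case shows a limitation of the approach: a sentence that starts with a lowercase letter, a digit, or a quote mark is not recognized as a new sentence, and any abbreviation not listed in the lookbehinds will still trigger a split.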
Re^2: Help Me Understand This Regex by sanju87 (Initiate) on Apr 07, 2012 at 08:13 UTC
Very nice explanation, and thanks a lot for helping me understand the code...
Re: Help Me Understand This Regex by ww (Archbishop) on Mar 31, 2012 at 10:52 UTC
You can also use YAPE::Regex::Explain to obtain an answer at your terminal. Update: Aargh. Already well answered. Shudda' refreshed before posting.
