sanju87 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am a beginner in Perl. Could you please help me understand the lines below?

@sentances = split(/(?:(?<=\.|\!|\?)(?<!Mr\.|Dr\.)(?<!U\.S\.A\.)\s+(?=[A-Z]))/, $string);
for (@sentances) { print $_."\n"; }

Replies are listed 'Best First'.
Re: Help Me Understand This Regex
by wwe (Friar) on Mar 31, 2012 at 09:54 UTC
    There is a module, YAPE::Regex::Explain, which can help you understand regular expressions. Here is the output for your expression:
    The regular expression:

    (?-imsx:(?:(?<=\.|\!|\?)(?<!Mr\.|Dr\.)(?<!U\.S\.A\.)\s+(?=[A-Z])))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (?:                      group, but do not capture:
    ----------------------------------------------------------------------
        (?<=                     look behind to see if there is:
    ----------------------------------------------------------------------
          \.                       '.'
    ----------------------------------------------------------------------
         |                        OR
    ----------------------------------------------------------------------
          \!                       '!'
    ----------------------------------------------------------------------
         |                        OR
    ----------------------------------------------------------------------
          \?                       '?'
    ----------------------------------------------------------------------
        )                        end of look-behind
    ----------------------------------------------------------------------
        (?<!                     look behind to see if there is not:
    ----------------------------------------------------------------------
          Mr                       'Mr'
    ----------------------------------------------------------------------
          \.                       '.'
    ----------------------------------------------------------------------
         |                        OR
    ----------------------------------------------------------------------
          Dr                       'Dr'
    ----------------------------------------------------------------------
          \.                       '.'
    ----------------------------------------------------------------------
        )                        end of look-behind
    ----------------------------------------------------------------------
        (?<!                     look behind to see if there is not:
    ----------------------------------------------------------------------
          U                        'U'
    ----------------------------------------------------------------------
          \.                       '.'
    ----------------------------------------------------------------------
          S                        'S'
    ----------------------------------------------------------------------
          \.                       '.'
    ----------------------------------------------------------------------
          A                        'A'
    ----------------------------------------------------------------------
          \.                       '.'
    ----------------------------------------------------------------------
        )                        end of look-behind
    ----------------------------------------------------------------------
        \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                                 more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
        (?=                      look ahead to see if there is:
    ----------------------------------------------------------------------
          [A-Z]                    any character of: 'A' to 'Z'
    ----------------------------------------------------------------------
        )                        end of look-ahead
    ----------------------------------------------------------------------
      )                        end of grouping
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
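The look-behind and look-ahead nodes in that listing only test the text around the current position; they do not consume any characters. The sketch below is one way to see this: it collects everything the pattern actually matches. The helper name `separators` and the sample string are made up for illustration and are not part of the original question.

```perl
use strict;
use warnings;

# Same pattern as in the question, with the [A-Z] look-ahead.
my $sep = qr/(?<=\.|\!|\?)(?<!Mr\.|Dr\.)(?<!U\.S\.A\.)\s+(?=[A-Z])/;

# Hypothetical helper: return every separator the pattern matches,
# as [offset, matched_text] pairs.
sub separators {
    my ($text) = @_;
    my @found;
    while ($text =~ /$sep/g) {
        # @- and @+ hold the start and end offsets of the last match.
        push @found, [ $-[0], substr($text, $-[0], $+[0] - $-[0]) ];
    }
    return @found;
}

my $sample = "Is it late?  Yes! Ask Dr. Who.";
for my $hit (separators($sample)) {
    my ($offset, $matched) = @$hit;
    printf "offset %2d: separator is %d whitespace character(s)\n",
        $offset, length $matched;
}
```

Only whitespace is ever matched, so when the pattern is used with split the punctuation stays attached to the preceding sentence. The space after "Dr." is not reported because the negative look-behind rejects it.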
      Thank you... it could not be explained better. Thanks!!
Re: Help Me Understand This Regex
by Anonymous Monk on Mar 31, 2012 at 08:09 UTC
    It splits a string on whitespace, with some lookbehind and lookahead rules.
    my $string = "Hello Mr. Jack! How are you? Everything is OK. That's fine!";
    my @sentances = split(/
        (?<=\.|\!|\?)    # true if the left side is any of [.!?]
                         #   - end of a sentence
        (?<!Mr\.|Dr\.)   # true if the left side is *NOT* any of ("Mr." | "Dr.")
        (?<!U\.S\.A\.)   # true if the left side is *NOT* "U.S.A."
                         #   - ends with a dot, but is not the end of a sentence
        \s+              # true if the current position is whitespace
                         #   - space between sentences
        (?=[A-Z])        # true if the right side is a capital A-Z
                         #   - start of a new sentence
    /x, $string);
    print $_,"\n" for @sentances;
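To see how each rule affects the result, it can help to run the same split over a few short inputs. This is only a sketch: the wrapper `split_sentences` and the test strings are invented here for illustration, while the pattern itself is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the split from the question.
sub split_sentences {
    my ($text) = @_;
    return split /(?<=\.|\!|\?)(?<!Mr\.|Dr\.)(?<!U\.S\.A\.)\s+(?=[A-Z])/, $text;
}

my @cases = (
    "Hello Mr. Jack! How are you?",               # "Mr." must not split
    "I saw Dr. Who today. It was fun.",           # "Dr." must not split
    "He left the U.S.A. Then he came back.",      # "U.S.A." never splits
    "See the docs, e.g. perlre. they are long.",  # lowercase after the dot
);

for my $case (@cases) {
    print "IN:  $case\n";
    print "OUT: $_\n" for split_sentences($case);
}
```

The third case shows a limitation of the pattern: because of the `(?<!U\.S\.A\.)` rule, text is never split after "U.S.A.", even when that really is the end of a sentence. The fourth case is not split at all, since the look-ahead requires a capital letter after the whitespace.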
      Very nice explanation, and thanks a lot for helping me understand the code...
Re: Help Me Understand This Regex
by ww (Archbishop) on Mar 31, 2012 at 10:52 UTC
    You can also use YAPE::Regex::Explain to obtain an answer at your terminal.
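A minimal sketch of such a script follows, assuming YAPE::Regex::Explain has been installed from CPAN (it is not a core module). The helper name `explain_pattern` is made up for illustration. The call is wrapped in `eval` because the module was written for older regex syntax and may not cope with every modern Perl construct.

```perl
use strict;
use warnings;

# The pattern from the question, kept as a plain string.
my $pattern = '(?<=\.|\!|\?)(?<!Mr\.|Dr\.)(?<!U\.S\.A\.)\s+(?=[A-Z])';

# Hypothetical helper: return the explanation text, or undef if the
# module is not installed or fails on this pattern.
sub explain_pattern {
    my ($re) = @_;
    my $text = eval {
        require YAPE::Regex::Explain;
        YAPE::Regex::Explain->new($re)->explain;
    };
    return $text;
}

my $explanation = explain_pattern($pattern);
print defined $explanation
    ? $explanation
    : "YAPE::Regex::Explain is not available here; install it from CPAN.\n";
```

Saved to a file and run with `perl`, this prints the explanation to the terminal, or the fallback message if the module cannot be loaded.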

    Update: Aargh. Already well answered. Shudda' refreshed before posting.