Dr Manhattan has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks
I need a regex to match sentences ending with a period, but it must not treat an abbreviation in the middle of a sentence as the end of that sentence.
For instance, given the sentence 'I like Mr. Smith's dog.', the regex should match the whole sentence, not just the 'I like Mr.' part.
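A minimal sketch of the requirement above, assuming the abbreviations are kept in a plain list without their trailing period (the sample list `@abbrevs` and the sample text are hypothetical). It builds one fixed-length negative lookbehind per abbreviation and uses it inside `split`, so a period only counts as a sentence end when it does not directly follow a known abbreviation:

```perl
use strict;
use warnings;

# Hypothetical sample abbreviations, stored without the trailing period.
my @abbrevs = qw(Mr Mrs Dr St);

# One fixed-length negative lookbehind per abbreviation, e.g. (?<!\bMr\.),
# so a period directly after a known abbreviation is never a boundary.
my $not_abbrev = join '', map { '(?<!\b' . quotemeta($_) . '\.)' } @abbrevs;

my $text = "I like Mr. Smith's dog. It barks at Dr. Jones! Does it?";

# Split on whitespace that follows . ! or ? (but not an abbreviation's
# period) and precedes an uppercase letter. The punctuation stays attached
# to its sentence because it is only checked by lookbehind, not consumed.
my @sentences = split /(?<=[.!?])$not_abbrev\s+(?=[A-Z])/, $text;

print "$_\n" for @sentences;
# I like Mr. Smith's dog.
# It barks at Dr. Jones!
# Does it?
```

Because each lookbehind has a fixed length, this does not rely on variable-length lookbehind support. It remains a heuristic: a sentence that genuinely ends with a listed abbreviation will be merged with the one after it.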
```perl
if ($in =~ /(\w+)(\!|\?|\.)(\s)((([A-Z])(\w|\s|\d|\(|\)|\+|\=|\-|\@|\#|\%|\&|\*|\<|\>|\,|\\|\/|\"|\`|'n)+(\s)(\w+\.))(\w|\s|\d|\(|\)|\+|\=|\-|\@|\#|\%|\&|\*|\<|\>|\,|\\|\/|\"|\`|'n)+(\s)(\w+\.))(\s)([A-Z])/) {
    # $9 is the word ending in a period that closes the shorter candidate ($5);
    # $12 is the word ending in a period that closes the longer candidate ($4).
    if (!exists($abbreviations{$9})) {
        $hash{$5}++;
    }
    elsif (!exists($abbreviations{$12})) {
        $hash{$4}++;
    }
}
```
I tried this, but it still doesn't work.
%abbreviations is a hash whose keys are known abbreviations.
%hash is where correctly matched sentences are stored.
Any help would be appreciated.
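Since %abbreviations is already a hash, another option is to avoid one giant regex: split at every candidate boundary first, then glue a piece back onto the next one whenever its last word is a known abbreviation. This sketch assumes the hash keys include the trailing period (as the `(\w+\.)` groups in the regex above capture, e.g. `Mr.`); the sample abbreviations and the `$in` text are made up for illustration.

```perl
use strict;
use warnings;

# Assumed layout: keys include the trailing period, e.g. "Mr.".
my %abbreviations = map { $_ => 1 } qw(Mr. Mrs. Dr. St.);
my %hash;    # sentence => number of times seen

my $in = "I like Mr. Smith's dog. It barks at Dr. Jones. Does it?";

# Step 1: split at every candidate boundary (end punctuation, whitespace,
# uppercase letter), ignoring abbreviations for now.
my @pieces = split /(?<=[.!?])\s+(?=[A-Z])/, $in;

# Step 2: keep appending pieces while the text so far ends with a known
# abbreviation, since that period did not really end the sentence.
my @sentences;
my $current = '';
for my $piece (@pieces) {
    $current = length $current ? "$current $piece" : $piece;
    my ($last_word) = $current =~ /(\S+)\s*\z/;
    next if defined $last_word && exists $abbreviations{$last_word};
    push @sentences, $current;
    $current = '';
}
push @sentences, $current if length $current;   # text ended on an abbreviation

$hash{$_}++ for @sentences;

printf "%d\t%s\n", $hash{$_}, $_ for sort keys %hash;
```

Re-joined pieces are separated by a single space, so runs of whitespace at a false boundary are normalized; adjust the join if the original spacing matters.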
Re: Regex matching end of sentence
by tmharish (Friar) on Jan 31, 2013 at 09:34 UTC

Re: Regex matching end of sentence
by Anonymous Monk on Jan 31, 2013 at 10:17 UTC

Re: Regex matching end of sentence
by ww (Archbishop) on Jan 31, 2013 at 18:20 UTC