in reply to Match non-capitalized words at the beginning of each sentence
There are lots of ways, not all equivalent. Unicode, for instance, makes the question a lot more complicated, as do locales.
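To see why Unicode complicates things, here is a small comparison of an ASCII character class against the Unicode "uppercase letter" property, on a word that starts with an accented capital. This is a minimal sketch that assumes any reasonably modern perl with \p{...} support; locale handling is deliberately left out.

```perl
use strict;
use warnings;

# "Émile", written with an escape so the source file stays plain ASCII.
my $word = "\x{C9}mile";

# An ASCII character class does not see É as an uppercase letter ...
my $ascii = $word =~ /^[A-Z]/ ? 1 : 0;

# ... while the Unicode uppercase-letter property does.
my $unicode = $word =~ /^\p{Lu}/ ? 1 : 0;

print "ASCII class: $ascii, Unicode property: $unicode\n";
```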
For plain declarative sentences in ASCII, like the ones you are parsing,

    while (<>) {
        # report every capitalized word that follows a period and whitespace
        while (/\.\s+([A-Z]\w*)/g) {
            print "$1 is capitalized\n";
        }
    }

will pick up capitalized initial words for all but the first sentence and sentences starting after a $/ (usually a linebreak). The first sentence has no preceding period. That also ignores the possibility of text with dialogue, interrogation points, exclamation points, ellipses, ... There is a CPAN module, Lingua::EN::Sentence, which may be useful to you.
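A minimal sketch that turns the pattern around to what the original question asks for, still assuming plain ASCII text: it reports sentence-initial words that are *not* capitalized, and it also treats the start of the text, `!` and `?` as sentence boundaries. The helper name `uncapitalized_starts` and the sample text are made up for illustration; abbreviations such as "e.g.", dialogue and ellipses will still fool it.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words that begin a sentence but are not
# capitalized. A "sentence start" is taken to be the start of the text, or
# a word following ., ! or ? plus whitespace.
sub uncapitalized_starts {
    my ($text) = @_;
    my @found;
    while ($text =~ /(?:\A\s*|[.!?]\s+)([a-z]\w*)/g) {
        push @found, $1;
    }
    return @found;
}

my @bad = uncapitalized_starts("this one is wrong. This is fine! and this? Also fine.");
print "@bad\n";    # prints "this and"
```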
The Perl functions uc, lc, ucfirst, and lcfirst are handy for this kind of comparison. Accepting the sentence matching of your example for simplicity,

    if ( /\.\s+(\w+)/ and $1 eq ucfirst($1) ) {
        print "That'un's Ok", $/;
    }

That regex picks the first word after a period and whitespace; the word passes if ucfirst leaves it unchanged, i.e. it already starts with a capital.
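To make the comparison idea concrete, here is a small sketch that wraps it in two hypothetical helpers, `is_capitalized` (using ucfirst) and `is_uncapitalized` (using lcfirst), and applies them to the first word after a period, as in the line above. The sample lines are invented. Note that a word starting with a digit passes both checks, since neither function changes it.

```perl
use strict;
use warnings;

# Capitalized: ucfirst leaves the word unchanged, i.e. its first character
# is already uppercase or has no case at all (a digit, say).
sub is_capitalized   { my ($w) = @_; return $w eq ucfirst($w) }

# Mirror test with lcfirst: true unless the first character is uppercase.
sub is_uncapitalized { my ($w) = @_; return $w eq lcfirst($w) }

for my $line ("It works. Yes it does.", "It works. no it doesn't.") {
    # Same sentence matching as above: first word after a period and whitespace.
    if ($line =~ /\.\s+(\w+)/) {
        my $word = $1;
        print "$word: ", (is_capitalized($word) ? "capitalized" : "not capitalized"), "\n";
    }
}
# prints "Yes: capitalized" then "no: not capitalized"
```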
To handle sentences that start after a linebreak, you can either slurp the entire file with local $/ = undef; and match with m//s, or else match terminal periods and carry state in a variable over to the next line.
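A sketch of the second option (carrying state between lines), under the same ASCII, plain-sentence assumptions as the earlier code. The helper `sentence_starts` and the sample lines are hypothetical. Starting with the flag set also takes care of the first sentence, which has no preceding period; with real input you could call it as `sentence_starts(<>)`, or move the loop body into a `while (<>)` loop to process one line at a time.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the first word of every sentence in a list of
# lines. One flag, $at_start, is carried from line to line; it starts true so
# the first sentence of the input is included, and is set again whenever a
# line ends with a terminal period.
sub sentence_starts {
    my @starts;
    my $at_start = 1;
    for my $line (@_) {
        next unless $line =~ /\S/;    # blank lines do not change the state

        # A new sentence begins with this line's first word.
        if ($at_start and $line =~ /^\s*(\w+)/) {
            push @starts, $1;
        }

        # Sentences that start after a period within this line.
        while ($line =~ /\.\s+(\w+)/g) {
            push @starts, $1;
        }

        # Does the next line begin a new sentence?
        $at_start = $line =~ /\.\s*$/ ? 1 : 0;
    }
    return @starts;
}

my @lines = (
    "this starts the text. Next one\n",
    "goes on. A sentence ends here.\n",
    "here a new one starts.\n",
);
my @starts = sentence_starts(@lines);
my @uncap  = grep { $_ ne ucfirst($_) } @starts;
print "starts: @starts\n";          # starts: this Next A here
print "not capitalized: @uncap\n";  # not capitalized: this here
```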
After Compline,
Zaxo