in reply to Help composing Regex for matching only Titlecase words

You could try this:
my $data = "Antler embedded in mound at South Street, Avebury, Wiltshire, England. Comment (lab): Collagen fraction used";
# Capture a capital letter and the non-space run after it; the \s after it must match but is not captured.
while ($data =~ /([A-Z]\S+?)\s/g) {
    print $1 . " ";
}
Just remember that a lot of this is contingent on your data, but it should work for any ordinary English paragraph with regular spacing.

UPDATE: I see a few monks suggesting words with two capital letters as a test case. This code handles that as well, since \S matches any non-space character; it will even match "G!#^&*()-+_+234" if you give it that. I did test that it works.
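A quick way to check that claim is to wrap the same regex in a sub and print what it captures. This is only a sketch: titlecase_tokens is a hypothetical helper name, and the test sentence is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collects every capture of the regex from the post,
# /([A-Z]\S+?)\s/g, and returns them as a list.
sub titlecase_tokens {
    my ($text) = @_;
    my @found;
    while ($text =~ /([A-Z]\S+?)\s/g) {
        push @found, $1;
    }
    return @found;
}

# Single quotes so nothing in the odd token gets interpolated.
my @tokens = titlecase_tokens('Ask McDonald about G!#^&*()-+_+234 today');
print join("|", @tokens), "\n";
# prints: Ask|McDonald|G!#^&*()-+_+234
```

Because \S accepts anything that is not whitespace, "McDonald" comes back whole, and so does the punctuation-laden token.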

Re^2: Help composing Regex for matching only Titlecase words
by kennethk (Abbot) on Mar 03, 2011 at 22:38 UTC
    Unfortunately, yours does not meet the spec. The OP specifies a desired output of "Antler South Street Avebury Wiltshire England Comment Collagen". Your regex outputs "Antler South Street, Avebury, Wiltshire, England. Comment Collagen", keeping the trailing punctuation. It would also miss a word at the very end of the string, as in "My name is Mike.", because nothing follows "Mike." to satisfy the \s. As much as it seems like a common-sense term, a 'word' is notoriously elusive from a CS perspective.
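One way to get closer to that spec, as a sketch: anchor the capital at a word boundary with \b and stop at the first non-word character with \w*. This assumes a 'word' means a capital A-Z followed by letters, digits, or underscores; titlecase_words is a hypothetical helper name, not anything from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the Titlecase words in $text, ASSUMING a "word"
# is a capital A-Z at a word boundary followed by word characters (\w).
# \b stops a match from starting mid-word, and \w* stops before punctuation,
# so trailing commas and periods are not captured. A final word needs no
# following whitespace. Call it in list context to get every capture.
sub titlecase_words {
    my ($text) = @_;
    return $text =~ /\b([A-Z]\w*)/g;
}

my $data = "Antler embedded in mound at South Street, Avebury, Wiltshire, England. Comment (lab): Collagen fraction used";
print join(" ", titlecase_words($data)), "\n";
# prints: Antler South Street Avebury Wiltshire England Comment Collagen

print join(" ", titlecase_words("My name is Mike.")), "\n";
# prints: My Mike
```

The trade-off is that \w excludes apostrophes and hyphens, so a name like O'Brien comes back as "O" and "Brien"; widen the character class if your data has names like that.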