in reply to Help composing Regex for matching only Titlecase words
In general, it is much easier to write positive regexes than negative ones, so I would use the second approach. I would do something like:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $data = "Antler embedded in mound at South Street, Avebury, Wiltshire, England. Comment (lab): Collagen fraction used";

    my @result;
    while ($data =~ /\b([A-Z][a-z]*)/g) {
        push @result, $1;
    }
    print join(' ', @result), "\n";
YAPE::Regex::Explain explains this as
    The regular expression:

    (?-imsx:\b([A-Z][a-z]*))

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        [A-Z]                    any character of: 'A' to 'Z'
    ----------------------------------------------------------------------
        [a-z]*                   any character of: 'a' to 'z' (0 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
See perlretut for more details.
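One thing the explanation implies: the pattern has no trailing \b and allows zero lowercase letters, so it captures the capitalized start of a word, not necessarily the whole word. A minimal sketch of that behavior, plus one possible stricter variant, follows. The sub names capitalized_prefixes and titlecase_words and the sample string are made up for illustration and are not from the thread; whether the stricter form is what the original poster wants is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: same pattern as in the post, but using a
# list-context //g match, which returns every capture at once.
sub capitalized_prefixes {
    my ($text) = @_;
    return $text =~ /\b([A-Z][a-z]*)/g;
}

# Hypothetical stricter variant (an assumption about the requirement):
# [a-z]+ demands at least one lowercase letter, and the trailing \b
# forces the match to cover the whole word.
sub titlecase_words {
    my ($text) = @_;
    return $text =~ /\b([A-Z][a-z]+)\b/g;
}

my $sample = "South Street, McDonald, USA, A";

print join(' ', capitalized_prefixes($sample)), "\n";  # South Street Mc U A
print join(' ', titlecase_words($sample)), "\n";       # South Street
```

With the original pattern, "McDonald" yields "Mc" and "USA" yields "U"; the stricter variant skips both, along with the single-letter word "A". Which behavior is right depends on how mixed-case words and acronyms should be treated.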