You have already captured the (daily|month|days|...) group as $2, so you can just do something like:
my ($found, $term) = ($1, $2);
However, I question whether your regexp is going to be adequate. Usually "daily" doesn't have anything relevant in front of it, so you'd be capturing, for example, "Here is (my) (daily) string!" (captured groups shown in parens). And do you really want "(bi)-(month)ly" captured as such? "Every (other) (week)"? "Biweekly" (no hyphen) and "semiweekly" are usually acceptable in English as well.
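To make the concern concrete, here is a minimal sketch. The pattern in `$re` is only my guess at the general shape of your regexp (I'm assuming one group for the preceding word and one for the term), and `capture_pair` is a hypothetical helper name. It shows both the list-context capture and the kind of irrelevant text that ends up in the first group:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumed stand-in for the original pattern: group 1 is whatever word
# precedes the term, group 2 is the term itself.
my $re = qr/(\w+)[-\s](daily|month|days|week)/i;

# Hypothetical helper: returns ($found, $term), or an empty list on no match.
sub capture_pair {
    my ($text) = @_;
    # A match in list context returns the captured groups directly.
    return $text =~ $re;
}

for my $text ('Here is my daily string!', 'bi-monthly', 'Every other week') {
    if ( my ($found, $term) = capture_pair($text) ) {
        print "'$text': found='$found', term='$term'\n";
    }
}
```

With this assumed pattern, the first string yields "my" as the first capture, which tells you nothing about the period.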
What I'm trying to say is, parsing recurrence periods in English is hard enough, and pulling them out of sentences is even harder. My suggestion would be to match the entire period expression precisely, so you don't pull in extraneous information. Your regexp will not match as often, but some false negatives are likely preferable to inaccurate parsing.
I found a periodic frequency list for you that contains additional terms (some of them archaic and likely not applicable). I suggest more research. Then, I'd start building a regexp something like what I've started below.
Note, of course, that this is only a suggestion of a starting point. What I've come up with is certainly incomplete and needs to be expanded and tested rigorously with a sizable corpus of input strings.
#!/usr/bin/env perl
use 5.010;
use warnings;
use strict;
my $NUMBER = qr/(?i:three|four|five|six|seven|eight|nine|ten|\d+)/;
my $PERIOD = qr/(?i:day|week|month|quarter|year)/;
for ( map { chomp; $_ } <DATA> ) {
    say "`$_' contains `$1'" if
        /\b
        (
            (?:bi|semi)? [-]? (?:weekly|monthly)
          | (?:every\sother\s | twice\s)? (?:daily|monthly|quarterly|annually)
          | (?:once|twice|$NUMBER\stimes)\s (?:a|per)\s $PERIOD
          | (?:every\s(?:(?:other|twice)\s)?)? $PERIOD
          | (?:se|bi)?mestral
        )
        \b
        /xi;
}
__DATA__
Here is the weekly TPS report.
I go for a walk semimonthly.
How often do you clean this toilet? Quarterly?!
The sun comes up seven times per week.
I get older every year.
Not many people say "bimestral" anymore.
Output:
`Here is the weekly TPS report.' contains `weekly'
`I go for a walk semimonthly.' contains `semimonthly'
`How often do you clean this toilet? Quarterly?!' contains `Quarterly'
`The sun comes up seven times per week.' contains `seven times per week'
`I get older every year.' contains `every year'
`Not many people say "bimestral" anymore.' contains `bimestral'
Only once you are able to extract the entire period would I suggest attempting to parse it (i.e., once you have "bimonthly", parse or interpret that further as you see fit).
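As a rough illustration of that second step, here is a sketch that turns an already-extracted term into an approximate interval in days. Everything in it is an assumption for illustration: the `%DAYS` table, the `interval_days` name, and especially the choice to read "bi" as "every two periods" ("bimonthly" is genuinely ambiguous in English, so pick whichever reading fits your data).

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# Assumed table: approximate length of each base period, in days.
my %DAYS = (
    daily     => 1,
    weekly    => 7,
    monthly   => 30,
    quarterly => 91,
    annually  => 365,
);

# Hypothetical helper: approximate days between occurrences,
# or undef if the term is not one this sketch understands.
sub interval_days {
    my ($term) = @_;
    my ($prefix, $base) =
        lc($term) =~ /\A(bi|semi)?-?(daily|weekly|monthly|quarterly|annually)\z/
        or return;
    my $days = $DAYS{$base};
    return $days     unless defined $prefix;
    return $days / 2 if $prefix eq 'semi';   # twice per period
    return $days * 2;                        # "bi": assumed to mean every two periods
}

for my $term ('bimonthly', 'semiweekly', 'Quarterly') {
    say "$term => every ", interval_days($term) // 'unknown', " days";
}
```

Anything the lookup doesn't recognize (such as "every year") comes back as undef, so unparseable input is reported rather than guessed at.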