elcilorien has asked for the wisdom of the Perl Monks concerning the following question:
I'm having difficulty counting the number of matches of a regular expression. I know you can evaluate the match in list context and then take the size of the resulting list as a scalar (http://stackoverflow.com/questions/1849329/is-there-a-perl-shortcut-to-count-the-number-of-matches-in-a-string), for example with the "goatse operator", =()=, but this doesn't seem to work with my particular regular expression.
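The =()= idiom works because a list assignment evaluated in scalar context returns the number of elements on its right-hand side. What gets counted is therefore whatever m//g returns in list context. A minimal sketch with a pattern that has no capturing groups, where each returned element is one whole match:

```perl
use strict;
use warnings;

my $csv = "a,b,c,d";

# "() = ..." puts the match in list context, so /g returns every match.
# The outer assignment to a scalar then yields the element count.
my $commas = () = $csv =~ /,/g;

print "commas: $commas\n";    # prints "commas: 3"
```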
In the example below, I'm searching a string to see whether any of revenue(s), sales, or growth occurs within a few words of the word currency or the phrase "foreign exchange" (the pattern allows up to four intervening words). I adapted this regex from a page with an example of implementing "near" in Perl: http://www.regular-expressions.info/near.html.
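The "near" technique amounts to a two-way alternation: term A, a bounded run of intervening words, term B, or the same in reverse order. A minimal sketch of that structure, assuming a hypothetical helper `near_re` (not from the linked page) that builds the pattern from two word lists using only non-capturing groups; `$max = 4` mirrors the {0,4} in the question:

```perl
use strict;
use warnings;

# Hypothetical helper: match a term from @$left and a term from @$right,
# in either order, with at most $max words in between. The terms are
# regex fragments (not quotemeta'd), so 'revenues?' keeps its meaning.
# All groups are non-capturing.
sub near_re {
    my ($left, $right, $max) = @_;
    my $l   = join '|', @$left;
    my $r   = join '|', @$right;
    my $gap = '\W+(?:\w+\W+){0,' . $max . '}?';
    return qr/\b(?:(?:$l)$gap(?:$r)|(?:$r)$gap(?:$l))\b/i;
}

my $near = near_re([qw(revenues? sales growth)],
                   ['currency', 'foreign\Wexchange'], 4);

for my $s ("sales measured in local currency",
           "growth a b c d e currency") {
    printf "%-35s %s\n", $s, ($s =~ $near ? 'near' : 'not near');
}
```

The first string is reported as near (three words in between); the second is not (five words in between, more than `$max`).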
The problem I'm running into is that I cannot, for the life of me, count the matches of my regex accurately. For example, when I test a text file containing only the words
foreign exchange revenue currency revenue
I get EIGHT matches, but my own intuition and a test run in RegexBuddy both show only TWO. Perl reports no errors, yet when I store the matches in a list and print them, these are the "matches" I get (each one is enclosed in asterisks):
1 **
2 **
3 *foreign exchange*
4 *revenue*
5 **
6 **
7 *currency*
8 *revenue*
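The eight entries can be reproduced with a short sketch. The likely cause is that m//g in list context returns the captured substrings rather than the whole matches whenever the pattern contains capturing groups: every group for every match, with undef for groups in the alternative that did not participate. Four groups times two matches gives eight elements. The pattern below is the one from the question, only laid out with /x (it contains no literal spaces, so /x does not change what it matches):

```perl
use strict;
use warnings;

my $text = "foreign exchange revenue currency revenue";

# The question's regex: FOUR capturing groups, two per alternative.
my @parts = $text =~ /
    \b (?:
        (revenues?|sales|growth) \W+ (?:\w+\W+){0,4}? (currency|foreign\Wexchange)
      | (currency|foreign\Wexchange) \W+ (?:\w+\W+){0,4}? (revenues?|sales|growth)
    ) \b
/gix;

# Both matches use the second alternative, so groups 1 and 2 are undef
# each time; print undef as an empty string, like the listing above.
for my $i (0 .. $#parts) {
    printf "%d *%s*\n", $i + 1, $parts[$i] // '';
}
```

Run against the sample text, this prints the same eight lines as the listing above.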
I'm getting several empty matches, plus other matches that contain only part of the phrase that should have matched. I can count the matches of simple regexes just fine, but somehow my convoluted "near" expression is messing things up. I keep fiddling with the regex, but nothing I've tried has worked. I'll admit that I'm not an expert programmer, and this is beyond my abilities at this point.
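    use strict;
    use warnings;

    # Test input: the same text as in the file described above
    my $text = "foreign exchange revenue currency revenue";

    my $FX_growth;
    if ($text =~ /\b(?:(revenues?|sales|growth)\W+(?:\w+\W+){0,4}?(currency|foreign\Wexchange)|(currency|foreign\Wexchange)\W+(?:\w+\W+){0,4}?(revenues?|sales|growth))\b/i) {
        # Count the matches with the =()= ("goatse") idiom
        $FX_growth =()= $text =~ /\b(?:(revenues?|sales|growth)\W+(?:\w+\W+){0,4}?(currency|foreign\Wexchange)|(currency|foreign\Wexchange)\W+(?:\w+\W+){0,4}?(revenues?|sales|growth))\b/gi;
    }
    else {
        $FX_growth = 0;
    }

    print "$FX_growth\n";    # prints 8, not the expected 2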
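One possible fix, sketched under the assumption that the individual captured words are not needed: turn every capturing group into a non-capturing (?:...) group, so list-context m//g returns one element (the whole match) per match and =()= yields the match count. An alternative that works even when the captures are kept is to count in scalar context, where each successful m//g finds one whole match and advances pos(). The variable names `$fx_near` and `$with_captures` are illustrative:

```perl
use strict;
use warnings;

my $text = "foreign exchange revenue currency revenue";

# The question's pattern with all groups made non-capturing.
my $fx_near = qr/
    \b (?:
        (?:revenues?|sales|growth) \W+ (?:\w+\W+){0,4}? (?:currency|foreign\Wexchange)
      | (?:currency|foreign\Wexchange) \W+ (?:\w+\W+){0,4}? (?:revenues?|sales|growth)
    ) \b
/ix;

# No if/else needed: with zero matches the count is simply 0.
my $FX_growth = () = $text =~ /$fx_near/g;
print "goatse count: $FX_growth\n";

# The original, capturing version of the pattern.
my $with_captures = qr/
    \b (?:
        (revenues?|sales|growth) \W+ (?:\w+\W+){0,4}? (currency|foreign\Wexchange)
      | (currency|foreign\Wexchange) \W+ (?:\w+\W+){0,4}? (revenues?|sales|growth)
    ) \b
/ix;

# Scalar-context m//g: one loop iteration per whole match, however many
# capturing groups the pattern has ($1..$4 are usable inside the loop).
my $loop_count = 0;
$loop_count++ while $text =~ /$with_captures/g;
print "loop count: $loop_count\n";
```

Both counts come out as 2 for the sample text. The loop form is handy when the captured words are still wanted, since they can be read from $1 to $4 on each iteration.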
Re: Problems counting regex matches
by Eily (Monsignor) on Jan 15, 2014 at 17:51 UTC
by AnomalousMonk (Archbishop) on Jan 15, 2014 at 18:50 UTC
by AnomalousMonk (Archbishop) on Jan 16, 2014 at 00:05 UTC

Re: Problems counting regex matches
by AnomalousMonk (Archbishop) on Jan 15, 2014 at 19:39 UTC

Re: Problems counting regex matches
by InfiniteSilence (Curate) on Jan 15, 2014 at 22:01 UTC