
I'm having difficulty counting the number of matches to some regular expressions. I know you can evaluate the match in list context and then take the number of elements in scalar context (http://stackoverflow.com/questions/1849329/is-there-a-perl-shortcut-to-count-the-number-of-matches-in-a-string), for example with the "goatse" operator, =()=, but this doesn't seem to work with my particular regular expression.
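
For reference, here is the countof idiom on a pattern with no capturing groups. It is a minimal sketch, and the sample string is made up for illustration:

```perl
use strict;
use warnings;

my $str = 'a1 b22 c333';

# In list context, m//g returns every match. Assigning that list to the
# empty list () in scalar context yields the number of elements on the
# right-hand side, i.e. the number of matches.
my $count = () = $str =~ /\d+/g;

print "$count\n";    # prints 3
```

This only counts whole matches when the pattern has no capturing groups. If it has groups, m//g in list context returns the captured substrings of each match instead.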

In the example below, I'm searching a string to see whether revenue(s), sales or growth occurs within a few words of the word "currency" or the phrase "foreign exchange" (the pattern as written allows up to four intervening words). I adapted this regex from an example of implementing "near" in Perl: http://www.regular-expressions.info/near.html.
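
The "near" technique from that page comes down to two word lists separated by a bounded run of intervening words, tried in both orders. The sketch below builds such a pattern. It assumes a hypothetical helper named near_regex, which is not part of any module, and keeps every group non-capturing with (?:...):

```perl
use strict;
use warnings;

# Hypothetical helper: build a case-insensitive "near" pattern that matches
# a term from @$left and a term from @$right, in either order, separated by
# at most $max_gap intervening words. All groups are non-capturing.
sub near_regex {
    my ($left, $right, $max_gap) = @_;
    my $l   = join '|', @$left;
    my $r   = join '|', @$right;
    my $gap = qr/\W+(?:\w+\W+){0,$max_gap}?/;
    return qr/\b(?:(?:$l)$gap(?:$r)|(?:$r)$gap(?:$l))\b/i;
}

# The terms are regex fragments, so single quotes keep \W intact.
my $near = near_regex(
    [ 'revenues?', 'sales', 'growth' ],
    [ 'currency', 'foreign\Wexchange' ],
    4,
);

my $text  = "foreign exchange revenue\ncurrency revenue\n";
my $count = () = $text =~ /$near/g;
print "$count\n";    # prints 2
```

Because nothing in the pattern captures, the countof idiom sees one list element per match.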

The problem that I'm running into is that I cannot for the life of me accurately count the number of matches of my regex. For example, when I test a text file containing only the words

foreign exchange revenue
currency revenue

the count comes out as EIGHT. My own intuition and a test run in RegexBuddy both say there are only TWO matches. Perl reports no errors. When I capture the matches in a list and print them, these are the "matches" I get (each one between asterisks):

1 **
2 **
3 *foreign exchange*
4 *revenue*
5 **
6 **
7 *currency*
8 *revenue*
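
One likely explanation: the pattern has four capturing groups, and in list context m//g returns the captured groups of every match rather than the matched text. Two matches times four groups gives eight elements, and the groups that did not take part in a match come back as undef. The sketch below reproduces the listing using the sample lines from above:

```perl
use strict;
use warnings;

my $text = "foreign exchange revenue\ncurrency revenue\n";

# The original pattern, unchanged: it has four capturing groups.
my $re = qr/\b(?:(revenues?|sales|growth)\W+(?:\w+\W+){0,4}?(currency|foreign\Wexchange)|(currency|foreign\Wexchange)\W+(?:\w+\W+){0,4}?(revenues?|sales|growth))\b/i;

# List-context m//g: each match contributes all four groups to @groups.
# Both sample lines match via the second alternative, so groups 1 and 2
# are undef each time (these printed as the empty ** entries).
my @groups = $text =~ /$re/g;

my $n = @groups;
print "elements: $n\n";    # prints "elements: 8"
for my $g (@groups) {
    print defined($g) ? "*$g*\n" : "(undef)\n";
}
```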

I'm getting several empty matches, plus some matches that cover only part of the phrase that should have matched. I can count matches of simple regexes just fine, but somehow my convoluted "near" expression is throwing things off. I keep fiddling with the regex, but nothing I've tried has worked. I admit I'm not an expert programmer, and this is beyond my abilities at this point.

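use strict;
use warnings;

# Test data from above; in the real script $text holds the file contents.
my $text = "foreign exchange revenue\ncurrency revenue\n";

my $FX_growth;

if ($text =~ /\b(?:(revenues?|sales|growth)\W+(?:\w+\W+){0,4}?(currency|foreign\Wexchange)|(currency|foreign\Wexchange)\W+(?:\w+\W+){0,4}?(revenues?|sales|growth))\b/i) {
    # Count every match with the =()= ("countof") idiom
    $FX_growth =()= $text =~ /\b(?:(revenues?|sales|growth)\W+(?:\w+\W+){0,4}?(currency|foreign\Wexchange)|(currency|foreign\Wexchange)\W+(?:\w+\W+){0,4}?(revenues?|sales|growth))\b/gi;
} else {
    $FX_growth = 0;
}

print "$FX_growth\n";    # prints 8, although only 2 matches are expected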
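
A sketch of one way to get the expected count without touching the pattern: match in scalar context inside a loop. Each successful m//g advances pos($text), so the loop body runs once per match regardless of how many groups the pattern captures. It also makes the separate if/else test unnecessary, because the counter simply stays at 0 when nothing matches:

```perl
use strict;
use warnings;

my $text = "foreign exchange revenue\ncurrency revenue\n";

# Same pattern as in the code above, capturing groups and all.
my $re = qr/\b(?:(revenues?|sales|growth)\W+(?:\w+\W+){0,4}?(currency|foreign\Wexchange)|(currency|foreign\Wexchange)\W+(?:\w+\W+){0,4}?(revenues?|sales|growth))\b/i;

# Scalar-context m//g returns true once per match; the loop ends when
# no further match is found.
my $FX_growth = 0;
$FX_growth++ while $text =~ /$re/g;

print "$FX_growth\n";    # prints 2
```

Alternatively, changing every (...) in the pattern to (?:...) leaves nothing to capture, and the original =()= line then returns one element per match.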

In reply to Problems counting regex matches by elcilorien
