I'm having trouble counting the number of matches of a regular expression. I know you can evaluate the match in list context and then convert the list to a scalar (http://stackoverflow.com/questions/1849329/is-there-a-perl-shortcut-to-count-the-number-of-matches-in-a-string), for example with the goatse operator, =()=, but this doesn't seem to work with my particular regular expression.
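For reference, here is a minimal sketch of that counting idiom on a simple pattern. The sample string and the variable names are only for illustration, and the pattern deliberately has no capture groups, so each /g match contributes exactly one element to the list.

```perl
use strict;
use warnings;

# Illustrative sample text (not real data).
my $text = 'foreign exchange revenue currency revenue';

# The match runs in list context because of the empty list on the left;
# assigning that list assignment to a scalar yields the number of elements.
my $count = () = $text =~ /revenue/g;

print "$count\n";   # prints 2
```

This works as a match counter only as long as the pattern contains no capturing parentheses.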

In the example below, I'm searching a string to see if either revenue(s), sales or growth occur within three words of the word currency or the phrase "foreign exchange". I cannibalized this regex from this website, which gives an example of implementing "near" in Perl: http://www.regular-expressions.info/near.html.

The problem that I'm running into is that I cannot for the life of me accurately count the number of matches of my regex. For example, when I test a text file containing only the words

foreign exchange revenue currency revenue

I find EIGHT matches. My own intuition and a test run in RegexBuddy show only TWO matches. I don't get any errors from Perl. But when I output my matches to a list and print them, these are the "matches" I get: (each match is in between *'s)

1 **
2 **
3 *foreign exchange*
4 *revenue*
5 **
6 **
7 *currency*
8 *revenue*

I'm getting several empty matches, and then some other matches that don't even match the whole phrase that should be matched. I can count simple regexes just fine, but somehow my convoluted "near" expression is messing things up. I keep trying to fiddle with the regex, but nothing I've tried has worked. I am willing to admit that I am not an expert programmer, and this is beyond my abilities at this point.

    use strict;
    use warnings;

    my $FX_growth;
    if ($text =~ /\b(?:(revenues?|sales|growth)\W+(?:\w+\W+){0,4}?(currency|foreign\Wexchange)|(currency|foreign\Wexchange)\W+(?:\w+\W+){0,4}?(revenues?|sales|growth))\b/i) {
        $FX_growth =()= $text =~ /\b(?:(revenues?|sales|growth)\W+(?:\w+\W+){0,4}?(currency|foreign\Wexchange)|(currency|foreign\Wexchange)\W+(?:\w+\W+){0,4}?(revenues?|sales|growth))\b/gi;
    }
    else {
        $FX_growth = 0;
    }
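One likely explanation for the count of eight, offered as a sketch rather than a definitive diagnosis: in list context, a /g match on a pattern with capturing parentheses returns the captured groups of every match, not the matches themselves. The pattern has four capture groups and the sample text matches twice, which gives 2 x 4 = 8 elements, with undef for the groups in the alternative that did not participate. The names $term, $fx, $capturing and $grouping below are hypothetical helpers introduced for illustration.

```perl
use strict;
use warnings;

my $text = 'foreign exchange revenue currency revenue';

# Hypothetical building blocks for the two word lists.
my $term = qr/revenues?|sales|growth/i;
my $fx   = qr/currency|foreign\Wexchange/i;

# Same shape as the original pattern: four capturing groups.
my $capturing = qr/\b(?:($term)\W+(?:\w+\W+){0,4}?($fx)|($fx)\W+(?:\w+\W+){0,4}?($term))\b/i;

# Identical pattern, but every group is non-capturing.
my $grouping = qr/\b(?:(?:$term)\W+(?:\w+\W+){0,4}?(?:$fx)|(?:$fx)\W+(?:\w+\W+){0,4}?(?:$term))\b/i;

# With captures, list context yields four elements per match.
my $captures = () = $text =~ /$capturing/g;

# Without captures, list context yields one element per match.
my $matches = () = $text =~ /$grouping/g;

# Alternative that keeps the captures: iterate in scalar context and count.
my $looped = 0;
$looped++ while $text =~ /$capturing/g;

print "captures=$captures matches=$matches looped=$looped\n";
# prints captures=8 matches=2 looped=2
```

If the captured words are needed later, the while loop is probably the more convenient choice, since $1 through $4 remain available inside the loop body.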

In reply to Problems counting regex matches by elcilorien
