p.s has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I am having the following problem. I have 88 $currentLines that I want to search for $patterns.

while($currentLine =~ /\s$pattern\s/g){ $count +=1; }


This counts the number of times $pattern occurs in $currentLine except if $pattern is followed by $pattern, for example the following will work:

a b x y i x y p (will find two xy's)

but

abxyxyp (will only find 1 xy!)

Is this normal regular expression behaviour? And if so, how do I work around it? :/ Thank you :)

P.

Re: while =~ consecutive matching problem
by Eimi Metamorphoumai (Deacon) on Nov 12, 2004 at 16:36 UTC
    If I understand correctly, your problem is that you're looking for /\sx y\s/ repeatedly, but your search pattern includes the spaces on either side. So the first match consumes the following space, making it unavailable to be matched as the preceding space for the next match. If you really need the spaces you could use /(?<=\s)$pattern(?=\s)/, but you might do better with /\b$pattern\b/. Additionally, you can get the count just by doing
    my $count = () = $currentLine =~ /\b$pattern\b/g;
    which evaluates the match in list context, creating a list of all matches. Assigning that list to the empty list () and then to $count evaluates the list assignment in scalar context, which yields the number of elements on its right-hand side.

    Although, none of this will match your string of abxyxyp, because it doesn't have any spaces at all, so I suspect you mistyped that.
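To make the difference concrete, here is a minimal sketch comparing the consuming pattern with the lookaround version on the corrected test string "a b x y x y p". The variable names $line, $consuming and $lookaround are made up for this illustration and are not from the original post.

```perl
use strict;
use warnings;

my $line    = "a b x y x y p";
my $pattern = "x y";

# Consuming version: the trailing \s of the first match is eaten,
# so the second "x y" has no leading space left to match.
my $consuming = () = $line =~ /\s$pattern\s/g;

# Lookaround version: the spaces are asserted but not consumed,
# so the same space can sit between two adjacent matches.
my $lookaround = () = $line =~ /(?<=\s)$pattern(?=\s)/g;

print "consuming: $consuming, lookaround: $lookaround\n";
# prints "consuming: 1, lookaround: 2"
```

If $pattern might contain regex metacharacters, wrapping it as \Q$pattern\E makes it match literally.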

      I will try this, thank you for such quick responses :) Yes, the second line was a typo, sorry; it should have spaces. I need to keep the spaces as some categories are more than 1 char long, e.g. qwe a x o p. Without the spaces, "qwe" could be interpreted as "q w e", which I do not want. Thanks again! :) P.
        You might still look at \b. It's a zero width assertion that there's a word boundary there. Which means it's pretty much exactly like the other solution I gave except for three aspects:
        1. It's based on the difference between \w and \W, not between \s and \S (that is, it will detect a boundary after "qwe" in "qwe-rty")
        2. If your $pattern begins or ends with a non-word character, a \b will assert that it's next to a word character (while the solution based on \s will still assert that there is a space next to it).
        3. Finally, and possibly most importantly, it will also match at the beginning and end of the string. If you want the other to work there, you'll have to add spaces to the beginning and end.
        Probably none of that is all that important, but it's often very useful in cases like this.
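The three differences listed above can be sketched with a few small test strings. This is only an illustration: the helper count_matches and the sample strings are assumptions made up here, not part of the original thread.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often a compiled regex matches in a string.
sub count_matches {
    my ($string, $regex) = @_;
    my $count = () = $string =~ /$regex/g;
    return $count;
}

my $word        = "qwe";
my $by_boundary = qr/\b$word\b/;
my $by_space    = qr/(?<=\s)$word(?=\s)/;

# 1. \b sees a boundary between "e" and "-"; the \s version needs a real space.
print count_matches("a qwe-rty b", $by_boundary), "\n";   # prints 1
print count_matches("a qwe-rty b", $by_space),    "\n";   # prints 0

# 2. A pattern ending in a non-word character: \b after "!" demands a word
#    character next to it, so "x! " fails while the \s version succeeds.
my $sym = "x!";
print count_matches("a x! b", qr/\b$sym\b/),          "\n";   # prints 0
print count_matches("a x! b", qr/(?<=\s)$sym(?=\s)/), "\n";   # prints 1

# 3. \b also matches at the very start and end of the string.
print count_matches("qwe a qwe", $by_boundary), "\n";   # prints 2
print count_matches("qwe a qwe", $by_space),    "\n";   # prints 0
```

Padding the string with a space on each side before matching is one way to let the \s-based version see the first and last item as well.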
Re: while =~ consecutive matching problem
by davido (Cardinal) on Nov 12, 2004 at 16:29 UTC

    Show us what is contained in $pattern. If it is just "xy", what you have posted won't find it anyway if xy is embedded between non-space characters. Your existing regex doesn't seem to reflect the behavior you're describing.


    Dave

      Hi, a requirement is that it is preceded and followed by a space. I have included a script that demos what I am saying, if that helps. Thanks :)

      #!/usr/bin/perl
      use strict;
      use warnings;

      my $countsAllExample = "a b x y i x y p";
      my $doesntCountAll   = "a b x y x y p";
      my $pattern          = "x y";
      my $count            = 0;

      # the two "x y" are separated by "i", so each has its own surrounding spaces
      while ($countsAllExample =~ /\s$pattern\s/g) {
          $count += 1;
      }

      print "found $pattern $count times in $countsAllExample\n";

      # now the error occurs, as only 1 "x y" is reported
      ###########################################
      $count = 0;

      while ($doesntCountAll =~ /\s$pattern\s/g) {
          $count += 1;
      }

      print "found $pattern $count times in $doesntCountAll\n";

        Ok, take your second test case, the one that doesn't work as you expect, and let's walk through it.

        1. Match "_x_y_" (I used _ to indicate a space character). One is found, and the pattern match position pointer is advanced to the 2nd 'x'.
        2. Look for another "_x_y_", but there isn't one, because you're already at position 'x', not position '_' (space).

        What you may want is something like this:

        m/\s$pattern(?=\s)/g

        This won't gobble the second space char.


        Dave
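The walkthrough above can be checked by recording pos() after every successful match. This is a rough sketch, and the array names @consuming_pos and @lookahead_pos are invented for the illustration. With the consuming pattern the search resumes at offset 8 (the second 'x'); with the lookahead it resumes at offset 7 (the space), so the second "x y" is still reachable.

```perl
use strict;
use warnings;

my $line    = "a b x y x y p";
my $pattern = "x y";

# Consuming pattern: record where the next search would start.
my @consuming_pos;
while ($line =~ /\s$pattern\s/g) {
    push @consuming_pos, pos($line);
}

# Lookahead pattern: the trailing space is asserted, not consumed.
my @lookahead_pos;
while ($line =~ /\s$pattern(?=\s)/g) {
    push @lookahead_pos, pos($line);
}

print "consuming stops at: @consuming_pos\n";   # prints "consuming stops at: 8"
print "lookahead stops at: @lookahead_pos\n";   # prints "lookahead stops at: 7 11"
```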

Re: while =~ consecutive matching problem
by Grygonos (Chaplain) on Nov 12, 2004 at 16:33 UTC

    You are searching for spaces around your pattern, so yes, it should be expected to match only once in the second case. Knowing your exact pattern would be the biggest help in solving this quandary. Also, I would recommend this construct:

    foreach(@lines) { while(m{$pattern}g) { $count++; } }
    Edit: Eimi Metamorphoumai is absolutely right. You need zero-width assertions so that the surrounding spaces are not consumed by each match.

Re: while =~ consecutive matching problem
by hostyle (Scribe) on Nov 12, 2004 at 16:32 UTC

    What exactly do you think your \s is doing?

    -- update --

    a b x y i x y p (will find two xy's)

    I don't see any at all unless you meant x y as opposed to xy?

    Reap this node please (is there a special way of requesting NodeReaper to come along?) Reason: alcohol clouded reasoning.


Re: while =~ consecutive matching problem
by TedPride (Priest) on Nov 12, 2004 at 21:08 UTC
    The problem here is that the trailing space of the first match is used up, so it is no longer available as the leading space of the second match. Something like the following will fix that:
    while ($doesntCountAll =~ /\s$pattern\s/g) { $count++; pos($doesntCountAll) -= 1; }
    However, the method given by Eimi Metamorphoumai is better.
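For completeness, here is the pos() rewind as a self-contained sketch on the corrected test string; the variable name $line is an assumption for this example. Stepping pos() back by one character after each match hands the trailing space back to the regex engine, so it can act as the leading space of the next match.

```perl
use strict;
use warnings;

my $line    = "a b x y x y p";
my $pattern = "x y";
my $count   = 0;

while ($line =~ /\s$pattern\s/g) {
    $count++;
    # Step back over the trailing space so the next search starts on it.
    pos($line) -= 1;
}

print "found '$pattern' $count times\n";   # prints "found 'x y' 2 times"
```

This only works because the pattern ends in exactly one consumed whitespace character; the lookaround form avoids that bookkeeping.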