siskos1 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, is it possible to find the number of all occurrences of a target word in a text with a regular expression? I mean: aabc; there are 2 abc. axaxbaxxbc; there are 5 abc. I may be a Perl beginner, but I am committed to learning it. Any help would be great. Thanks in advance.

Re: all occurences of a target word by regular expression
by moritz (Cardinal) on Mar 30, 2010 at 13:46 UTC
    aabc ; there are 2 abc

    Uhm, I see just one abc in aabc

    axaxbaxxbc; there are 5 abc

    Maybe you meant: All matches of a regex of the form a.*b.*c? If yes, feed that regex into Regexp::Exhaustive, and be happy :-)

      Thanks, you understood what I originally meant. I am now trying to understand that module. Sorry for the wrong usage of words; I meant the combinatorial way of forming a target word.
        The combinatorial approach has not much to do with perl, but I can try anyway.

        You can, for example, walk through the string and, for each 'a', look at every 'b' that comes after it and count the 'c's that come after that 'b'. The sum of those counts is the number of matches.

        So for a target string like 'abacbcc' you find that for the first 'a' you have one 'b' followed by 3 'c's, and one 'b' followed by 2 'c's, which sums up to 5.

        For the second 'a' you just have one 'b' followed by two 'c's, so all in all you have 7 possible matches.
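The walk described above can be sketched directly in Perl. This is only an illustration under stated assumptions: the helper name `count_abc_walk` is hypothetical, the target word is fixed to "abc", and the nested scan favours clarity over speed.

```perl
use strict;
use warnings;

# Hypothetical helper: count the ways to pick an 'a', a later 'b' and a
# later 'c' from $text, following the walk described above.
sub count_abc_walk {
    my ($text) = @_;
    my @chars = split //, $text;
    my $total = 0;
    for my $i (0 .. $#chars) {
        next unless $chars[$i] eq 'a';
        for my $j ($i + 1 .. $#chars) {
            next unless $chars[$j] eq 'b';
            # Every 'c' after this 'b' completes one a..b..c combination.
            $total += grep { $_ eq 'c' } @chars[$j + 1 .. $#chars];
        }
    }
    return $total;
}

print count_abc_walk('abacbcc'), "\n";    # prints 7
```

The same helper gives 2 for 'aabc' and 5 for 'axaxbaxxbc', the two counts from the original question.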

        Back to Perl, the regex module I mentioned above doesn't really do any work - it just exploits a feature of the perl built-in regex engine. It causes each match to fail, so it forces the regex engine to backtrack into other alternatives.

        This small piece demonstrates that:

        $ perl -wle '"abacbcc" =~ /a.*b.*c(?{ $count++ })(?!)/; print $count'
        7

        (?!) is just a "clever" way to write a regex that never matches (on perl 5.10 or newer you can also write (*FAIL), which does the same thing but is more readable).

        The (?{...}) is just a block of Perl code that the regex engine runs after it has matched the c but before the match is forced to fail.
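For use in a script rather than a one-liner, the same trick might look like the sketch below. The sub name `count_abc_backtrack` and the use of a package variable for the counter are choices made for this illustration, not something the one-liner prescribes; (*FAIL) stands in for (?!) and therefore needs perl 5.10 or newer.

```perl
use strict;
use warnings;

# A package variable keeps the embedded code block simple; it is
# localized inside the sub so repeated calls start from zero.
our $count;

# Hypothetical helper: count every way /a.*b.*c/ can match inside $text.
sub count_abc_backtrack {
    my ($text) = @_;
    local $count = 0;
    # (?{ ... }) runs each time the engine has just matched the 'c';
    # (*FAIL) then rejects that candidate, forcing a backtrack into the next one.
    $text =~ /a.*b.*c(?{ $count++ })(*FAIL)/;
    return $count;
}

for my $text ('abacbcc', 'aabc', 'axaxbaxxbc') {
    print "$text: ", count_abc_backtrack($text), "\n";
}
```

This reports 7, 2 and 5 for the three strings. Note that .* does not cross newlines unless the /s modifier is added, so multi-line text would need that change.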

Re: all occurences of a target word by regular expression
by Utilitarian (Vicar) on Mar 30, 2010 at 14:03 UTC
    perl -e '$string="apples and oranges are both fruit but I much prefer apples"; @count=$string=~m/apple/g; print "\nThere are ", scalar @count," occurrences of the word apple in the string\n";'

    There are 2 occurrences of the word apple in the string
    Adapt your regex to suit, though I also have a problem seeing how you are counting occurrences above.

      @count=$string=~m/apple/g; ... , scalar @count, ...

      You can get the count directly using this construct.

      $ perl -E '
      > $str = q{I like apples, pears and pineapples};
      > $count = () = $str =~ m{apple}g;
      > say $count;'
      2
      $
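The construct works because a list assignment evaluated in scalar context returns the number of elements on its right-hand side, so the empty list `()` only serves to force the match into list context. A small sketch comparing it with an explicit loop (the variable names here are illustrative):

```perl
use strict;
use warnings;

my $str = 'I like apples, pears and pineapples';

# List assignment in scalar context yields the number of items produced
# by the right-hand side, here one item per /g match.
my $count_direct = () = $str =~ /apple/g;

# The same count, stepping through the matches one at a time.
my $count_loop = 0;
$count_loop++ while $str =~ /apple/g;

print "$count_direct $count_loop\n";    # prints "2 2"
```

Both forms count non-overlapping substring matches, which is a different question from the combinatorial a..b..c count discussed in the other reply.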

      I hope this is of interest.

      Cheers,

      JohnGG