in reply to Re^3: Regex help \b & \Q
in thread Regex help \b & \Q

Thanks for this one Anonymous Monk. It is working for me.

I have another scenario now:

my $kw = '.Net';
my $title = ".net, .net; C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net";

The answer should be 6 in this case. That is, the comma and semicolon cases must be considered too: if $kw is followed by a comma or a semicolon, it should still count.

Re^5: Regex help \b & \Q
by ikegami (Patriarch) on Apr 14, 2016 at 18:29 UTC

    Better:

    my $count = () = $title =~ m{ (?:^|\s)\K \Q$kw\E (?! [^\s,;] ) }xig;
    • (?! [^\s,;] ) is more efficient than (?: (?! \S) | (?= [,;]) ).
    • (?:^|\s)\K is more efficient than (?<! \S ).
    • The s and m flags weren't necessary.
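For anyone who wants to try the one-liner above, here is a minimal sketch that wraps it in a sub and runs it against the title from the question. The helper name count_kw is hypothetical (it is not from the thread), and the title is assumed to end in "x.NET .net". It needs Perl 5.10 or later because of \K.

```perl
use strict;
use warnings;

# count_kw() is a hypothetical helper name used only for this sketch.
sub count_kw {
    my ($kw, $title) = @_;
    # (?:^|\s)\K     keyword must start the string or follow whitespace
    # \Q$kw\E        keyword is taken literally ('.' and '+' are not special)
    # (?! [^\s,;] )  next character, if any, must be whitespace, ',' or ';'
    my $count = () = $title =~ m{ (?:^|\s)\K \Q$kw\E (?! [^\s,;] ) }xig;
    return $count;
}

my $title = '.net, .net; C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net';
printf "'%s' %d\n", $_, count_kw($_, $title) for '.Net', 'C', 'C++';
```

With that title, the counts should come out as 6, 1 and 1, the same numbers as the lookbehind version elsewhere in this thread.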
      (?:^|\s)\K is more efficient than (?<! \S )

      But it does not meet the requirements of the latest update (of the latest update (of the latest update (of the latest update...))) of the "specification" in the OP. However, it's easily fixed:
          (?: ^ | [\s,;]) \K
      and I'm happy to accept that it's more efficient.

      Update: In fact,  (?<! [^\s,;]) works just as well as  (?: ^ | [\s,;]) \K and has a certain orthogonality. It's still double-negatory, though. I've no idea about efficiency.

      BTW: It should be noted that  \K is only available from Perl version 5.10 onward.
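A small sketch comparing the three start delimiters discussed here. The test string is made up for illustration (it is not from the thread) and puts the keyword directly after a comma and a semicolon; the variable names are likewise hypothetical.

```perl
use strict;
use warnings;

my $kw = '.NET';
# Hypothetical test string: keywords glued to ',' and ';' with no space.
my $s  = 'C,.NET;.net x.NET .NETER .net';

# Consume the delimiter, then drop it from the match with \K (Perl 5.10+).
my $count_k  = () = $s =~ m{ (?: ^ | [\s,;] ) \K \Q$kw\E (?! [^\s,;] ) }xmsig;

# Zero-width alternative: the preceding character must not be a non-delimiter.
my $count_lb = () = $s =~ m{ (?<! [^\s,;] ) \Q$kw\E (?! [^\s,;] ) }xmsig;

# Whitespace-only start delimiter: misses the keywords that follow ',' or ';'.
my $count_ws = () = $s =~ m{ (?: ^ | \s ) \K \Q$kw\E (?! [^\s,;] ) }xmsig;

print "with \\K and [\\s,;]: $count_k\n";
print "with lookbehind:     $count_lb\n";
print "whitespace only:     $count_ws\n";
```

On this string the first two forms should agree, while the whitespace-only form only sees the final keyword.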

      ◾The s and m flags weren't necessary

      In order to limit the "degrees of freedom" of (and the necessity for thought about) the  . ^ $ operators and for readability, I always use  /xms in every regex.
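To make the effect of those flags concrete, here is a minimal sketch (the sample text and variable names are made up) showing what /s and /m change for . and ^ when the string contains a newline.

```perl
use strict;
use warnings;

my $text = "one\ntwo";

# Without /s, '.' does not match a newline; with /s it does.
my $dot_plain = $text =~ m{ one . two }x  ? 1 : 0;
my $dot_s     = $text =~ m{ one . two }xs ? 1 : 0;

# Without /m, ^ matches only at the start of the string;
# with /m it also matches just after each embedded newline.
my $caret_plain = () = $text =~ m{ ^ \w+ }xg;
my $caret_m     = () = $text =~ m{ ^ \w+ }xmg;

print "dot:   plain=$dot_plain s=$dot_s\n";
print "caret: plain=$caret_plain m=$caret_m\n";
```

Always writing /xms means . ^ $ behave one fixed way in every regex, and \A and \z remain available for the true start and end of the string.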


      Give a man a fish:  <%-{-{-{-<

      (?! [^\s,;] )
      Sometimes double negation is difficult to understand, so some readers may prefer the equivalent positive form:
      (?= [\s,;] | \z)
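A quick sketch to check that the two end delimiters agree. The sub names and the list of sample strings are hypothetical, chosen to cover end of string, each delimiter, and a couple of non-delimiters.

```perl
use strict;
use warnings;

my $kw = '.net';

# Double-negative form: "not followed by a non-delimiter".
sub ends_ok_negative {
    my ($s) = @_;
    return $s =~ m{ \A \Q$kw\E (?! [^\s,;] ) }xi ? 1 : 0;
}

# Positive form: "followed by a delimiter, or at the end of the string".
sub ends_ok_positive {
    my ($s) = @_;
    return $s =~ m{ \A \Q$kw\E (?= [\s,;] | \z ) }xi ? 1 : 0;
}

# Hypothetical sample strings.
my @cases = ('.net', '.net,', '.net;', '.net x', ".net\n", '.netx', '.net.');
for my $s (@cases) {
    (my $shown = $s) =~ s/\n/\\n/g;    # make the newline visible when printing
    printf "%-8s negative=%d positive=%d\n",
        $shown, ends_ok_negative($s), ends_ok_positive($s);
}
```

Both subs should return the same value for every sample, since "no non-delimiter follows" and "a delimiter or the end follows" describe the same set of positions.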
        Sometimes double negation is difficult to understand ...

        More than just sometimes, IMHO, but it's tolerable if taken in moderation. E.g., if you need a "digit boundary" assertion analogous to  \b in that it also matches at the start/end of a string, then  (?<! \d) and  (?! \d) are very attractive. Then  (?<! \d) \d{4} (?! \d) matches  '1234' 'x1234' '1234x' 'x1234x' but none of  '12345' 'x12345x' etc. Extend this to  (?<! \D) and  (?! \D) and you have a sometimes-useful double-negation asserting "non-digit boundary".
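The "digit boundary" example above can be checked with a short sketch like this one (the variable name $digit4 is just for illustration):

```perl
use strict;
use warnings;

# Exactly four digits, not preceded and not followed by another digit.
# Both lookarounds also succeed at the start/end of the string.
my $digit4 = qr{ (?<! \d) \d{4} (?! \d) }xms;

for my $s (qw(1234 x1234 1234x x1234x 12345 x12345x)) {
    printf "%-8s %s\n", $s, $s =~ $digit4 ? 'match' : 'no match';
}
```

The first four strings should match and the last two should not, because a run of five digits never offers four digits with a non-digit (or a string edge) on both sides.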



Re^5: Regex help \b & \Q
by AnomalousMonk (Archbishop) on Apr 14, 2016 at 12:43 UTC

    Try this:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $s = '.net, .net; C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net';
     ;;
     for my $kw (qw(.NET C C++)) {
       my $count = () = $s =~ m{ (?<! \S) \Q$kw\E (?: (?! \S) | (?= [,;])) }xmsig;
       print qq{'$kw' $count};
       }
    "
    '.NET' 6
    'C' 1
    'C++' 1
    There might be something a bit more elegant than  (?: (?! \S) | (?= [,;])) for the end delimiter, but it was a quick fix.

