in reply to Re^2: Regex help \b & \Q
in thread Regex help \b & \Q

If whitespace or start/end of string is going to be the delimiter, I would use:

c:\@Work\Perl\monks>perl -wMstrict -le
"my $title = 'C .NET Cobol .NET .NET .NETER Perl C++ C+ xC++ C+++ C++x + xC x.NET .net';
 ;;
 for my $kw (qw(.NET C C++)) {
     my $count = () = $title =~ m{ (?<! \S) \Q$kw\E (?! \S) }xmsig;
     print qq{'$kw' $count};
     }
"
'.NET' 4
'C' 1
'C++' 1


Give a man a fish:  <%-{-{-{-<

Re^4: Regex help \b & \Q
by Anonymous Monk on Apr 14, 2016 at 12:17 UTC

    Thanks for this one Anonymous Monk. It is working for me.

    I have another scenario now,

    my $kw = '.Net';
    my $title = ".net, .net; C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net";

    The answer should be 6 in this case. That is, the comma and semicolon cases have to be considered too: if $kw is followed by a comma or a semicolon, it should still count.

      Better:

      my $count = () = $title =~ m{ (?:^|\s)\K \Q$kw\E (?! [^\s,;] ) }xig;
      • (?! [^\s,;] ) is more efficient than (?: (?! \S) | (?= [,;]) ).
      • (?:^|\s)\K is more efficient than (?<! \S ).
      • The s and m flags weren't necessary.
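The efficiency claims in the list above can be checked rather than taken on faith. The following is a minimal sketch using the core Benchmark module; the helper name count_matches and the labels "lookaround" and "keep" are made up for illustration, and timings will vary with Perl version and input, so no numbers are claimed here.

```perl
use strict;
use warnings;
use 5.010;                      # \K needs Perl 5.10 or later
use Benchmark qw(cmpthese);

my $kw    = '.Net';
my $title = '.net, .net; C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net';

# Hypothetical helper: count matches of a precompiled pattern (list-context trick).
sub count_matches {
    my ($re, $text) = @_;
    my $count = () = $text =~ /$re/g;
    return $count;
}

my %pattern = (
    lookaround => qr{ (?<! \S) \Q$kw\E (?: (?! \S) | (?= [,;]) ) }xi,
    keep       => qr{ (?:^|\s)\K \Q$kw\E (?! [^\s,;] ) }xi,
);

# Both variants should agree on the count before they are timed.
say "$_: ", count_matches($pattern{$_}, $title) for sort keys %pattern;

# Build one code ref per variant, run each for at least 1 CPU second,
# and print a comparison table.
my %code = map {
    my $re = $pattern{$_};
    ( $_ => sub { count_matches($re, $title) } )
} keys %pattern;
cmpthese(-1, \%code);
```

On a string this short the difference may well be lost in the noise; a longer $title is probably needed before either pattern can be called faster.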
        (?:^|\s)\K is more efficient than (?<! \S )

        But it does not meet the requirements of the latest update (of the latest update (of the latest update (of the latest update...))) of the "specification" in the OP. However, it's easily fixed:
            (?: ^ | [\s,;]) \K
        and I'm happy to accept that it's more efficient.

        Update: In fact,  (?<! [^\s,;]) works just as well as  (?: ^ | [\s,;]) \K and has a certain orthogonality. It's still double-negatory, though. I've no idea about efficiency.

        BTW: It should be noted that  \K is only available from Perl version 5.10 onward.
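To make the difference between the leading-delimiter variants concrete, here is a small sketch. The helper count_kw, the style names and the test string are invented for illustration; the string deliberately puts keywords directly after a comma and a semicolon, which the whitespace-only version does not accept.

```perl
use strict;
use warnings;
use 5.010;    # \K and the // operator are unavailable before Perl 5.10

# Hypothetical helper: count $kw in $text using a named leading-delimiter style.
sub count_kw {
    my ($style, $kw, $text) = @_;
    my %lead = (
        space_only => qr{ (?: ^ | \s ) \K }x,        # whitespace or start only
        keep       => qr{ (?: ^ | [\s,;] ) \K }x,    # also accepts , and ;
        lookbehind => qr{ (?<! [^\s,;] ) }x,         # same idea, no \K needed
    );
    my $re = $lead{$style} // die "unknown style '$style'";
    my $count = () = $text =~ m{ $re \Q$kw\E (?! [^\s,;] ) }xig;
    return $count;
}

my $text = '.net,.net;.NET .NETER x.NET';
say "$_: ", count_kw($_, '.Net', $text) for qw(space_only keep lookbehind);
# prints:
#   space_only: 1
#   keep: 3
#   lookbehind: 3
```

The lookbehind form also runs on Perls older than 5.10 if the "use 5.010" line and the // operator are replaced, which may matter more than raw speed in some environments.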

        ◾The s and m flags weren't necessary

        In order to limit the "degrees of freedom" of (and the necessity for thought about) the  . ^ $ operators and for readability, I always use  /xms in every regex.
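For anyone unsure what those "degrees of freedom" are, the following sketch shows how /m and /s change the behaviour of ^ and . on a two-line string. The variable names and the sample text are only illustrative.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ anchors only at the very start of the string.
my $starts_plain = () = $text =~ /^\w+/g;
# With /m, ^ also anchors after every embedded newline.
my $starts_multi = () = $text =~ /^\w+/mg;

# Without /s, . does not match a newline, so the match stays on line one.
my ($dot_plain) = $text =~ /(first.*line)/;
# With /s, . matches newlines too, so the greedy .* runs to the last "line".
my ($dot_single) = $text =~ /(first.*line)/s;

print "line starts without /m: $starts_plain, with /m: $starts_multi\n";
print "match length without /s: ", length($dot_plain),
      ", with /s: ", length($dot_single), "\n";
# prints:
#   line starts without /m: 1, with /m: 2
#   match length without /s: 10, with /s: 22
```

Always writing /xms fixes these meanings once, so a reader never has to check which flags are in force before interpreting . ^ or $.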



        (?! [^\s,;] )
        Double negation can be hard to read, so some readers may prefer the equivalent positive form:
        (?= [\s,;] | \z)
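A quick way to convince yourself that the two lookaheads accept the same inputs is to run both over a handful of edge cases. This is only a sketch; the helper same_verdict and the sample strings are made up for illustration and do not amount to a proof.

```perl
use strict;
use warnings;

my $kw = '.Net';
my $negative = qr{ \Q$kw\E (?! [^\s,;] ) }xi;        # double-negative form
my $positive = qr{ \Q$kw\E (?= [\s,;] | \z ) }xi;    # positive form

# Hypothetical helper: 1 if both patterns agree on whether $text matches, else 0.
sub same_verdict {
    my ($text) = @_;
    my $neg = $text =~ $negative ? 1 : 0;
    my $pos = $text =~ $positive ? 1 : 0;
    return $neg == $pos ? 1 : 0;
}

for my $text ('.net', '.net,', '.net;', ".net\n", '.net ', '.NETER', '.net.') {
    (my $shown = $text) =~ s/\n/\\n/g;    # make the newline visible in the output
    printf "%-10s agree=%d\n", "'$shown'", same_verdict($text);
}
```

Every line should report agree=1: "the next character is not a non-delimiter" and "the next character is a delimiter, or there is no next character" describe the same set of positions.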

      Try this:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my $s = '.net, .net; C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net';
       ;;
       for my $kw (qw(.NET C C++)) {
           my $count = () = $s =~ m{ (?<! \S) \Q$kw\E (?: (?! \S) | (?= [,;])) }xmsig;
           print qq{'$kw' $count};
           }
      "
      '.NET' 6
      'C' 1
      'C++' 1
      There might be something a bit more elegant than  (?: (?! \S) | (?= [,;])) for the end delimiter, but it was a quick fix.

