in reply to Regex help \b & \Q

Per choroba's suggestion, here's this:

c:\@Work\Perl\monks>perl -wMstrict -le
"my $title = 'C .NET Cobol .NET .NET .NETER Perl xC x.NET';
 ;;
 for my $kw (qw(.NET C C++)) {
     my $count = () = $title =~ m{ (?<! \S) \Q$kw\E }xmsig;
     print qq{'$kw' $count};
     }
"
'.NET' 4
'C' 2
'C++' 0
(but I get 4 for '.NET'; I don't see how you would get three without another look-around or assertion following the  \Q...\E group).

Update: Ok, you seem to have updated your OP. | Oops: Since you posted anonymously, you could not have updated the OP. Anyway... Try this for a  '.NET' count of three:

c:\@Work\Perl\monks>perl -wMstrict -le
"my $title = 'C .NET Cobol .NET .NET .NETER Perl xC x.NET';
 ;;
 for my $kw (qw(.NET C C++)) {
     my $count = () = $title =~ m{ (?<! \S) \Q$kw\E \b }xmsig;
     print qq{'$kw' $count};
     }
"
'.NET' 3
'C' 1
'C++' 0


Give a man a fish:  <%-{-{-{-<

Re^2: Regex help \b & \Q
by Anonymous Monk on Apr 14, 2016 at 10:47 UTC

    Thanks AnomalousMonk.
    The second code has worked exactly as I want. Thank you very much.

Re^2: Regex help \b & \Q
by Anonymous Monk on Apr 14, 2016 at 10:53 UTC

    It seems there is a problem.

    When we give $kw = 'C++'; or $kw = 'C'; it fails to show the correct answer.

    Consider the below scenario,

    my $kw = 'C'; # or use C++
    my $title = ".net C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net";
    my $count = () = $title =~ m{ (?<! \S) \Q$kw\E \b }xmsig;
    print $count;
    die;

    Here the count should be 1 when $kw is 'C', and also 1 when $kw is 'C++', but both are showing wrong answers.

      If whitespace or start/end of string is going to be the delimiter, I would use:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my $title = 'C .NET Cobol .NET .NET .NETER Perl C++ C+ xC++ C+++ C++x + xC x.NET .net';
       ;;
       for my $kw (qw(.NET C C++)) {
           my $count = () = $title =~ m{ (?<! \S) \Q$kw\E (?! \S) }xmsig;
           print qq{'$kw' $count};
           }
      "
      '.NET' 4
      'C' 1
      'C++' 1
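The one-liner above can also be written as a small script, which is easier to reuse and test. This is only a sketch: the sub name count_keyword is hypothetical and not from the thread; the regex is the one shown above.

```perl
use strict;
use warnings;

# Hypothetical helper (name not from the thread): count occurrences of
# $kw in $text that are delimited by whitespace or by start/end of string.
sub count_keyword {
    my ($text, $kw) = @_;

    # (?<! \S)  not preceded by a non-whitespace character
    # \Q...\E   keyword taken literally, so '.' and '+' are not metacharacters
    # (?! \S)   not followed by a non-whitespace character
    my $count = () = $text =~ m{ (?<! \S) \Q$kw\E (?! \S) }xmsig;
    return $count;
}

my $title = 'C .NET Cobol .NET .NET .NETER Perl C++ C+ xC++ C+++ C++x + xC x.NET .net';
printf "'%s' %d\n", $_, count_keyword($title, $_) for qw(.NET C C++);
```

Because both delimiters are zero-width look-arounds, neither one consumes the whitespace between adjacent keywords.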



        Thanks for this one Anonymous Monk. It is working for me.

        I have another scenario now,

        my $kw = '.Net';
        my $title = ".net, .net; C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net";

        The answer should be 6 in this case. That is, commas and semicolons have to be considered too: if a $kw is immediately followed by a comma or semicolon, it should still be counted.
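One possible way to handle this scenario, which the thread does not confirm, is to loosen only the trailing look-ahead so that a comma or semicolon is accepted after the keyword as well as whitespace or end of string. The sub name count_keyword_punct is hypothetical, and the sketch assumes that a keyword followed by ',' or ';' always counts, whatever comes after the punctuation.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): like the whitespace-delimited
# version, but a ',' or ';' directly after the keyword is also accepted.
sub count_keyword_punct {
    my ($text, $kw) = @_;

    # (?<! \S)       not preceded by a non-whitespace character
    # (?! [^\s,;])   not followed by anything other than whitespace, ',' or ';'
    my $count = () = $text =~ m{ (?<! \S) \Q$kw\E (?! [^\s,;]) }xmsig;
    return $count;
}

my $title = ".net, .net; C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net";
print count_keyword_punct($title, '.Net'), "\n";    # prints 6
```

Under this assumption '.net,foo' would also be counted; if that is not wanted, the look-ahead would need to check what follows the punctuation as well.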

      Hi Anonymous,

      \b matches between a \w (in this case "C") and a \W (in this case "+"). If your keywords are always separated by whitespace, something like the following might work. It would be helpful if you could post several example inputs with their expected outputs.

      Update: The following does not work correctly if the input string contains multiple instances of $kw separated by a single \s. Thanks to AnomalousMonk for catching that!

      my $kw = 'C'; # or use C++
      my $title = ".net C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net";
      my $count = () = $title =~ m{ (?:^|\s) \Q$kw\E (?:\s|$) }xmsig;
      print "$count\n"; # prints "1" for both C and C++

      Hope this helps,
      -- Hauke D
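To make the \b point concrete, here is a minimal sketch of where a word boundary does and does not exist around 'C' and 'C++'. The sub name matches_with_boundary is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $kw followed by \b matches in $text,
# otherwise 0.
sub matches_with_boundary {
    my ($text, $kw) = @_;
    return $text =~ m{ \Q$kw\E \b }xms ? 1 : 0;
}

# '+' and ' ' are both \W, so there is no boundary after 'C++' here.
my $plus_then_space = matches_with_boundary('C++ ', 'C++');

# '+' followed by 'x' is \W then \w, which is a boundary.
my $plus_then_word = matches_with_boundary('C++x', 'C++');

# 'C' followed by '+' is \w then \W, so 'C' is found inside 'C++'.
my $c_inside_cpp = matches_with_boundary('C++', 'C');

print "$plus_then_space $plus_then_word $c_inside_cpp\n";    # prints "0 1 1"
```

This is why a trailing \b yields 0 for a whitespace-delimited 'C++' and also counts the 'C' that begins 'C++'.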

        The problem with using  (?:^|\s) and  (?:\s|$) as delimiter patterns is that  \s in the middle of a string requires and consumes a whitespace character. If only a single whitespace character separates patterns that are intended to match, some matches will be missed:

        c:\@Work\Perl\monks>perl -wMstrict -le
        "my $title = 'C C C C++ C++ C++ .NET .NET .NET';
         ;;
         for my $kw (qw(.NET C C++)) {
             my $count = () = $title =~ m{ (?:^|\s) \Q$kw\E (?:\s|$) }xmsig;
             print qq{'$kw' $count};
             }
        "
        '.NET' 2
        'C' 2
        'C++' 2
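For comparison, here is a sketch that runs the consuming delimiters and the zero-width look-around delimiters over the same string. The sub names count_consuming and count_lookaround are hypothetical; both regexes appear elsewhere in this thread.

```perl
use strict;
use warnings;

# Delimiters that consume a whitespace character on each side.
sub count_consuming {
    my ($text, $kw) = @_;
    my $count = () = $text =~ m{ (?:^|\s) \Q$kw\E (?:\s|$) }xmsig;
    return $count;
}

# Zero-width delimiters: they only assert, so no whitespace is consumed.
sub count_lookaround {
    my ($text, $kw) = @_;
    my $count = () = $text =~ m{ (?<! \S) \Q$kw\E (?! \S) }xmsig;
    return $count;
}

my $title = 'C C C C++ C++ C++ .NET .NET .NET';
for my $kw (qw(.NET C C++)) {
    printf "'%s' consuming=%d lookaround=%d\n",
        $kw, count_consuming($title, $kw), count_lookaround($title, $kw);
}
```

With single spaces between keywords, the consuming version finds 2 of each keyword, while the look-around version finds all 3, because the space that ends one match is still available to delimit the next.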



        Thanks for your reply, Haukex. It works for most cases but fails for the one put forward by AnomalousMonk.