in reply to Re^2: Regex help \b & \Q
in thread Regex help \b & \Q

Hi Anonymous,

\b matches at the boundary between a \w character (in this case "C") and a \W character (in this case "+"), so \bC\b also matches the "C" in "C++". If your keywords are always separated by whitespace, something like the following might work. It would be helpful if you could post several example inputs with their expected outputs.

Update: The following does not work correctly if the input string contains multiple instances of $kw separated by a single \s. Thanks to AnomalousMonk for catching that!

my $kw = 'C'; # or use C++
my $title = ".net C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net +";
my $count = () = $title =~ m{ (?:^|\s) \Q$kw\E (?:\s|$) }xmsig;
print "$count\n"; # prints "1" for both C and C++
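
For comparison, here is a small sketch of why \b on both sides does not work for these keywords. The helper name count_word_bounded and the sample string are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: count occurrences of $kw with \b on both sides.
sub count_word_bounded {
    my ($kw, $text) = @_;
    my $count = () = $text =~ m{ \b \Q$kw\E \b }xmsg;
    return $count;
}

my $text = 'C C++ Cobol';

# "C" is counted twice: once on its own and once inside "C++",
# because \b matches between "C" (\w) and "+" (\W).
printf "C: %d\n", count_word_bounded('C', $text);      # C: 2

# "C++" is never counted: the trailing \b sits between "+" and " ",
# which are both \W, so there is no word boundary there.
printf "C++: %d\n", count_word_bounded('C++', $text);  # C++: 0
```

So \b gives a false positive for C and a false negative for C++, which is why the pattern above uses whitespace as the delimiter instead.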

Hope this helps,
-- Hauke D

Re^4: Regex help \b & \Q
by AnomalousMonk (Archbishop) on Apr 14, 2016 at 11:58 UTC

    The problem with using  (?:^|\s) and  (?:\s|$) as delimiter patterns is that  \s in the middle of a string requires and consumes a whitespace character. If only a single whitespace character separates patterns that are intended to match, some matches will be missed:

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $title = 'C C C C++ C++ C++ .NET .NET .NET';
     ;;
     for my $kw (qw(.NET C C++)) {
       my $count = () = $title =~ m{ (?:^|\s) \Q$kw\E (?:\s|$) }xmsig;
       print qq{'$kw' $count};
       }
    "
    '.NET' 2
    'C' 2
    'C++' 2
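
One possible way around the consumed-whitespace problem, sketched here as an assumption and not taken from the posts above, is to use zero-width lookarounds as delimiters. (?<!\S) asserts "not preceded by a non-whitespace character" and (?!\S) asserts "not followed by a non-whitespace character". Both also succeed at the start and end of the string, and neither consumes the separating whitespace. The helper name count_keyword is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count whitespace-delimited occurrences of $kw.
# The lookarounds check the neighbouring characters without consuming them,
# so a single space can serve as delimiter for two adjacent matches.
sub count_keyword {
    my ($kw, $text) = @_;
    my $count = () = $text =~ m{ (?<!\S) \Q$kw\E (?!\S) }xmsig;
    return $count;
}

my $title = 'C C C C++ C++ C++ .NET .NET .NET';

for my $kw (qw(.NET C C++)) {
    printf "'%s' %d\n", $kw, count_keyword($kw, $title);
}
# '.NET' 3
# 'C' 3
# 'C++' 3
```

This only treats whitespace as a separator. If keywords can also be followed by punctuation such as commas, the delimiter assertions would need to be adjusted.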


    Give a man a fish:  <%-{-{-{-<

Re^4: Regex help \b & \Q
by Anonymous Monk on Apr 14, 2016 at 12:11 UTC
    Thanks for your reply, Haukex. It works for most cases but fails for the one put forward by AnomalousMonk.