Re: Regex help \b & \Q
by choroba (Cardinal) on Apr 14, 2016 at 10:19 UTC
\Q\b doesn't work, because it returns \\b (you can verify it by print quotemeta '\b'). You can switch the order to make it work: \b\Q$kw\E\b .
There's another problem, though. \b matches word boundaries, but .NET is not considered a word: the dot is not a word character. Therefore, \b doesn't match at the beginning of '.NET' unless a word character comes right before the dot. Use look-around assertions with whitespace instead (if your words are delimited by whitespace).
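A small sketch to make both points concrete. The helper names count_with_word_boundaries and count_with_lookarounds and the sample string are made up for illustration; they are not from the original question.

```perl
use strict;
use warnings;

# quotemeta escapes every non-word character, so the backslash of a
# literal \b gets escaped too and the pattern then matches the text "\b".
my $escaped = quotemeta '\b';
print "quotemeta('\\b') gives $escaped\n";

# Count occurrences of a keyword wrapped in \b on both sides.
sub count_with_word_boundaries {
    my ($kw, $text) = @_;
    my $count = () = $text =~ /\b\Q$kw\E\b/g;
    return $count;
}

# Count occurrences of a keyword delimited by whitespace or a string edge.
sub count_with_lookarounds {
    my ($kw, $text) = @_;
    my $count = () = $text =~ /(?<!\S)\Q$kw\E(?!\S)/g;
    return $count;
}

my $title = '.NET .NET x.NET';
# \b only fires between the "x" and the dot, so only x.NET is counted.
print count_with_word_boundaries('.NET', $title), "\n";    # 1
# The look-arounds accept the two standalone .NET and reject x.NET.
print count_with_lookarounds('.NET', $title), "\n";        # 2
```

Note that the \b version counts exactly the occurrence you probably do not want (the one glued to "x"), which is why switching to look-arounds is more than a cosmetic change.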
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Thanks, choroba, for the quick reply.
Is there any workaround for this?
I want to match exactly the same $kw in $title and, at the same time, treat regex metacharacters (like +) as normal characters when matching.
(For example, the case of $kw = 'C++';)
Re: Regex help \b & \Q
by AnomalousMonk (Archbishop) on Apr 14, 2016 at 10:28 UTC
c:\@Work\Perl\monks>perl -wMstrict -le
"my $title = 'C .NET Cobol .NET .NET .NETER Perl xC x.NET';
;;
for my $kw (qw(.NET C C++)) {
my $count = () = $title =~ m{ (?<! \S) \Q$kw\E }xmsig;
print qq{'$kw' $count};
}
"
'.NET' 4
'C' 2
'C++' 0
(but I get 4 for '.NET'; I don't see how you would get three without another look-around or assertion following the \Q...\E group).
Update: Ok, you seem to have updated your OP. | Oops: Since you posted anonymously, you could not have updated the OP. Anyway... Try this for a '.NET' count of three:
c:\@Work\Perl\monks>perl -wMstrict -le
"my $title = 'C .NET Cobol .NET .NET .NETER Perl xC x.NET';
;;
for my $kw (qw(.NET C C++)) {
my $count = () = $title =~ m{ (?<! \S) \Q$kw\E \b }xmsig;
print qq{'$kw' $count};
}
"
'.NET' 3
'C' 1
'C++' 0
Give a man a fish: <%-{-{-{-<
There still seems to be a problem.
With $kw = 'C++'; or $kw = 'C'; it fails to give the correct count.
Consider the scenario below:
my $kw = 'C'; # or use C++
my $title = ".net C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net";
my $count = () = $title =~ m{ (?<! \S) \Q$kw\E \b }xmsig;
print $count;
die;
Here both C and C++ should have a count of 1 when checked with the corresponding $kw, but they show the wrong answers.
c:\@Work\Perl\monks>perl -wMstrict -le
"my $title = 'C .NET Cobol .NET .NET .NETER Perl C++ C+ xC++ C+++ C++x xC x.NET .net';
;;
for my $kw (qw(.NET C C++)) {
my $count = () = $title =~ m{ (?<! \S) \Q$kw\E (?! \S) }xmsig;
print qq{'$kw' $count};
}
"
'.NET' 4
'C' 1
'C++' 1
Give a man a fish: <%-{-{-{-<
Hi Anonymous,
\b matches between a \w (in this case "C") and a \W (in this case "+"). If your keywords are always separated by whitespace, something like the following might work. It would be helpful if you could post several example inputs with their expected outputs.
Update: The following does not work correctly if the input string contains multiple instances of $kw separated by a single \s. Thanks to AnomalousMonk for catching that!
my $kw = 'C'; # or use C++
my $title = ".net C .NET Cobol .NET C++ .NET .NETER Perl IT x.NET .net";
my $count = () = $title =~ m{ (?:^|\s) \Q$kw\E (?:\s|$) }xmsig;
print "$count\n"; # prints "1" for both C and C++
Hope this helps, -- Hauke D
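To illustrate the caveat in the update above, here is a minimal sketch comparing the delimiter-consuming pattern with the look-around pattern from elsewhere in this thread. The sub names count_consuming and count_asserting and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Delimiters are consumed: the \s that ends one match is already used up,
# so it cannot also serve as the leading delimiter of the next match.
sub count_consuming {
    my ($kw, $text) = @_;
    my $count = () = $text =~ m{ (?:^|\s) \Q$kw\E (?:\s|$) }xmsig;
    return $count;
}

# Delimiters are only asserted, never consumed, so adjacent keywords
# separated by a single whitespace character are all found.
sub count_asserting {
    my ($kw, $text) = @_;
    my $count = () = $text =~ m{ (?<! \S) \Q$kw\E (?! \S) }xmsig;
    return $count;
}

my $title = 'C++ C++ C++';
print count_consuming('C++', $title), "\n";    # 2 (the middle C++ is skipped)
print count_asserting('C++', $title), "\n";    # 3
```

Both versions agree whenever each keyword occurrence has its own delimiters on both sides; they only diverge when occurrences share a single separator.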
Re: Regex help \b & \Q
by Not_a_Number (Prior) on Apr 14, 2016 at 18:45 UTC
I would use a different approach to your task, rather than a complicated regex:
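use feature 'say'; # needed for say()
my @wanted = ( 'C', 'C++', '.NET', );
my $str = '.net, .net; C# .NET Cobol .NET C++ .NET .NETER c# IT x.NET';
my %split_str;
# Split on runs of spaces, semicolons and commas; count each upper-cased token.
$split_str{ +uc }++ for split /[ ;,]+/, $str;
say "$_: " . ( $split_str{$_} || 0 ) for @wanted;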
That way, if you need to add another item to your search list, there's only one change to be made in one place. For example, if you wanted to add say C# or PL/M, you would just append them to the @wanted array.
Update: Of course, this approach fails for 'Visual Basic', but as far as I can see, so do all the other replies in this thread...
c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = '.net,.Net; .NET;.net, Visual Basic Visual Basic; VisualBasicx Visual Basic,Visual Basic';
;;
for my $kw (qw(.NET C C++), 'Visual Basic') {
my $count = () = $s =~ m{ (?: (?<! \S) | (?<= [,;])) \Q$kw\E (?: (?! \S) | (?= [,;])) }xmsig;
print qq{'$kw' $count};
}
"
'.NET' 4
'C' 0
'C++' 0
'Visual Basic' 4
But that's not to say your basic approach isn't better!
Give a man a fish: <%-{-{-{-<
Re: Regex help \b & \Q
by AnomalousMonk (Archbishop) on Apr 14, 2016 at 12:27 UTC
c:\@Work\Perl\monks>perl -wMstrict -le
"use Data::Dump qw(dd);
;;
my ($kw) =
map qr{ (?i) (?: $_) }xms,
join q{|},
map quotemeta,
reverse sort
qw(.NET C C++)
;
;;
my $title = 'C xC Cx C C C C++ xC++ C++x C++ C++ C++ .NET x.NET .NETx .NET .NET .NET c c++ .net'
;;
my %count;
$count{ uc $1 }++ while $title =~ m{ (?<! \S) ($kw) (?! \S) }xmsg;
dd \%count;
"
{ ".NET" => 5, "C" => 5, "C++" => 5 }
Give a man a fish: <%-{-{-{-<
Hi AnomalousMonk,
Thank you very much for all your efforts.
Your code my $count = () = $title =~ m{ (?<! \S) \Q$kw\E (?! \S) }xmsig; works perfectly for me except in a couple of cases.
I need $kw to still be counted when it is followed by a comma or a semicolon. Could you please show me how to add that to the code?
Re: Regex help \b & \Q
by AnomalousMonk (Archbishop) on Apr 14, 2016 at 16:26 UTC
Ok, here's yet another fix to my regex. I noticed that '.NET;.NET;' and similar are not counted properly. That's because the (?<! \S) look-behind doesn't allow for a comma or semicolon. Easily fixed:
c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = '.net,.Net; .NET;.net, .NET x.NET .NETx x.NET; x.NET,';
;;
for my $kw (qw(.NET C C++)) {
my $count = () = $s =~ m{ (?: (?<! \S) | (?<= [,;])) \Q$kw\E (?: (?! \S) | (?= [,;])) }xmsig;
print qq{'$kw' $count};
}
"
'.NET' 5
'C' 0
'C++' 0
(So... How should C;x C++,x .NET;x etc. be handled?)
Give a man a fish: <%-{-{-{-<
Wow, well spotted AnomalousMonk. Actually I had forgotten to include this scenario. Thank you.
Regarding C;x C++,x .NET;x:
C should be 1
C++ should be 1
.NET should also be 1
Thank you again
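For what it's worth, the comma/semicolon-aware regex posted above appears to give exactly those counts already. A quick check, wrapped in a hypothetical helper called count_keyword (the name and the sub are illustrative, the pattern is the one from the thread):

```perl
use strict;
use warnings;

# Count $kw when each side is bounded by whitespace, a string edge,
# a comma or a semicolon.
sub count_keyword {
    my ($kw, $text) = @_;
    my $count = () = $text =~ m{
        (?: (?<! \S) | (?<= [,;]) )   # start, whitespace, comma or semicolon before
        \Q$kw\E
        (?: (?! \S) | (?= [,;]) )     # end, whitespace, comma or semicolon after
    }xmsig;
    return $count;
}

my $s = 'C;x C++,x .NET;x';
print "'$_' ", count_keyword($_, $s), "\n" for qw(.NET C C++);
# '.NET' 1
# 'C' 1
# 'C++' 1
```

The C in C++ is not counted for $kw = 'C' because the character after it is a plus sign, which is neither whitespace nor one of the two allowed punctuation marks.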