Re: Re: Finding pages without specific words

You are right.

It is a good idea to quote the search strings with \Q...\E, in case one of them ever contains a character that has a special meaning in regular expressions.

Just use:

if ($text !~ m/\Q$nameOne\E/ and $text !~ m/\Q$nameTwo\E/)
{
    print "$name\n";
    ++$ct;
}

(I know, you don't really need the \E at the end of a pattern, but IMHO it's nicer with it.)
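To illustrate why the quoting matters, here is a minimal sketch. The search strings 'foo.bar' and '(draft' and the sample text are made up for the demonstration; they are not from the original question.

```perl
use strict;
use warnings;

# Hypothetical search string; the '.' is a regex metacharacter.
my $nameOne = 'foo.bar';
my $text    = 'this page only mentions fooXbar';

# Unquoted: '.' matches any character, so this is a false positive.
my $unquoted = ($text =~ m/$nameOne/)     ? 1 : 0;
# Quoted: '.' must match a literal dot, so there is no match.
my $quoted   = ($text =~ m/\Q$nameOne\E/) ? 1 : 0;

print "unquoted: $unquoted, quoted: $quoted\n";

# An unbalanced '(' makes the unquoted pattern fail to compile at run time.
my $nameTwo  = '(draft';
my $compiled = eval { my $m = $text =~ m/$nameTwo/; 1 } ? 1 : 0;
print "unquoted '(draft' compiled: $compiled\n";
```

With \Q...\E both strings are treated as plain text, so neither the false positive nor the compile error can happen.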


Replies are listed 'Best First'.
Re: Re: Re: Finding pages without specific words by UnderMine (Friar) on Mar 08, 2004 at 13:48 UTC
If speed is important, a more complex single precompiled regex may be faster for you. But it wasn't on this machine. The regex and an example benchmark script (ugly, but it compares the two methods) are posted below; these are the results:

Benchmark: timing 10000 iterations of and, var...
       and:  2 wallclock secs ( 0.73 usr +  0.00 sys =  0.73 CPU) @ 13698.63/s (n=10000)
       var:  1 wallclock secs ( 1.08 usr +  0.00 sys =  1.08 CPU) @ 9259.26/s (n=10000)
110000, 110000

Of course this benchmark figure is not based on real data, so it is better to run it on the actual data to get a real indication.

Hope it helps
UnderMine


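# One precompiled regex: it matches only when both names occur, in either order.
# Note: '.' does not match a newline unless the /s modifier is added.
my $regex = qr{(?:(?:\Q$nameOne\E).*(?:\Q$nameTwo\E)|(?:\Q$nameTwo\E).*(?:\Q$nameOne\E))};
if ($text !~ $regex) {
    print "$name\n";
    ++$ct;
}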
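As a sanity check on what each condition selects, the sketch below runs the single regex next to the two-match test written with "or" and with "and". The names 'alice' and 'bob' and the sample texts are invented for illustration, and the /s modifier is my own addition so that '.' can cross newlines.

```perl
use strict;
use warnings;

# Hypothetical names and sample texts, chosen only for illustration.
my ($nameOne, $nameTwo) = ('alice', 'bob');
# /s is an assumption added here; the posted regex does not use it.
my $regex = qr{(?:\Q$nameOne\E.*\Q$nameTwo\E|\Q$nameTwo\E.*\Q$nameOne\E)}s;

my @texts = ('alice and bob', 'bob then alice', 'only alice', 'nobody here');

my %result;
for my $text (@texts) {
    # True when the text does NOT contain both names.
    my $single  = ($text !~ $regex) ? 1 : 0;
    # True when at least one name is missing.
    my $either  = ($text !~ m/\Q$nameOne\E/ or  $text !~ m/\Q$nameTwo\E/) ? 1 : 0;
    # True only when both names are missing.
    my $neither = ($text !~ m/\Q$nameOne\E/ and $text !~ m/\Q$nameTwo\E/) ? 1 : 0;
    $result{$text} = [$single, $either, $neither];
    printf "%-18s single=%d or=%d and=%d\n", "'$text'", $single, $either, $neither;
}
```

On these samples the single regex agrees with the "or" form on every line, and differs from the "and" form for a text that contains only one of the names, so pick the form that matches the pages you actually want listed.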

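use strict;
use warnings;
use Benchmark;

my $n1 = 'h';
my $n2 = 'e';
# Single precompiled regex: matches only if both strings occur, in either order.
my $r = qr{(?:(?:\Q$n1\E).*(?:\Q$n2\E)|(?:\Q$n2\E).*(?:\Q$n1\E))};

my @i = qw(hello how are you doing are you going to exit now? each time?);
my ($ct1, $ct2) = (0, 0);

timethese(10000, {
    # "var": one match against the precompiled regex
    'var' => sub {
        for my $w (@i) {
            $ct1++ if $w !~ $r;
        }
    },
    # "and": two separate quoted matches
    'and' => sub {
        for my $w (@i) {
            $ct2++ if $w !~ m/\Q$n1\E/ or $w !~ m/\Q$n2\E/;
        }
    },
});

# Both counters should agree if the two tests select the same words.
print "$ct1, $ct2\n";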