in reply to Re: Finding pages without specific words
in thread Finding pages without specific words

You are right.

It could be a good idea to quote the search strings in case they ever contain characters that have a special meaning in regular expressions.

Just use:

if ($text !~ m/\Q$nameOne\E/ and $text !~ m/\Q$nameTwo\E/) { print "$name\n"; ++$ct; }


(I know, you don't really need the \E here, but IMHO it's nicer with it.)
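A small sketch of why the quoting matters. The search string `a.c` and the sample text are hypothetical, chosen only to show the difference: unquoted, the `.` acts as a wildcard, while `\Q...\E` makes it match literally.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $name = 'a.c';
my $text = 'the file abc is here';

# Unquoted: "." is a wildcard, so "a.c" also matches "abc".
my $unquoted = ($text =~ m/$name/)     ? 1 : 0;

# Quoted: \Q...\E escapes the ".", so only a literal "a.c" matches.
my $quoted   = ($text =~ m/\Q$name\E/) ? 1 : 0;

print "unquoted: $unquoted, quoted: $quoted\n";   # unquoted: 1, quoted: 0
```

Strings containing `+`, `*`, `(` or `[` can behave worse than this, since an unquoted pattern may fail to compile at all.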

Re: Re: Re: Finding pages without specific words
by UnderMine (Friar) on Mar 08, 2004 at 13:48 UTC
    If speed is important, a more complex single precompiled regex may be faster for you, though it wasn't on this machine.
    my $regex=qr{(?:(?:\Q$nameOne\E).*(?:\Q$nameTwo\E)|(?:\Q$nameTwo\E).*(?:\Q$nameOne\E))};
    if ($text !~ $regex) { print "$name\n"; ++$ct; }
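Note that the pattern above matches only when both names occur, so `$text !~ $regex` is true whenever at least one of them is missing. If the goal is the parent's test (neither name present), a plain alternation may be closer. A minimal sketch, with hypothetical names and sample texts:

```perl
use strict;
use warnings;

# Hypothetical names and texts, for illustration only.
my ($nameOne, $nameTwo) = ('foo', 'bar');

# Matches if either name occurs anywhere in the text.
my $either = qr{\Q$nameOne\E|\Q$nameTwo\E};

my %has_neither;
for my $text ('foo and bar', 'only foo', 'nothing here') {
    # Same truth value as:
    #   $text !~ m/\Q$nameOne\E/ and $text !~ m/\Q$nameTwo\E/
    $has_neither{$text} = ($text !~ $either) ? 1 : 0;
}

print "$_: $has_neither{$_}\n" for sort keys %has_neither;
```

Only 'nothing here' is flagged, which is what the two-match `and` version reports as well.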
    An example benchmark script (ugly, but it compares the two methods):
    use Benchmark;
    $n1 = 'h'; $n2 = 'e';
    $r = qr{(?:(?:\Q$n1\E).*(?:\Q$n2\E)|(?:\Q$n2\E).*(?:\Q$n1\E))};
    @i = qw(hello how are you doing are you going to exit now? each time?);
    timethese(10000, {
        var => sub { for $w (@i) { $ct1++ if ($w !~ $r); } },
        and => sub { for $w (@i) { $ct2++ if ($w !~ m/\Q$n1\E/ or $w !~ m/\Q$n2\E/); } },
    });
    print "$ct1, $ct2\n";
    Benchmark: timing 10000 iterations of and, var...
    and: 2 wallclock secs ( 0.73 usr + 0.00 sys = 0.73 CPU) @ 13698.63/s (n=10000)
    var: 1 wallclock secs ( 1.08 usr + 0.00 sys = 1.08 CPU) @ 9259.26/s (n=10000)
    110000, 110000
    Of course, this benchmark is not based on real data, so it is better to run it on your actual data to get a real indication.
    Hope it helps
    UnderMine