in reply to Finding pages without specific words

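You are checking the phrases on a line-by-line basis, so a phrase that appears on any other line of the page is missed. Try File::Slurp to read the whole file into one string, like so:
use File::Slurp; my $text = read_file( 'filename' );
and then test the whole text with something like:
if (($text !~ m/$nameOne/) && ($text !~ m/$nameTwo/)) { print "$name\n"; $ct++; }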
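
A minimal sketch of the whole-file check, in case File::Slurp is not installed. It slurps with core Perl by localizing `$/` instead of calling `read_file`. The helper names `slurp` and `lacks_both` and the sample text are made up for illustration and are not from the original script.

```perl
use strict;
use warnings;

# Read a whole file into one string using only core Perl
# (a stand-in for File::Slurp's read_file).
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;                  # undef record separator: read everything at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# True (1) when neither name occurs anywhere in the text.
# The names are used as regex patterns here, exactly as in the post above.
sub lacks_both {
    my ($text, $nameOne, $nameTwo) = @_;
    return ( $text !~ m/$nameOne/ && $text !~ m/$nameTwo/ ) ? 1 : 0;
}

# Usage with an in-memory string; with a real file this would be
# lacks_both( slurp('filename'), $nameOne, $nameTwo ).
my $page = "alpha\nbeta\n";
print "page lacks both names\n" if lacks_both( $page, 'gamma', 'delta' );
```

Because the whole page sits in one string, a name on any line counts as found, which is the point of slurping rather than testing line by line.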

Re: Re: Finding pages without specific words
by Crian (Curate) on Mar 08, 2004 at 13:00 UTC
    You are right.

    It could be a good idea to quote the search strings with \Q...\E, in case they ever contain characters that have a special meaning in regular expressions.

    Just use:

    if ($text !~ m/\Q$nameOne\E/ and $text !~ m/\Q$nameTwo\E/) { print "$name\n"; ++$ct; }


    (I know, you don't really need the \E, but IMHO it's nicer with them.)
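
To illustrate why the quoting matters, here is a small sketch with a made-up search string that contains a regex metacharacter. The variable names and sample text are assumptions for the example only.

```perl
use strict;
use warnings;

my $text = "version 1x5 released";
my $name = "1.5";    # the dot is a regex metacharacter

# Without \Q the dot matches any character, so "1x5" counts as a hit.
my $unquoted = ( $text =~ m/$name/ )     ? 1 : 0;

# With \Q...\E only the literal string "1.5" would match.
my $quoted   = ( $text =~ m/\Q$name\E/ ) ? 1 : 0;

print "unquoted=$unquoted quoted=$quoted\n";    # prints: unquoted=1 quoted=0
```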
      If speed is important, a more complex single precompiled regex may be faster for you. It was not faster on this machine, though.
      my $regex=qr{(?:(?:\Q$nameOne\E).*(?:\Q$nameTwo\E)|(?:\Q$nameTwo\E).*(?:\Q$nameOne\E))}; if ($text !~ $regex) { print "$name\n"; ++$ct; }
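
One thing worth checking before adopting the combined pattern: as far as I can tell, it matches only when both names occur, so `!~` is true whenever at least one name is missing (the `or` case), not only when both are missing (the `and` case from the earlier posts). The sketch below compares it with a plain alternation. The `/s` flag, the variable names, and the sample strings are my own assumptions. `/s` is added because `.` does not match a newline by default, which matters for slurped text.

```perl
use strict;
use warnings;

my ( $nameOne, $nameTwo ) = ( 'foo', 'bar' );    # hypothetical search strings

# Matches only when both names occur, in either order.
my $both   = qr{(?:\Q$nameOne\E.*\Q$nameTwo\E|\Q$nameTwo\E.*\Q$nameOne\E)}s;

# Matches when at least one of the names occurs.
my $either = qr{\Q$nameOne\E|\Q$nameTwo\E};

for my $text ( "foo only", "bar\nthen foo", "nothing here" ) {
    my $not_both = ( $text !~ $both )   ? 1 : 0;    # at least one name missing
    my $neither  = ( $text !~ $either ) ? 1 : 0;    # both names missing
    ( my $shown = $text ) =~ s/\n/ /g;              # flatten newlines for printing
    print "$shown: not_both=$not_both neither=$neither\n";
}
```

If the goal is pages that contain neither name, the alternation form is the one that corresponds to the two-regex `and` test.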
      An example benchmark script (ugly, but it compares the two methods):
      use Benchmark;
      $n1='h'; $n2='e';
      # single precompiled regex: matches when both strings are present
      $r=qr{(?:(?:\Q$n1\E).*(?:\Q$n2\E)|(?:\Q$n2\E).*(?:\Q$n1\E))};
      @i=qw(hello how are you doing are you going to exit now? each time?);
      timethese (10000, {
        var => sub { for $w (@i) { $ct1++ if ($w!~$r); } },
        and => sub { for $w (@i) { $ct2++ if ($w!~ m/\Q$n1\E/ or $w !~ m/\Q$n2\E/); } }
      });
      print "$ct1, $ct2\n";
      Benchmark: timing 10000 iterations of and, var...
      and: 2 wallclock secs ( 0.73 usr + 0.00 sys = 0.73 CPU) @ 13698.63/s (n=10000)
      var: 1 wallclock secs ( 1.08 usr + 0.00 sys = 1.08 CPU) @ 9259.26/s (n=10000)
      110000, 110000
      Of course, these benchmark figures are not based on real data, so run the comparison against your actual data to get a real indication.
      Hope it helps
      UnderMine