@bv
You wrote: "Look at study, especially if you have a lot of patterns you are matching against."
I have added study; after both reads, but I can't see any difference when scanning multiple files with this subroutine. Please advise.

#!/usr/bin/perl -w
use strict;

my $patterns = "/path/to/patterns.txt";
my $arg1 = shift;

open (PAT, '<', $patterns) or die "$patterns: $!\n";
my @patterns = <PAT>;
study;
close(PAT);
chomp @patterns;
my $regex_string = join '|', @patterns;

open( FILE, "<", "$arg1") or die "$arg1: $!\n";
$_ = do { local $/; <FILE> };
study;
close(FILE);

if ( /($regex_string)/is ) {print "\n$arg1\n$1\n";}

Re^3: Comparing pattern
by bv (Friar) on Sep 21, 2009 at 15:22 UTC

    Did you read the documentation on study?

    study attempts to make matches against a string more efficient, but it incurs a one-time penalty for the time spent studying the string. It is most beneficial when you are doing many matches against a single string. You should benchmark to determine whether you are getting any benefit from study. The first study in your code (line 10, right after reading @patterns) is unnecessary, since at that point there is no string in $_ to match against.
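To follow the advice to benchmark, here is a minimal sketch using the core Benchmark module. The text, the pattern list and the count_matches helper are hypothetical stand-ins for the real files and patterns. Note that perldoc -f study documents study as a no-op since Perl 5.16, so on a modern perl both rows should come out roughly equal.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: one large string and a list of literal words to find.
my $text     = join ' ', map { "word$_" } 1 .. 20_000;
my @patterns = map { "word${_}9" } 1 .. 50;

# Count how many patterns match somewhere in the string, optionally studying it first.
sub count_matches {
    my ($string, $use_study) = @_;
    local $_ = $string;      # study and the matches below operate on $_
    study if $use_study;     # one-time cost per string (no-op since Perl 5.16)
    my $hits = 0;
    for my $p (@patterns) {
        $hits++ if /\Q$p\E/;    # \Q...\E: treat the pattern as a literal string
    }
    return $hits;
}

# Run each variant 20 times and print a comparison table.
cmpthese(20, {
    plain   => sub { count_matches($text, 0) },
    studied => sub { count_matches($text, 1) },
});
```

If the "studied" row is not clearly faster on your real data, dropping study is the simpler choice.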

    You keep saying "subroutine." Is this really in a sub? If so, are you reading in your patterns every time the sub is run? That's a major inefficiency. And once you solve that one, you can look at precompiling your expressions (with qr//) as I originally suggested.

      Yes, it is a subroutine. I need this script to scan all files for scams or other abuses. I'm using File::Find to walk a directory tree and call the subroutine for each file. The patterns are read outside the sub.
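A sketch of that layout, assuming the patterns come from an inline list instead of patterns.txt: the patterns are joined and compiled once at the top, and the per-file sub only slurps and matches. scan_file, scan_tree and @hits are hypothetical names. With no_chdir => 1, File::Find leaves the working directory alone, so $File::Find::name can be opened directly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical pattern list; the real script reads these from patterns.txt.
# They are joined and compiled ONCE here, outside the per-file sub.
my @patterns = ('free\s+money', 'pattern1.*pattern2');
my $combined = join '|', @patterns;
my $regex    = qr/($combined)/is;

my @hits;    # collects "path: matched text" strings

# Called once per file: slurp the contents and match, no pattern setup.
sub scan_file {
    my ($path) = @_;
    open my $fh, '<', $path or do { warn "$path: $!\n"; return };
    my $content = do { local $/; <$fh> };    # undef $/ => read the whole file
    close $fh;
    return unless defined $content;
    push @hits, "$path: $1" if $content =~ $regex;
}

# Walk a directory tree and scan every plain file in it.
sub scan_tree {
    my ($dir) = @_;
    find({ wanted   => sub { scan_file($File::Find::name) if -f $File::Find::name },
           no_chdir => 1 }, $dir);
}

scan_tree($ARGV[0]) if @ARGV;
print "$_\n" for @hits;
```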
      Based on your suggestion, I will precompile this way:
      ...
      my $list_regex = join '|', @patterns;
      my $regex_string = qr/$list_regex/is;
      ...
      if (/($regex_string)/) {print "\n$arg1\n$1\n";}
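A small check that this works as intended: the /i and /s modifiers given to qr// are stored inside the compiled object, so embedding it in a larger pattern such as /($regex)/ keeps them even though the outer match has no flags. The patterns and text below are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical patterns standing in for the contents of patterns.txt.
my @patterns   = ('pattern1', 'free\s+money');
my $list_regex = join '|', @patterns;

# Compile once; the /i and /s modifiers travel with the qr// object.
my $regex = qr/$list_regex/is;

# The outer match has no /i, but the embedded $regex is still case-insensitive.
my $text  = "Win FREE\nMONEY today";
my $found = $text =~ /($regex)/ ? $1 : undef;
print "$found\n" if defined $found;
```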
      As for study, it actually made things slightly slower. Maybe it doesn't help in my case.

      I still have a big problem. Graff helped me slurp each file, which made the scanner several times faster than my original script, but I don't have experience with $/ or $_. If you check my last example, the global $1 contains the entire matched text from the first pattern to the second one:
      pattern1 some text pattern2
      instead of the pattern that matched: pattern1.*pattern2
      Can you please give me some advice? Thank you!
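One way to report which pattern fired, rather than the text it matched, is to keep each pattern's source string next to its compiled qr// object and test them one at a time. This is a sketch: @patterns, @compiled and which_pattern are hypothetical names, and the loop makes one pass over the text per pattern, trading some speed for a clear answer. Also note that with /s, a greedy .* can span newlines and run to the last pattern2 in the file; .*? stops at the first one.

```perl
use strict;
use warnings;

# Hypothetical pattern list; in the real script these come from patterns.txt.
my @patterns = ('pattern1.*pattern2', 'free\s+money');

# Compile each pattern once, keeping its source text alongside the qr// object.
my @compiled = map { [ $_, qr/$_/is ] } @patterns;

# Return (source pattern, matched text) for the first pattern that matches,
# or an empty list if none do.
sub which_pattern {
    my ($text) = @_;
    for my $pair (@compiled) {
        my ($source, $re) = @$pair;
        return ($source, $1) if $text =~ /($re)/;
    }
    return;
}

my ($source, $matched) = which_pattern("xx pattern1 some text pattern2 yy");
print "pattern: $source\nmatched: $matched\n" if defined $source;
```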