Re: Faster way?

Firstly, your regexp /.*$search_value*/ will not do quite what you think. If $search_value is 'abc', the resulting pattern is /.*abc*/, so the string "xyzab" will match, since the trailing * applies only to the 'c'. The regular expression will also cause your code to die if $search_value contains characters that goof up the regexp compilation (an unbalanced '(' or '[', for instance).
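To see both problems in action, here is a small sketch. The sub names `loose_match` and `compiles` are made up for illustration. It checks what /.*abc*/ accepts, and uses eval to catch a search string that fails to compile instead of letting it kill the script:

```perl
use strict;
use warnings;

# With $search_value = 'abc', /.*$search_value*/ interpolates to /.*abc*/:
# the * quantifies only the final 'c', so "ab" alone is enough to match.
sub loose_match {
    my ($search_value, $line) = @_;
    return $line =~ /.*$search_value*/ ? 1 : 0;
}

# Returns 1 if the string compiles as a regexp, 0 if compilation dies
# (for example, an unbalanced '(' typed in by a user).
sub compiles {
    my ($pattern) = @_;
    return eval { my $re = qr/$pattern/; 1 } ? 1 : 0;
}

print "'xyzab' matches /.*abc*/\n" if loose_match('abc', 'xyzab');
print "'abc(' is not a valid regexp\n" unless compiles('abc(');
```

Wrapping the compile in eval is one way to report a bad search string politely; escaping the string with quotemeta avoids the problem altogether.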

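chomp(@file_array);
# Open the output file once, outside the loop ('>>' appends; use '>' to overwrite)
open(RECENT, ">>$installpath/recent") or die "Could not open $installpath/recent: $!";
foreach $filename (@file_array) {
   # If a file can't be opened, warn and move on rather than reading a closed handle
   unless (open(FH, "< $filename")) {
      warn "Could not open $filename: $!";
      next;
   }
   while (<FH>) {
      # /o compiles the pattern only once, since $search_value never changes
      if (/$search_value/o) {
         print RECENT "$search_value=$filename--> $_";
         $numhash{$search_value}++;
         $matched++;
      }
   }
   close(FH);
}
close(RECENT);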

We move the RECENT file handling outside of your loop, since it makes little sense to keep re-opening the file for every line we want to write. I imagine that's a major source of your speed problems. Since $search_value doesn't change, we have the regular expression compiled only once with the /o modifier. If you wanted to ignore subsequent matches in a given file, you could add a last statement inside the if block, which would skip to the next file instead of reading the rest of the current one, but it seems like you're interested in every line that matches.
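For the "first match per file" case, a minimal sketch of the last variant might look like this. The sub name `first_matches` and its return value (filename => first matching line) are hypothetical, not part of the original script:

```perl
use strict;
use warnings;

# Record only the first matching line in each file, then stop reading that
# file and move on to the next one.
# No /o here: this sub may be called with different patterns, and /o would
# freeze whichever pattern it saw first.
sub first_matches {
    my ($search_value, @files) = @_;
    my %first;
    foreach my $filename (@files) {
        open(my $fh, '<', $filename)
            or do { warn "Could not open $filename: $!"; next };
        while (my $line = <$fh>) {
            if ($line =~ /$search_value/) {
                $first{$filename} = $line;
                last;    # skip the rest of this file
            }
        }
        close($fh);
    }
    return %first;
}
```

You would call it as `my %first = first_matches($search_value, @file_array);`. The lexical filehandle and three-argument open also keep a filename with odd leading characters from being misread as a mode.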

If your 'recent' file is a temporary/transient thing, used only for processing later in your script, you might also want to consider just storing your matches in an internal data structure and using them later instead of re-reading your file:

push(@{$search_results{$search_value}->{$filename}}, $_);   # inside the while loop, $_ holds the matching line
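Here is a self-contained sketch of that idea. To keep it runnable without touching the disk, it assumes the lines are already in memory as a hash of filename => array of lines; `collect_matches` and `report` are hypothetical names. The second sub walks the nested structure later in the script, producing the same `value=filename--> line` strings the original wrote to 'recent':

```perl
use strict;
use warnings;

# Build $search_results{$search_value}{$filename} = [matching lines].
sub collect_matches {
    my ($search_value, $lines_by_file) = @_;    # filename => [lines]
    my %search_results;
    foreach my $filename (sort keys %$lines_by_file) {
        foreach (@{ $lines_by_file->{$filename} }) {
            push(@{$search_results{$search_value}->{$filename}}, $_)
                if /$search_value/;
        }
    }
    return \%search_results;
}

# Walk the structure instead of reading the 'recent' file back in.
sub report {
    my ($results) = @_;
    my @out;
    foreach my $value (sort keys %$results) {
        foreach my $filename (sort keys %{ $results->{$value} }) {
            push @out, "$value=$filename--> $_"
                for @{ $results->{$value}{$filename} };
        }
    }
    return @out;
}

print report(collect_matches('foo', { 'a.txt' => ["foo 1\n", "bar\n"] }));
```

Files with no matches never get a key, so `exists $results->{$value}{$filename}` doubles as a "did this file match?" test.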

Of course, if that's all you have to do, don't underestimate the simplicity of doing this without Perl. The 'grep' command can perform this task natively, unless you need to do some additional processing on the data rather than just building a 'recent' file of matching lines.

RE: Re: Faster way? by extremely (Priest) on Oct 07, 2000 at 02:31 UTC
Unless the people using your script can be trusted to learn regexp syntax, you may wish to write that as `if (/\Q$search_value\E/o) {`, with the \Q doing a quotemeta on the string so there aren't any "confusing" characters in there. Sooner or later some bright boy will try searching for ".*" for some reason and be lucky enough to have your script return all 350k of data to him... If you are cleaning up and fixing the search pattern elsewhere, ignore this. -- $you = new YOU; honk() if $you->love(perl)
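A quick way to see the difference: the sketch below (the sub names `raw_match` and `quoted_match` are illustrative) runs the same user-supplied string through a plain interpolated pattern and through \Q...\E, which is equivalent to calling quotemeta on it first:

```perl
use strict;
use warnings;

# Plain interpolation: metacharacters in $s act as regexp syntax.
sub raw_match    { my ($s, $line) = @_; return $line =~ /$s/     ? 1 : 0 }

# \Q...\E escapes every non-word character, so $s is matched literally.
sub quoted_match { my ($s, $line) = @_; return $line =~ /\Q$s\E/ ? 1 : 0 }

my $line = "plain text with no wildcards\n";
printf "raw:    %d\n", raw_match('.*', $line);      # .* matches any line
printf "quoted: %d\n", quoted_match('.*', $line);   # needs a literal ".*"
```

As a bonus, the quoted form also cannot die on an unbalanced '(' in the search string.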
RE: Re: Faster way? by the_slycer (Chaplain) on Oct 06, 2000 at 22:21 UTC
Actually, I want it to match the search value regardless of where it comes up in the line :-). The program does a lot more than this, and it's not for MY use; it's for a bunch of people who have never used grep. Plus we are using win2k here, and this app is faster than the "find" command. It's one of those cases where people have used an app for a while and don't want to see it go. We used to use scripts on DEC/VMS to do this :-)
RE: RE: Re: Faster way? by Fastolfe (Vicar) on Oct 06, 2000 at 22:27 UTC
`/$search_value/` is unanchored:

"abcdefg" =~ /cd/     # true
"abcdefg" =~ /^cd/    # anchored at start, false
"abcdefg" =~ /fg$/    # anchored at end, true

The regular expression I provided will match anywhere in the line, unless you've modified that behavior by inserting a `^` at the beginning of the search string or a `$` at the end.
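Note that the anchors don't have to be written into the code: because $search_value is interpolated as regexp syntax, a user who types `^cd` or `fg$` as the search string gets an anchored match. A small sketch (the sub name `found` is hypothetical):

```perl
use strict;
use warnings;

# /$search_value/ is unanchored, but a ^ or $ inside the search string
# itself becomes an anchor once it is interpolated into the pattern.
sub found {
    my ($search_value, $line) = @_;
    return $line =~ /$search_value/ ? 1 : 0;
}

foreach my $search_value ('cd', '^cd', 'fg$', '^abc') {
    printf "%-5s %s\n", $search_value,
        found($search_value, 'abcdefg') ? 'match' : 'no match';
}
```

If you would rather treat `^` and `$` as ordinary characters, the \Q...\E escaping suggested earlier in the thread turns that behavior off.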
RE: RE: RE: Re: Faster way? by the_slycer (Chaplain) on Oct 06, 2000 at 23:37 UTC
Thanks for that :-) The code from above doesn't seem to improve the speed much, though; it's obviously cleaner, but not much faster. Damned w2k... all of these files are out on a file server for shared access. I run this same code on my Linux box and it's way, way faster. Too bad :(
RE: RE: RE: RE: Re: Faster way? by Fastolfe (Vicar) on Oct 06, 2000 at 23:54 UTC