in reply to some forking help

OK, I've tried a few different versions of this program:
#!/usr/bin/perl
use strict;

print get_time() ."\n";
my $count = 0;
my $pr_regex= "program.jsp?id=1";
$pr_regex = qr/\Q$pr_regex\E/oi;
#open(LOGFILE,"file.txt");
@ARGV = qw(file.txt);
close ARGV;
#while (<LOGFILE>) {
while (<>) {
    $count ++ if m/$pr_regex/oi;
}
print qq|$count\n|;
print "\n" .get_time() ."\n";
exit;

sub get_time {
    my ($sec,$min,$hour,@junk) = localtime(time);
    $min = '0' . $min if ($min<10);
    $sec = '0' . $sec if ($sec<10);
    return qq|$hour:$min:$sec|;
}

and the output is:

bash-2.03$ perl -w agrsel_mark3.cgi
14:27:05
203

14:27:26

So that's around 20 seconds to find one string, and that's after a little tweaking to get it down from 26 seconds.
Here's a version of my original (just looking for one string, though):

#!/usr/bin/perl
use strict;

print get_time() ."\n";
my $count = 0;
my $pr_regex= "program.jsp?id=1";
$count = `grep -c '$pr_regex' file.txt`;
print qq|$count\n|;
print "\n" .get_time() ."\n";
exit;

sub get_time {
    my ($sec,$min,$hour,@junk) = localtime(time);
    $min = '0' . $min if ($min<10);
    $sec = '0' . $sec if ($sec<10);
    return qq|$hour:$min:$sec|;
}


and the output:
bash-2.03$ perl -w agrsel_mark4.cgi
14:27:34
203


14:27:40

About 6 seconds.
Actually running the full program, the grep version takes about 8 seconds per string over the first 68 strings, not quite 9 minutes in all.
The regex version takes about 26 minutes to run the same 68 strings.
A little quick math tells me I'm looking at 2 hours versus 6 hours when I start really using the program.
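The quick math can be spelled out as a small sketch. The only inputs are the figures quoted above (with "not quite 9 minutes" rounded up to 9); the sub name and variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inputs are the figures quoted in the post; names are illustrative.
my $strings         = 68;  # strings searched in both timed runs
my $grep_total_min  = 9;   # "not quite 9 minutes", rounded up
my $regex_total_min = 26;  # regex version, same 68 strings
my $grep_full_hours = 2;   # estimated full run with the grep version

# Derive per-string times, the slowdown ratio, and the projected
# full-run time for the regex version.
sub projection {
    my ($n, $grep_min, $regex_min, $grep_hours) = @_;
    my $ratio = $regex_min / $grep_min;
    return (
        grep_sec_per_string  => $grep_min  * 60 / $n,
        regex_sec_per_string => $regex_min * 60 / $n,
        ratio                => $ratio,
        regex_full_hours     => $grep_hours * $ratio,
    );
}

my %p = projection($strings, $grep_total_min, $regex_total_min, $grep_full_hours);
printf "grep:  %.1f s per string\n",   $p{grep_sec_per_string};
printf "regex: %.1f s per string\n",   $p{regex_sec_per_string};
printf "regex is %.1fx slower\n",      $p{ratio};
printf "regex full run: %.1f hours\n", $p{regex_full_hours};
```

With those inputs the ratio comes out just under 3, which is where the 2 hours versus roughly 6 hours estimate comes from.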
I've tried the regex version a few different ways, but the time doesn't get any better than 20 seconds.
Any ideas on how to speed this up a little?
Thanks again
John

Re: Re: some forking help
by mstone (Deacon) on Dec 25, 2001 at 21:38 UTC

    Try a test version that looks for more than one string. You'll have to run grep 50 times to find 50 strings, while a regexp loop will search each line for all 50.

    The regexp loop should scale better for large numbers of regexps, too. Iterating a loop and searching for a pattern match are relatively fast, compared to reading information from the disk or spawning a subshell.
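A minimal sketch of the kind of test version suggested above, assuming the goal is a per-string count of matching lines (what `grep -c` reports). The sub name `count_strings` and the second search string are made up for illustration; `file.txt` and the first string come from the original post. The match is case-insensitive, as in the posted regex version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many lines contain each literal string, reading the
# input only once instead of once per string.
sub count_strings {
    my ($fh, @strings) = @_;

    # Compile each literal once, outside the read loop.
    # \Q...\E quotes metacharacters such as '.' and '?'.
    my %regex_for = map { $_ => qr/\Q$_\E/i } @strings;
    my %count     = map { $_ => 0 } @strings;

    while (my $line = <$fh>) {
        for my $string (@strings) {
            $count{$string}++ if $line =~ $regex_for{$string};
        }
    }
    return \%count;
}

# Hypothetical usage; silently skipped if file.txt is not present.
if (open my $fh, '<', 'file.txt') {
    my $count = count_strings($fh, 'program.jsp?id=1', 'program.jsp?id=2');
    print "$_: $count->{$_}\n" for sort keys %$count;
    close $fh;
}
```

The file is read once no matter how many strings are passed in, so the disk cost is paid a single time and there is no subshell per string. Whether that beats repeated grep calls on a given log is something only timing both versions will show.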