Re: Slow Regex - How to Optimize
by dave_the_m (Monsignor) on Aug 30, 2005 at 21:50 UTC
|
Your code causes the regex to be recompiled each time round the inner loop.
A slight rearrangement should improve things:
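foreach my $sub ( keys %SUBS ) {
    # compile the pattern once per key, outside the inner loop
    my $re = qr/[^a-zA-Z]$sub[^a-zA-Z]*\(/;
    foreach my $line ( @sub_code ) {
        # push $sub: the $key in the original is never set in this loop
        if ( $line =~ $re ) { push( @subs, $sub ) }
    }
}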
Dave.
|
use re 'debug';
foreach (qw( abc abc def )) {
    print("==============================\n");
    print("$_\n");
    print("\n");
    /$_/;
}
Update: Oops, I didn't notice you reversed the loops in addition to moving the regexp.
|
great post.. thanks a lot :) it really solved an issue here. reduced the runtime of a usecase from 30 minutes to ~10 seconds. hats off! :)
Re: Slow Regex - How to Optimize
by GrandFather (Saint) on Aug 30, 2005 at 23:32 UTC
|
              Rate Original dave_the_m rnahi borisz ikegami  anon
Original    17.7/s       --       -97%  -99%   -99%   -100% -100%
dave_the_m   603/s    3313%         --  -76%   -82%    -87%  -87%
rnahi       2502/s   14054%       315%    --   -26%    -46%  -47%
borisz      3392/s   19088%       462%   36%     --    -27%  -28%
ikegami     4642/s   26158%       669%   86%    37%      --   -2%
anon        4736/s   26693%       685%   89%    40%      2%    --
Perl is Huffman encoded by design.
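A table of this shape is what the core Benchmark module's cmpthese prints. The benchmark script itself is not shown in the thread, so the following is only a rough sketch of how two of the variants might be compared. The data (%SUBS with 20 made-up names, 200 made-up lines) and the sub names inner_interp and precompiled are assumptions for illustration, and the rates it prints will not match the table.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-ins for the thread's %SUBS and @sub_code
my %SUBS;
$SUBS{"func$_"} = 1 for 1 .. 20;
my @sub_code = map { "    x = func$_(y);" } 1 .. 200;

sub inner_interp {    # pattern interpolated inside the inner loop
    my @subs;
    for my $line (@sub_code) {
        for my $sub ( keys %SUBS ) {
            push @subs, $sub if $line =~ /[^a-zA-Z]$sub[^a-zA-Z]*\(/;
        }
    }
    return scalar @subs;
}

sub precompiled {     # one qr// per key, reused for every line
    my @subs;
    for my $sub ( keys %SUBS ) {
        my $re = qr/[^a-zA-Z]$sub[^a-zA-Z]*\(/;
        for my $line (@sub_code) {
            push @subs, $sub if $line =~ $re;
        }
    }
    return scalar @subs;
}

# Run each variant for at least one CPU second and print a comparison table
cmpthese( -1, { inner_interp => \&inner_interp, precompiled => \&precompiled } );
```

Both subs return the number of matches they found, which makes it easy to check that the variants agree before trusting the timings.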
Re: Slow Regex - How to Optimize
by rnahi (Curate) on Aug 30, 2005 at 21:56 UTC
|
I would use a different approach, i.e. make a single string from
your @sub_code and apply the search for each key just once.
#untested
my $code = join("", @sub_code);
foreach my $sub ( keys %SUBS ) {
    while ( $code =~ /\b$sub\b\(/g ) {
        push( @subs, $sub );
    }
}
Notice that your code has a subtle bug. If the same routine is
used twice in one line, you'll get it only once. E.g.: sqrt(x) + sqrt(y)
BTW, what is that $key in your code?
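The once-per-line behaviour can be seen in a small sketch. The input line and the variable names here are made up for illustration; \Q...\E is added only so the name is matched literally.

```perl
use strict;
use warnings;

# Hypothetical one-line input, built around the sqrt example
my $line = 'r = sqrt(x) + sqrt(y);';
my $sub  = 'sqrt';

# A plain match in boolean context succeeds at most once per line
my $once = 0;
$once++ if $line =~ /\b\Q$sub\E\s*\(/;

# With /g in a while loop, each iteration resumes after the previous match
my $every = 0;
$every++ while $line =~ /\b\Q$sub\E\s*\(/g;

print "single match: $once, with /g: $every\n";    # single match: 1, with /g: 2
```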
|
Notice that your code has a subtle bug.
Even if he is running the code through CPP first (to strip out comments, expand macros, rejoin lines), he still has plenty of corner cases to worry about (strings, function pointers, etc)...
#include <stdio.h>
#define p printf
int mai\
n/*this is a comment: main()*/(int argc, char **argv)
{
    int (*f)(int, char **) = &main;
    p("hello world: main()\n");
    if(argc>0)
        f((argc-1),argv);
}
|
Oh, and don't forget that [^a-zA-Z] matches "(" and that C identifiers can have digits and underscores...
_foo2bar((4),2);
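To make that concrete, here is a small sketch comparing a letters-only capture with one that follows C identifier rules, run against the line above. The second pattern is just one possible way to write it, not something proposed elsewhere in the thread.

```perl
use strict;
use warnings;

my $line = '_foo2bar((4),2);';

# Letters-only capture in the style of the patterns discussed in this thread
my ($letters_only) = $line =~ /([a-zA-Z]+)[^a-zA-Z]*\(/;

# C-style identifier: a letter or underscore, then letters, digits, underscores
my ($identifier) = $line =~ /\b([A-Za-z_][A-Za-z0-9_]*)\s*\(/;

print "letters only: $letters_only\n";    # letters only: bar
print "identifier:   $identifier\n";      # identifier:   _foo2bar
```

The letters-only version reports "bar" because the digit cuts the name in two, and [^a-zA-Z]* then swallows the first "(" before the literal one is matched.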
Re: Slow Regex - How to Optimize
by ikegami (Patriarch) on Aug 30, 2005 at 22:10 UTC
|
Maybe using a generic regexp, then checking against %SUBS would be better? That eliminates a nested loop and multiple regexp compilations.
foreach my $line ( @sub_code ) {
    if ( $line =~ /(\w+)\(/ ) {
        if ( exists $SUBS{$1} ) {
            push( @subs, $1 );
        }
    }
}
Note: I assumed %SUBS keys are subroutine names, not regexps.
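The snippet above looks at the first call on each line only. A possible variation, sketched here with made-up data in place of the real %SUBS and @sub_code, uses /g in list context to pick up every candidate name on a line before doing the hash lookup.

```perl
use strict;
use warnings;

# Hypothetical data standing in for the thread's %SUBS and @sub_code
my %SUBS     = ( init => 1, cleanup => 1 );
my @sub_code = ( 'init(); log_msg("x"); cleanup();', 'if (ok) init();' );

my @subs;
foreach my $line (@sub_code) {
    # In list context, /g returns every capture on the line, not just the first
    foreach my $name ( $line =~ /(\w+)\s*\(/g ) {
        push @subs, $name if exists $SUBS{$name};
    }
}

print "@subs\n";    # init cleanup init
```

Keywords such as "if" also match the generic pattern, but they are dropped by the hash lookup as long as they are not keys of %SUBS.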
Re: Slow Regex - How to Optimize
by noslenj123 (Scribe) on Aug 30, 2005 at 22:50 UTC
|
As usual, I didn't explain what I'm trying to accomplish along with the problem.
I gather a list of created subroutines by parsing the .h files. That creates %SUBS. For each $sub in %SUBS I need to gather all the subroutines that the $sub calls, but only if it exists in %SUBS.
From all your input I have learned how to work with the code as a string and apply the regex in a while loop, which is so far much faster. I'm trying something like:
while ( $data =~ /\b([a-zA-Z]+)\b\s*\(/g ) {
    print "$1\n";
}
Thanks for the direction! I'll post results and look for more advice.
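One way the goal described above might be put together is sketched below. It assumes the body of each subroutine is already available as a string in a hash called %body; that hash, its contents, and the %calls result structure are all made up for illustration, since the thread does not show how the bodies are extracted.

```perl
use strict;
use warnings;

# Hypothetical input: sub name => body text
my %body = (
    main  => 'setup(); run(1); printf("done");',
    setup => 'run(0);',
    run   => 'if (x) { run(x - 1); }',
);
my %SUBS = map { $_ => 1 } keys %body;

my %calls;    # caller => [ callees that exist in %SUBS ]
for my $sub ( sort keys %body ) {
    my %seen;
    while ( $body{$sub} =~ /\b([A-Za-z_]\w*)\s*\(/g ) {
        my $callee = $1;
        next unless exists $SUBS{$callee};    # keep known subs only
        next if $seen{$callee}++;             # report each callee once
        push @{ $calls{$sub} }, $callee;
    }
}

print "$_ calls: @{ $calls{$_} }\n" for sort keys %calls;
```

Here printf and if are matched by the generic pattern but discarded, because neither is a key of %SUBS.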
Re: Slow Regex - How to Optimize
by borisz (Canon) on Aug 30, 2005 at 22:09 UTC
|
my $str = join '|', sort { length $b <=> length $a } keys %SUBS;
my $re = qr/[^a-zA-Z]($str)[^a-zA-Z]*\(/;
/$re/ and push @subs, $1 for ( @sub_code );
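The idea here is to fold all the keys into one alternation so each line is scanned once. A variation on it is sketched below, with made-up keys, quotemeta added in case a key contains regex metacharacters, and \b and \s* swapped in for the character classes. These changes are assumptions for illustration, not part of the code above.

```perl
use strict;
use warnings;

my %SUBS = ( get => 1, get_all => 1 );    # hypothetical keys

# Longest names first, each escaped so it is matched literally
my $alt = join '|',
          map  { quotemeta($_) }
          sort { length($b) <=> length($a) } keys %SUBS;
my $re  = qr/\b($alt)\s*\(/;

# /g in list context returns every captured name on the line
my @found = 'x = get_all() + get();' =~ /$re/g;
print "@found\n";    # get_all get
```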
Re: Slow Regex - How to Optimize
by InfiniteSilence (Curate) on Aug 30, 2005 at 22:22 UTC
|
If I am not mistaken, this can be rewritten as:
#!/usr/bin/perl -w
use strict;
my ($wholefile);
my (@keys, %SUBS);
$SUBS{'b'} = 1;
$SUBS{'c'} = 1;
local $/;
open(H,qq|$ARGV[0]|) or die "USAGE: wierd2.pl <filename>"; #sloppy example
$wholefile .= <H>;
close(H);
foreach (keys %SUBS) {
    while($wholefile=~m/\b$_\b/g){
        push @keys, $_;
    }
}
1;
Of course, I have no idea:
- how you are getting the %SUBS hash populated in the first place (I thought that was the purpose of the script?)
- why you would want to do this
Celebrate Intellectual Diversity
Re: Slow Regex - How to Optimize
by noslenj123 (Scribe) on Aug 30, 2005 at 23:05 UTC
|
Okay guys! You rock! I recoded it and now instead of that process taking 64+ seconds, it takes 0.064 seconds. And I learned some more perl to boot! :-) Tx all!
Re: Slow Regex - How to Optimize
by Anonymous Monk on Aug 30, 2005 at 22:24 UTC
|
I'd try a non-greedy quantifier...
if ( $line =~ /[^a-zA-Z]$sub[^a-zA-Z]*?\(/ ) {
...and maybe a more generic subroutine finder...
foreach my $line ( @sub_code )
{
    if ($line =~ /[^a-zA-Z]([a-zA-Z_]+[a-zA-Z_0-9]*)[^a-zA-Z]*\(/
        and exists $SUBS{$1})
    { push @subs, $1 }    # push the captured name; $key is not set in this loop
}