PerlMonks  

Re^2: pattern matching with large regex

by Anonymous Monk
on Aug 13, 2005 at 21:32 UTC (id://483610)


in reply to Re: pattern matching with large regex
in thread pattern matching with large regex

Most of the regex strings are constant; a few hundred may contain simple constructs like alternation and character classes, e.g. (f?oo|bar|baz|etc)[\w\-]*\.[0-9]{3,}. We only extract the data if it matches. As many have suggested, I benchmarked a typical case with the actual data, and unless something is wrong, the difference is extreme:
    use Benchmark qw(cmpthese);

    my %cases = (
        'one_large'  => sub {
            if ($text =~ /(stuff?)m0r3(?:[^:]*\.)?($big_string)/i) { my $match = "$1:$2" }
        },
        'many_small' => sub {
            for (@strings) {
                if ($text =~ /(stuff?)m0r3(?:[^:]*\.)?($_)/i) { my $match = "$1:$2" }
            }
        },
    );

    print '$text = ',       length $text,       " characters\n",
          '$big_string = ', length $big_string, " characters\n",
          '@strings = ',    scalar @strings,    " items\n\n";

    cmpthese(0, \%cases);
Results:

    $text = 4578 characters
    $big_string = 210724 characters
    @strings = 10634 items

                 Rate many_small one_large
    many_small 1.05/s         --     -100%
    one_large   630/s     60089%        --
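The post doesn't show how $big_string is built from @strings. A minimal sketch, assuming it is simply every entry joined with '|' into one alternation (the sample entries and text below are made up), compiling the combined pattern once with qr// and matching it in a single pass:

```perl
use strict;
use warnings;

# Hypothetical sample data; the real @strings has ~10,000 entries,
# mostly literal, a few with simple regex constructs.
my @strings = ('widget', 'gadget', '(f?oo|bar|baz)[\w\-]*\.[0-9]{3,}');
my $text    = 'stufm0r3dir.gadget';

# Assumption: $big_string is every entry joined into one alternation, so the
# text is scanned with a single compiled pattern instead of one per entry.
my $big_string = join '|', @strings;
my $big_re     = qr/(stuff?)m0r3(?:[^:]*\.)?($big_string)/i;

my $match;
if ($text =~ $big_re) {
    # $2 is the whole alternation group; inner groups from entries come later
    $match = "$1:$2";
}
print defined $match ? "$match\n" : "no match\n";
```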

Re^3: pattern matching with large regex
by Tanktalus (Canon) on Aug 13, 2005 at 23:23 UTC

    Not having any of the data you're working with, all I can do is offer suggestions that may or may not help: I can't actually test them first so that, if they don't work, I can keep my mouth shut. ;-)

    So, I'm just curious what happens when you a) use a regexp optimiser from CPAN to "optimise" $big_string (of course, proving that the optimisation didn't break anything would be a bit painful), and b) pre-compile your @strings - e.g.:

        use Benchmark qw(cmpthese);
        use Regexp::Optimizer;

        print '$text = ',       length $text,       " characters\n",
              '$big_string = ', length $big_string, " characters\n",
              '@strings = ',    scalar @strings,    " items\n\n";

        # (a) let Regexp::Optimizer rewrite the big alternation
        my $big_regexp = Regexp::Optimizer->new()->optimize($big_string);

        # (b) compile each small pattern once, up front
        my @small_regexps = map { qr/$_/i } @strings;

        my %cases = (
            'one_large'  => sub {
                if ($text =~ /(stuff?)m0r3(?:[^:]*\.)?($big_regexp)/i) { my $match = "$1:$2" }
            },
            'many_small' => sub {
                for (@small_regexps) {
                    if ($text =~ /(stuff?)m0r3(?:[^:]*\.)?($_)/i) { my $match = "$1:$2" }
                }
            },
        );

        cmpthese(0, \%cases);
      Pre-compiling @strings had no effect. Inherent laziness prevents me from optimising $big_string since it's plenty fast.
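A likely reason pre-compiling @strings made no difference: each qr// object is still interpolated into a larger /.../ pattern, and because that combined pattern string changes on every loop iteration, Perl has to compile a new regex each time anyway. A sketch under that assumption, with made-up data: build each complete pattern (shared prefix plus one entry) once, outside the loop, and match against the qr// object directly. It still scans $text once per entry, so it is not expected to catch up with 'one_large'; it only removes the repeated compilation.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real @strings and $text.
my @strings = ('widget', 'gadget', 'sprocket');
my $text    = 'stuffm0r3dir.sprocket';

# Build each complete pattern (shared prefix plus one entry) exactly once.
# Interpolating a short qr// into a larger /.../ inside the loop would still
# produce a different pattern string on every iteration, forcing a recompile.
my @full_res = map { qr/(stuff?)m0r3(?:[^:]*\.)?($_)/i } @strings;

my @matches;
for my $re (@full_res) {
    # Matching against a bare qr// object reuses its compiled program.
    push @matches, "$1:$2" if $text =~ $re;
}
print "$_\n" for @matches;
```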
Re^3: pattern matching with large regex
by lidden (Curate) on Aug 13, 2005 at 21:46 UTC
    In your 'one_large' example you get the first match, but in 'many_small' you get the last one. Try adding a last statement when you get a match in the for loop and see what happens.
      Nice catch, but last won't help here because a match is the exception: most of the time we check everything and fail to match. In production, though, last definitely belongs there.
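A sketch of lidden's suggestion, with hypothetical data and a hypothetical helper first_match: the loop stops at the first entry in @strings that matches. Here both 'gadget' and 'gadg' match the sample text; with last the result comes from 'gadget', whereas without it the later 'gadg' entry would overwrite it.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real @strings and $text.
my @strings = ('gadget', 'gadg', 'sprocket');
my $text    = 'stufm0r3x.gadget';

# Hypothetical helper: return "$1:$2" for the first entry that matches, or undef.
sub first_match {
    my ($haystack, $entries) = @_;
    my $match;
    for my $s (@$entries) {
        if ($haystack =~ /(stuff?)m0r3(?:[^:]*\.)?($s)/i) {
            $match = "$1:$2";
            last;    # stop scanning once an entry has matched
        }
    }
    return $match;
}

my $found = first_match($text, \@strings);
print defined $found ? "$found\n" : "no match\n";
```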
