ManFromNeptune has asked for the wisdom of the Perl Monks concerning the following question:

For performance, I'd like to pre-compile some regular expressions using the qr// syntax, and store them in a database for later use. Sort of like this:
my $regex = "the (animals|dogs|cats).+?are ready to eat.";
my $compiled_regex = qr/$regex/;
# store $compiled_regex in a database

# later on, retrieve the $compiled_regex_from_db
my $compiled_regex_from_db = [DBI query and unserialize or thaw?];
my $teststring = "the dogs who are friendly are ready to eat.";
$teststring =~ s/$compiled_regex_from_db/creatures/;
Is this possible? Any better ways to store/retrieve compiled regex?

Also is it possible to pre-compile the replacement part of a s/// regular expression? My regex is always search and replace, and the replacement text often has backreferences.

thanks! Nept

Re: storing a qr// compiled regexp in a database?
by Corion (Patriarch) on Sep 17, 2005 at 21:05 UTC

    Your "precompilation" won't gain you anything, but it's easily possible:

    my $compiled_regex = qr/$regex/;
    # a qr// object stringifies to its pattern text; that text is what gets stored
    my $sth = $dbh->prepare("insert into regexes (name, value) values (?, ?)");
    $sth->execute("ready_to_eat", "$compiled_regex");

    # later
    my $values = $dbh->selectcol_arrayref(
        "select value from regexes where name = ?", undef, "ready_to_eat");

    # convert back to compiled regexes
    my @regexes = map { qr/$_/ } @$values;
      Aren't you just recompiling the regexp here?
      $_ = qr/$_/;
      My goal was to shave off some performance overhead since I have lots of complex regex patterns that need to run fast.

      I had thought of just keeping the precompiled patterns in memory (as in $precompiled{$regexkey}), but since the patterns come from a database in the first place, they could go stale if the database is updated after the Perl app has started (this is mod_perl, by the way).
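One way to sidestep the staleness concern is to key the in-memory cache on the pattern text itself rather than on a database key. The following is only a sketch under that assumption: the names `%compiled_for` and `compiled_regex` are hypothetical, and `$pattern_from_db` stands in for whatever string the database query returned on this request.

```perl
use strict;
use warnings;

# Hypothetical cache: pattern text => compiled qr// object.
my %compiled_for;

# Return a compiled regex for the given pattern text, compiling at most
# once per distinct string seen by this process.
sub compiled_regex {
    my ($pattern) = @_;
    return $compiled_for{$pattern} ||= qr/$pattern/;
}

# Stand-in for the pattern text fetched from the database.
my $pattern_from_db = 'the (animals|dogs|cats).+?are ready to eat\.';
my $teststring      = 'the dogs who are friendly are ready to eat.';

my $re = compiled_regex($pattern_from_db);   # compiles on first use
$re    = compiled_regex($pattern_from_db);   # second call reuses the cache

(my $result = $teststring) =~ s/$re/creatures/;
print "$result\n";
```

If a row is updated, the new pattern text simply becomes a new cache key, so an old compilation is never used for it. Under mod_perl each child process would hold its own copy of the cache, and entries for superseded patterns are never evicted in this sketch.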

        A "precompiled regex" is mostly a string, at least from the Perl view of things. And the best you could do to save a regex in a database is to stringify it and then recompile it from that string anyway.
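A small sketch of that round trip, with the database left out: the stored value is ordinary text, and "thawing" is just another compilation. The variable names here are illustrative only.

```perl
use strict;
use warnings;

my $original = qr/the (animals|dogs|cats).+?are ready to eat\./i;

# Interpolating a qr// object into a string yields its pattern text,
# flags included. This is what a database column would hold.
my $stored = "$original";

# "Thawing" is a fresh compilation of that text.
my $restored = qr/$stored/;

my $teststring = 'The DOGS who are friendly are ready to eat.';
my $both_match = ($teststring =~ $original) && ($teststring =~ $restored);

print "original is a ", ref($original), " object\n";
print "stored value is ", (ref($stored) ? "a reference" : "a plain string"), "\n";
print "both match: ", ($both_match ? "yes" : "no"), "\n";
```

The /i flag survives because it is embedded in the stringified pattern, but the compiled form itself never leaves the process.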

        I think you're looking in completely the wrong direction if you're trying to optimize such minor things. Have you profiled your application yet? Does it really spend that much time compiling (and recompiling) regular expressions? If so, maybe you should reduce the number of regular expressions. A good profiling tool is Devel::DProf, though it is not the only profiler on CPAN.
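Short of a full profiling run, a rough timing can at least show the order of magnitude of the compilation cost. This is a sketch, not a rigorous benchmark: the iteration count is arbitrary, and a counter is appended to the pattern so that each pass compiles a distinct string, in case perl would otherwise reuse the previous compilation of an unchanged pattern.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

my $pattern    = 'the (animals|dogs|cats).+?are ready to eat\.';
my $iterations = 10_000;   # arbitrary; raise it for steadier numbers

my $start = [gettimeofday];
for my $i (1 .. $iterations) {
    # The literal suffix $i makes every iteration a different pattern.
    my $re = qr/$pattern$i/;
}
my $elapsed = tv_interval($start);   # seconds since $start

printf "%d compilations in %.4f s (%.2f microseconds each)\n",
    $iterations, $elapsed, 1e6 * $elapsed / $iterations;
```

Comparing the per-compilation figure with the time one database round trip takes on the same machine gives a first idea of whether compilation is worth optimizing at all.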

Re: storing a qr// compiled regexp in a database?
by Anonymous Monk on Sep 17, 2005 at 21:27 UTC
    You could try using DB_File to store and retrieve a hash of regexes. It might improve performance.
      The regexps will be stringified (and thus not "compiled") once they are stored into the DB_File database. You cannot store the object as a compiled regexp inside a DB_File or any other on-disk database.
Re: storing a qr// compiled regexp in a database?
by halley (Prior) on Sep 19, 2005 at 16:09 UTC
    What others have said is roughly correct: you can't get the fruits of the internal regex compilation step. Perl will give you the stringified version easily, but that's not what you were hoping for.

    There's one other issue to beware of if you're trying to persist the stringified version of qr// objects: regex creep.

    $regex = qr/(this)is[a]test/i;
    $regex = "$regex"; $regex = qr/$regex/; # simulate freeze/thaw cycle
    $regex = "$regex"; $regex = qr/$regex/; # simulate freeze/thaw cycle
    $regex = "$regex"; $regex = qr/$regex/; # simulate freeze/thaw cycle
    print "regex: qr{$regex}\n";
    __OUTPUT__
    regex: qr{(?-xism:(?-xism:(?-xism:(?i-xsm:(this)is[a]test))))}
    The stringifier and the qr// operator don't bother to collapse all of the redundant or unnecessary buildup of those (?-xism:...) wrappers. You could get a massive hundred-layer wrapper if you carelessly freeze and thaw these objects.
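One way to avoid the buildup is to persist the pattern source and its flags, and never the stringified qr// object. The sketch below assumes a hypothetical record layout (`source` and `flags`) and a hypothetical helper `thaw_regex`; it only handles flags that can be written inline, such as i, m, s and x.

```perl
use strict;
use warnings;

# Hypothetical record: this, not "$regex", is what gets written to storage.
my %record = (source => '(this)is[a]test', flags => 'i');

# Build a regex from the record; (?FLAGS:...) applies the flags inline.
sub thaw_regex {
    my ($rec) = @_;
    my $text = sprintf '(?%s:%s)', $rec->{flags}, $rec->{source};
    return qr/$text/;
}

my @lengths;
for my $cycle (1 .. 3) {
    my $regex = thaw_regex(\%record);   # thaw
    push @lengths, length "$regex";     # size of the stringified form
    # "freeze" would write %record back unchanged, so nothing accumulates
}

print "stringified lengths per cycle: @lengths\n";
print "matches: ", ('ThisIsATest' =~ thaw_regex(\%record) ? "yes" : "no"), "\n";
```

Because every cycle starts from the same source text, the stringified length stays the same from one cycle to the next instead of gaining a wrapper each time.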

    --
    [ e d @ h a l l e y . c c ]

Re: storing a qr// compiled regexp in a database?
by samtregar (Abbot) on Sep 19, 2005 at 19:27 UTC
    Have you compared the time needed to execute a SELECT on your database to the time needed for regex compilation? I think you'd be unlikely to see any benefit even if it were possible!

    -sam