jeteve has asked for the wisdom of the Perl Monks concerning the following question:

Hi Fellow monks,

Which do you think is the fastest regex engine: our beloved Perl regex engine or PCRE?

Do you have any pointers to benchmarks?

Cheers!

Jerome

Replies are listed 'Best First'.
Re: pcre vs perl regex engine
by zentara (Cardinal) on Jan 16, 2009 at 19:24 UTC
    See Analyzing regular expression performance. I can't remember where I read it, but Perl regexes are often faster than C, for various reasons: the Perl engine is better developed, it was designed for parsing text, etc. I'm not saying always, but often Perl is supposedly faster.

    Even the pcre man page points to times when it is slow, due to idiosyncrasies of C.


    I'm not really a human, but I play one on earth Remember How Lucky You Are
      I can't remember where I read it, but Perl regexes are often faster than C
      What do you mean by that? Perl's regexp engine is written in C. Is it faster than itself?

      Perl's regexp engine is a (modified) NFA. It's not hard to come up with matches that are screaming fast on a DFA and take a long time on an NFA. But an NFA allows one to do things you cannot do with a DFA (writing a regexp matching balanced parentheses, for instance).
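
      As a minimal sketch of the balanced-parentheses case, assuming Perl 5.10 or later (where recursive subpatterns such as (?1) are available); the names $balanced and is_balanced are made up for this illustration:

      ```perl
      use strict;
      use warnings;

      # Group 1 matches one parenthesised unit; (?1) recurses into group 1.
      # Requires Perl 5.10+ (recursive subpatterns and possessive quantifiers).
      my $balanced = qr{
          \A
          (                 # group 1: one balanced unit
              \(
              (?: [^()]++   # a run of non-parenthesis characters (possessive)
                | (?1)      # or a nested balanced unit
              )*
              \)
          )
          \z
      }x;

      # Hypothetical helper: returns 1 if the whole string is one balanced unit.
      sub is_balanced { return $_[0] =~ $balanced ? 1 : 0 }

      for my $s ('(a(b)c)', '(a(b)c', '()') {
          printf "%-8s %s\n", $s, is_balanced($s) ? 'balanced' : 'not balanced';
      }
      ```

      The recursion is what takes this beyond a pure DFA: the pattern has to track arbitrary nesting depth.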

      like the Perl engine is better developed, it was designed for parsing text,
      And what do you mean by that? Do you really think Larry was the only person who said "I got a brilliant idea for a regexp engine. Instead of designing it to do laundry, I will design it to parse text"?
        Well, JavaFan, you are too smart for me to argue with, but googling for "pcre regex speed" comes up with complaints that libpcre's UTF-8 regexing is very slow, that some globals being used are not thread-safe and may cause memory growth and possible crashes in threads, etc.

        Also, I would point out that even though the Perl regex engine is written in C, it is not the same C code that the pcre lib is made from. So I'm not saying that C is faster than C; I'm saying the C code in Perl's regex engine may run faster for many regexes than the C code in the pcre lib, and that may be due to the difficulty of setting up the rest of the C program to run the regexes in the most efficient manner. But isn't that what it is all about? You can do the match faster in Perl, because the regex is often a one-liner; doesn't programmer time count in this? jeteve didn't specifically say only machine-time speed, although that was probably his intention.

        I googled and haven't found a real benchmark comparing the two engines with a good set of regex stress tests. Maybe you could use your knowledge to set up a benchmark and post it.


        I'm not really a human, but I play one on earth Remember How Lucky You Are
Re: pcre vs perl regex engine
by mr_mischief (Monsignor) on Jan 16, 2009 at 19:34 UTC
    PCRE and especially the regex engines for strictly POSIX-compliant regexes don't do nearly as much as the Perl regex engine. Do you mean you want to know which is fastest within the narrow confines of what they all can do?
Re: pcre vs perl regex engine
by Joost (Canon) on Jan 16, 2009 at 20:28 UTC
Re: pcre vs perl regex engine
by jettero (Monsignor) on Jan 16, 2009 at 19:14 UTC
    I'd like to see some numbers on that and on the POSIX regex engine. But any benchmarks would need a huge variety of patterns as, no doubt, some of the engines are designed with different purposes in mind than others.

    -Paul

      Yep,

      My purpose here is to keep things simple. No conditions, backtracking, etc.

      J.
        Then it's not much of a test. How an RE engine handles backtracking should at the very least be relevant.

        -Paul

Re: pcre vs perl regex engine
by thunders (Priest) on Jan 16, 2009 at 20:12 UTC

    How exactly would you go about benchmarking this? It's my understanding that, due to a bunch of Perl-specific features, you can't easily embed Perl's regex engine alone in a C program (you could of course embed an entire Perl interpreter). And obviously it wouldn't make much sense to embed PCRE in a Perl program. So I can't think of a way to do an apples-to-apples comparison.

    You could time a series of Perl and C programs that run a variety of regexes over various types of input, for a least-common-denominator feature set. But for most programs there are a number of other factors that will impact performance more than the choice of regex engine (interpreted vs. compiled, I/O libraries, memory management, GC, etc.).
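
    The Perl half of such a timing run might look roughly like the sketch below, using the core Benchmark module. The sample text, the three patterns and the count_matches helper are invented for illustration, and it says nothing about PCRE, which would need an equivalent C program fed the same input:

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(timethese);

    # Made-up sample input: 1000 copies of a short phrase.
    my $text = join ' ', ('lorem ipsum dolor 12345 sit amet') x 1000;

    # A deliberately simple, lowest-common-denominator set of patterns.
    my %patterns = (
        literal     => qr/dolor/,
        char_class  => qr/\d+/,
        alternation => qr/ipsum|amet/,
    );

    # Hypothetical helper: count all non-overlapping matches of $re in $str.
    sub count_matches {
        my ($re, $str) = @_;
        my $n = () = $str =~ /$re/g;
        return $n;
    }

    # One timed closure per pattern.
    my %tests;
    for my $name (keys %patterns) {
        my $re = $patterns{$name};
        $tests{$name} = sub { count_matches($re, $text) };
    }

    # A negative count means "run each test for at least that many CPU seconds".
    timethese(-1, \%tests);
    ```

    Process start-up and I/O stay out of the timed closures here, but a C counterpart would have to be measured the same way for the numbers to mean anything.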

    But beyond writing programs that only do regex matching, I think the choice between PCRE and Perl comes down to whether you want to write the rest of the program in C/C++ or in Perl.