I can't remember where I read it, but Perl regexes are often faster than C
What do you mean by that? Perl's regexp engine is written in C. Is it faster than itself?
Perl's regexp engine is a (modified) NFA. It's not hard to come up with matches that are screaming fast on a DFA and take a long time on an NFA. But an NFA allows one to do things you cannot do with a DFA (writing a regexp that matches balanced parentheses, for instance).
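To make the DFA-versus-NFA remark concrete, here is a small Python sketch. Python is used only because it runs as-is; this is not Perl's or PCRE's code. The mini pattern format and all function names are hypothetical. It uses the well-known pathological pattern a?^n a^n matched against a string of n a's. A backtracking matcher (the family Perl's engine belongs to) ends up trying all 2^n ways of choosing which optional a's to consume. A simulation that tracks the whole set of reachable pattern positions at once (what a DFA effectively precomputes) does work proportional to pattern length times text length.

```python
from typing import List, Set, Tuple

# Hypothetical mini-pattern format: a list of (char, optional) pairs.
# ("a", True) stands for "a?", ("a", False) for a literal "a".
Item = Tuple[str, bool]


def pathological(n: int) -> List[Item]:
    """Build a?^n a^n, e.g. n=2 gives the pattern a?a?aa."""
    return [("a", True)] * n + [("a", False)] * n


def backtrack_match(pattern: List[Item], text: str) -> Tuple[bool, int]:
    """Backtracking matcher: greedily try to consume an optional char first,
    fall back to skipping it. Returns (whole-string match?, recursive calls)."""
    steps = 0

    def go(i: int, j: int) -> bool:
        nonlocal steps
        steps += 1
        if i == len(pattern):
            return j == len(text)
        ch, optional = pattern[i]
        if j < len(text) and text[j] == ch and go(i + 1, j + 1):
            return True
        return optional and go(i + 1, j)

    return go(0, 0), steps


def simulate_match(pattern: List[Item], text: str) -> Tuple[bool, int]:
    """State-set simulation: track every pattern position reachable so far,
    one input char at a time. Returns (whole-string match?, states visited)."""
    steps = 0

    def closure(states: Set[int]) -> Set[int]:
        # Follow the "skip an optional item" edges without consuming input.
        nonlocal steps
        seen: Set[int] = set()
        stack = list(states)
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            steps += 1
            if i < len(pattern) and pattern[i][1]:
                stack.append(i + 1)
        return seen

    current = closure({0})
    for c in text:
        moved = {i + 1 for i in current if i < len(pattern) and pattern[i][0] == c}
        current = closure(moved)
        if not current:
            break
    return len(pattern) in current, steps
```

Calling both on `pathological(n)` and `"a" * n` for growing n shows the backtracking call count at least doubling with each extra a, while the simulation's count stays bounded by (2n+1)(n+1).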
like the Perl engine is better developed, it was designed for parsing text,
And what do you mean by that? Do you really think Larry was the only person who said "I got a brilliant idea for a regexp engine. Instead of designing it to do laundry, I will design it to parse text"?
Well JavaFan, you are too smart for me to argue with, but googling for "pcre regex speed" comes up with complaints that libpcre's utf-8 regexing is very slow, that some globals being used are not thread-safe and may cause memory growth and possible crashes in threads, etc. Also I would point out that even though the perl regex engine is written in C, it is not the same C code that the pcre lib is made from. So I'm not saying that C is faster than C; I'm saying the C code in Perl's regex engine may run faster for many regexes than the C code in the pcre lib; and that may be due to the difficulty of setting up the rest of the C program to run the regexes in the most efficient manner. But isn't that what it's all about? You can do the match faster in Perl, because the regex is often a 1-liner; doesn't programmer time count in this? jeteve didn't specifically say only machine-time speed, although that was probably his intention.
I googled and haven't found a real benchmark comparing the 2 engines with a good set of regex stress tests; maybe you could use your knowledge to set up a benchmark and post it.
I googled and haven't found a real benchmark comparing the 2 engines with a good set of regex stress tests
I think that, for these kinds of benchmarks, you'd better think of the perl 5.8.x regex engine and the perl 5.10 regex engine as different engines. So:
comparing the 3 engines
libpcre's utf-8 regexing is very slow, that some globals being used are not thread-safe and may cause memory growth and possible crashes in threads, etc.
Well, Perl's regexp engine has efficiency problems with UTF-8 as well (think character classes). Perl (and its regexp engine) have had their share of memory leaks (and I don't think many people are willing to bet they're all gone). And while I'm not aware of any thread-related problems with respect to the regexp engine, threads in Perl are such a performance loss that, if you build a perl, threads are disabled by default. And while there may not be many thread issues in the regexp engine, since perl doesn't share anything between threads by default, perl's regexp engine isn't reentrant. And it took only 20 years to make it non-recursive (but artifacts of its recursive past still pop up).
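Why "non-recursive" matters is mostly about stack depth. A recursive backtracker keeps one stack frame per pending choice point, so even a simple pattern like `a*b` on a long run of a's can exhaust the stack. An explicit stack of choice points lives on the heap instead. The Python sketch below illustrates the general idea only; the mini pattern format and function names are hypothetical, and it makes no claim about how perl's engine is actually structured. Python's recursion limit stands in for the C stack limit.

```python
from typing import List, Tuple

# Hypothetical mini-pattern: (char, quantifier) with quantifier "", "?" or "*".
Item = Tuple[str, str]


def match_recursive(pattern: List[Item], text: str) -> bool:
    """Recursive backtracking: every pending choice costs a stack frame."""
    def go(i: int, j: int) -> bool:
        if i == len(pattern):
            return j == len(text)
        ch, q = pattern[i]
        can_take = j < len(text) and text[j] == ch
        if q == "*":
            # Greedy: consume one char and stay on this item, else move on.
            return (can_take and go(i, j + 1)) or go(i + 1, j)
        if q == "?":
            return (can_take and go(i + 1, j + 1)) or go(i + 1, j)
        return can_take and go(i + 1, j + 1)

    return go(0, 0)


def match_iterative(pattern: List[Item], text: str) -> bool:
    """Same search order, but choice points live in a heap-allocated list."""
    stack = [(0, 0)]  # pending (pattern index, text index) pairs
    while stack:
        i, j = stack.pop()
        if i == len(pattern):
            if j == len(text):
                return True
            continue  # dead end: backtrack to the next pending choice
        ch, q = pattern[i]
        can_take = j < len(text) and text[j] == ch
        if q in ("*", "?"):
            stack.append((i + 1, j))  # fallback: skip / stop repeating
            if can_take:
                # Preferred alternative is pushed last so it is popped first.
                stack.append((i, j + 1) if q == "*" else (i + 1, j + 1))
        elif can_take:
            stack.append((i + 1, j + 1))
    return False
```

With `[("a", "*"), ("b", "")]` and a subject of ten thousand a's followed by a `b`, the recursive version raises `RecursionError` under Python's default limit, while the iterative one returns `True`.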
Also I would point out that even though the perl regex engine is written in C, it is not the same C code that the pcre lib is made from.
No one was arguing that both implementations shared code.
I'm saying the C code in Perl's regex engine may run faster for many regexes than the C code in the pcre lib; and that may be due to the difficulty of setting up the rest of the C program to run the regexes in the most efficient manner.
That I don't get. perl (lowercase) is also a C program, and it needs to set things up before it can run the regexp engine as well.
You can do the match faster in Perl, because the regex is often a 1-liner; doesn't programmer time count in this?
No, that would be silly. PCRE is a *library*. Perl is a *programming language*. Of course, if you're going to start from scratch, Perl is going to beat PCRE, in the same way a bicycle is going to beat a V8 engine: you can cycle quite a number of miles before you've built a car around the V8 engine. But once the car is there, it becomes a different matter. By the same reasoning, 99.99% of all matches would be done faster by humans, because a human can scan the text in their head faster than they can type your Perl 1-liner into an editor.
PCRE is meant as a library: to be built into other applications and languages. In such an application or language, the match may also be a 1-liner.
I googled and haven't found a real benchmark comparing the 2 engines with a good set of regex stress tests; maybe you could use your knowledge to set up a benchmark and post it.
Why should I? I'm not the one making unfounded claims about one being faster than the other. But rest assured, if I were to claim Perl was faster than PCRE (or the other way around), I would back it up with a benchmark. But I know enough about Perl regexes not to make any claims comparing them to PCRE (or any other regexp engine). And I do know enough to be critical of other people's claims.
So, if you're so sure about Perl's superiority, post the benchmark. Make sure you also include the matches where Perl does take a long time. And remember that when Perl appears to be fast at matching or not matching, it's usually because it doesn't need the regexp engine at all: the optimizer has already figured out the answer.
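A benchmark of the kind being asked for might be shaped like the following Python sketch. Python's own `re` module is used only so the harness runs as-is; it is neither Perl's engine nor PCRE, so its timings say nothing about that comparison. The case list is hypothetical. The points it tries to show: compile each pattern once outside the timed loop, feed every engine identical subjects, report best-of-N per case, and include at least one case known to be slow for backtracking engines.

```python
import re
import timeit
from typing import Dict, List, Tuple

# Hypothetical stress cases: (name, pattern, subject). A real comparison
# would feed exactly these strings to perl and to a PCRE binding as well.
CASES: List[Tuple[str, str, str]] = [
    ("literal hit", r"needle", "x" * 10_000 + "needle"),
    ("literal miss", r"needle", "x" * 10_000),
    ("char class", r"[a-z]+\d+", "A" * 5_000 + "abc123"),
    ("alternation", r"(?:foo|bar|baz)qux", "foobarbaz" * 1_000 + "bazqux"),
    # No 'c' in the subject: a backtracking engine tries many splits first.
    ("backtracking", r"(a|aa)*c", "a" * 22),
]


def run_benchmark(cases: List[Tuple[str, str, str]],
                  repeat: int = 5, number: int = 3) -> Dict[str, float]:
    """Return the best-of-`repeat` time (seconds) for `number` searches per case."""
    results: Dict[str, float] = {}
    for name, pattern, subject in cases:
        compiled = re.compile(pattern)  # compile outside the timed region
        timer = timeit.Timer(lambda: compiled.search(subject))
        results[name] = min(timer.repeat(repeat=repeat, number=number))
    return results


if __name__ == "__main__":
    for name, secs in run_benchmark(CASES).items():
        print(f"{name:14s} {secs * 1000:8.3f} ms")
```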
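One thing such an optimizer can do is check cheap necessary conditions before the engine runs at all. The Python sketch below rejects any subject that lacks a literal every match must contain. The class name and the hand-supplied `required` literal are hypothetical; a real optimizer works such facts out from the compiled pattern itself. With the pattern `(a|aa)*c` and a subject of 40 a's and no `c`, a backtracking search without the pre-check would have to explore an enormous number of ways to split the a's. The pre-check returns at once, which is the "appears to be fast" effect described above.

```python
import re
from typing import Optional


class PrefilteredRegex:
    """Compiled regex plus a cheap pre-check: if a literal that every match
    must contain is missing from the subject, report 'no match' without
    running the regex engine. `required` is supplied by hand (hypothetical)."""

    def __init__(self, pattern: str, required: str) -> None:
        self.regex = re.compile(pattern)
        self.required = required
        self.engine_runs = 0  # how often the full engine was actually invoked

    def search(self, subject: str) -> Optional[re.Match]:
        if self.required not in subject:  # fast substring scan
            return None
        self.engine_runs += 1
        return self.regex.search(subject)
```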