
libpcre's utf-8 regexing is very slow, that some globals being used are not thread-safe and may cause memory gains and possible crashes in threads, etc.

Well, Perl's regexp engine has efficiency problems with UTF-8 as well (think character classes). Perl (and its regexp engine) has had its share of memory leaks (and I don't think many people are willing to bet they're all gone). And while I'm not aware of any thread-related problems with respect to the regexp engine, threads in Perl are such a performance loss that if you build a perl, threads are disabled by default. And while there may not be many thread issues in the regexp engine, due to perl not sharing anything between threads by default, perl's regexp engine isn't reentrant. And it took only 20 years to make it non-recursive (but artifacts of its recursive past still pop up).

Also I would point out that even though the perl regex engine is written in C, it is not the same C code that the pcre lib is made from.

No one was arguing that both implementations shared code.

I'm saying the C code in Perl's regex engine may run faster for many regexes than the C code in the pcre lib; and that may be due to the difficulty of setting up the rest of the C program to run the regexes in the most efficient manner.

That I don't get. perl (lowercase) is also a C program, and it needs to set things up before it can run the regexp engine as well.

You can do the match faster in Perl, because the regex is often a one-liner; doesn't programmer time count in this?

No, that would be silly. PCRE is a *library*. Perl is a *programming language*. Of course, if you're going to start from scratch, Perl is going to beat PCRE. In the same way a bicycle is going to beat a V8 engine - you can cycle quite a number of miles before you've built a car around the V8 engine. But once the car is there, it becomes a different matter. By the same reasoning, 99.99% of matches would be done faster by humans, because a human can scan the text faster than you can type your Perl one-liner into an editor.

PCRE is meant as a library: to be built into other applications or languages. In such an application or language, doing the match may be a one-liner as well.

I googled, and haven't found a real benchmark comparing the two engines with a good set of regex stress tests; maybe you could use your knowledge to set up a benchmark and post it.

Why should I? I'm not the one making unfounded claims about one being faster than the other. But rest assured, if I were to claim Perl was faster than PCRE (or the other way around), I would back it up with a benchmark. I know enough about Perl regexes not to make any claims about how they compare to PCRE (or any other regexp engine). And I do know enough to be critical of other people's claims.

So, if you're so sure about Perl's superiority, post the benchmark. Make sure you also include the matches where Perl does take a long time. And remember that when Perl appears to be fast at matching or failing to match, it's usually because it doesn't need the regexp engine at all - the optimizer has already figured out the answer.
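To make "include the matches that take a long time" concrete, here is a minimal sketch of the shape such a benchmark could take. It uses Python's `re` module purely as a stand-in backtracking engine, because it is easy to run; it is neither Perl nor PCRE, and its timings say nothing about either. The case list, `time_case` and `run_benchmark` are hypothetical names made up for illustration. For a real comparison you would run the same cases through both engines.

```python
import re
import time

# Hypothetical test cases: (label, pattern, subject). None of these come from
# the thread; they only illustrate mixing easy inputs with a pathological one.
CASES = [
    ("literal hit", r"needle", "hay" * 1000 + "needle"),
    ("literal miss", r"needle", "hay" * 1000),
    # Nested quantifiers force heavy backtracking before the match fails.
    ("nested quantifier miss", r"(a+)+$", "a" * 18 + "b"),
]


def time_case(pattern, subject, repeat=3):
    """Return (matched, best wall-clock seconds) over `repeat` runs."""
    compiled = re.compile(pattern)  # compile once, outside the timed region
    best = float("inf")
    matched = False
    for _ in range(repeat):
        start = time.perf_counter()
        matched = compiled.search(subject) is not None
        best = min(best, time.perf_counter() - start)
    return matched, best


def run_benchmark(cases=CASES):
    """Time every case and return a list of (label, matched, seconds)."""
    return [(label, *time_case(pattern, subject)) for label, pattern, subject in cases]


if __name__ == "__main__":
    for label, matched, seconds in run_benchmark():
        print(f"{label:24s} matched={matched!s:5s} {seconds:.6f}s")
```

The point of the third case is that a benchmark made only of easy literals mostly measures whatever shortcut the engine takes before real matching starts, so the failing, backtracking-heavy inputs have to be in the set too.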

In reply to Re^4: pcre vs perl regex engine by JavaFan
in thread pcre vs perl regex engine by jeteve
