This is not a tutorial.
Every time you use $`, $& or $', the entire scalar you are searching is copied.
What's more, it is not only the scalars processed by the regex that references one of those variables that get copied; every scalar processed by every regex in your entire program also gets copied.
Further, every time you use capturing brackets, all the captured chunks are also copied--again.
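For anyone wondering what the usual workaround looks like, here is a minimal sketch that recovers the same three pieces without ever mentioning $`, $& or $'. It relies on the built-in offset arrays @- and @+ plus substr. The helper name match_parts and the sample string are made up for illustration, not taken from any tutorial.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (prematch, match, postmatch) for the first
# match of $re in $text, or an empty list if there is no match.
sub match_parts {
    my ($text, $re) = @_;
    return unless $text =~ $re;

    # $-[0] and $+[0] are the start and end offsets of the whole match.
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($text, 0, $start),              # what $` would have held
        substr($text, $start, $end - $start),  # what $& would have held
        substr($text, $end),                   # what $' would have held
    );
}

my ($pre, $match, $post) = match_parts('the quick brown fox', qr/quick/);
printf "pre=<%s> match=<%s> post=<%s>\n", $pre, $match, $post;
# prints: pre=<the > match=<quick> post=< brown fox>
```

On Perl 5.10 and later, the /p modifier with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} is another way to get the same values for a single regex.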
And, even correctly written regexes that use two or more variable-length matches (<re>* or <re>+ etc.) can consume prodigious amounts of runtime stack and CPU.
Badly and/or naively written regexes that use nested quantifiers can have exponential runtimes and, if the scalar they operate on is anything more than modestly sized, can completely consume your process stack before finally trapping, having consumed all your process memory allocation or system swap space--whichever runs out first.
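To make the nested-quantifier case concrete, here is a small sketch comparing a nested pattern with a flat pattern that accepts exactly the same strings. The patterns, the input and the timed_match helper are all invented for illustration. The input is kept deliberately short, and how badly the nested form behaves depends on the perl build, since newer engines include optimisations that can hide the blow-up, so no timings are claimed here.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Nested quantifier: a run of "a"s can be divided between the inner and
# outer + in a very large number of ways when the overall match fails.
my $nested = qr/^(a+)+$/;

# Flat pattern that accepts exactly the same strings, with no nesting.
my $flat = qr/^a+$/;

# Hypothetical helper: runs one match attempt and returns
# (1 or 0 for matched, elapsed seconds).
sub timed_match {
    my ($re, $str) = @_;
    my $start   = time;
    my $matched = $str =~ $re ? 1 : 0;
    return ($matched, time - $start);
}

# A short run of "a"s followed by a character that forces failure.
my $input = ('a' x 20) . '!';

for my $case (['nested', $nested], ['flat', $flat]) {
    my ($name, $re) = @$case;
    my ($matched, $secs) = timed_match($re, $input);
    printf "%-6s matched=%d elapsed=%.6fs\n", $name, $matched, $secs;
}
```

Both patterns reject this input; the only difference is how much work the engine may do before giving up.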
Dooom, gloom, despondency.
More doom gloom and despondency.
Blah, blah, blah.
Oh, and here is a solution that prevents some of the problems by wrapping each call to the regex engine.
It starts another process and sends your scalars and the regex to it via sockets. That other process runs the regex on your behalf, and sends the results back via another socket. This neatly eliminates the $& problem, and allows recovery from the stack runaway/memory exhaustion problems whilst keeping your main process's memory requirements to a minimum.
Whilst much of the above is and has been true for the past 5 (8?, 10?) years, most of it could not be otherwise.
And the point is that the regex engine isn't lightweight, and has some vagaries and caveats,
but that hasn't prevented thousands of programmers from writing hundreds of thousands of perfectly functional, useful, beneficial scripts that use Perl's regex engine.
Note: The stack problem has been very cleverly fixed in a recent build.