Things you should need to know before using Perl regexes. (Humour, with a serious point)

Re: Things you should need to know before using Perl regexes. (Humour, with a serious point) by Corion (Patriarch) on Oct 25, 2006 at 09:18 UTC
"Further, every time you use capturing brackets, all the captured chunks are also copied--again."

Not exactly ;)

```
Q:\>perl -le "($x = 'foo') =~ /.(.)/g; print $1; $x = 'bar'; print $1"
o
a
```

(This is perl, v5.8.2 built for MSWin32-x86-multi-thread.)

Of course, the `/g` modifier there is a bug in the code, but it still shows that a capturing match does not copy the buffer in every case: after `$x` is reassigned, `$1` reports `a` instead of the `o` it originally captured.

Thanks to dave_the_m's and demerphq's recent work, the Perl 5.10 regex engine is improving even further, beyond the elimination of C recursion. It has named captures that bring it up to par with, and beyond, what other implementations of named captures provide, and, as far as I understand the changes, it is quite a bit faster on Unicode strings. There are still some deeper problems with how closures inside regular expressions (in `(?{..})` blocks) are handled versus interpolation.

This post is mostly about adding some perspective to the changes that happen to the regex engine ;)
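A defensive habit that sidesteps the question of whether `$1` is a copy or a view into the target string is to copy each capture into a lexical as soon as the match succeeds. The sketch below (the variable names are illustrative only) also uses the 5.10 named-capture syntax mentioned above, `(?<name>...)` read back through `%+`, so it assumes perl 5.10 or later.

```perl
use strict;
use warnings;
use 5.010;   # named captures, (?<name>...) and %+, need perl 5.10 or later

my $x = 'foo';

# Copy the capture into a lexical right after the match. The copy is an
# ordinary string, so later changes to $x cannot show through it.
my $second;
if ($x =~ /.(?<second>.)/) {
    $second = $+{second};
}

$x = 'bar';
say $second;    # prints "o"
```

Because `$second` holds its own string, reassigning `$x` afterwards leaves it untouched, whatever the engine does internally.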
Re^2: Things you should need to know before using Perl regexes. (Humour, with a serious point) by tinita (Parson) on Oct 25, 2006 at 12:37 UTC
```
Q:\>perl -le "($x = 'foo') =~ /.(.)/g; print $1; $x = 'bar'; print $1"
o
a
```

Wow, didn't expect that. betterworld and I just tried out some other examples, and it seems that the string buffer is not really emptied:

```
$ perl -MData::Dumper -we'$Data::Dumper::Useqq = 1; ($x = "fou") =~ /.(..)/g; print Dumper $1; $x = "b"; print Dumper $1; '
$VAR1 = "ou";
$VAR1 = "\0u";
$ perl -MData::Dumper -we'$Data::Dumper::Useqq = 1; ($x = "fou") =~ /.(..)/g; print Dumper $1; $x = "ba"; print Dumper $1; '
$VAR1 = "ou";
$VAR1 = "a\0";
```

So `$1` just outputs the second and third characters of whatever is now in the string buffer, and in the first example you can see the remains of the string 'fou'.
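Perl exposes the start and end offset of each capture group through the arrays `@-` and `@+`, which is a convenient way to see a capture as a pair of positions in the target string, consistent with what the Data::Dumper output above shows. This minimal sketch reads those offsets back with `substr`; it does not reproduce the 5.8.x buffer behaviour itself, which newer perls may handle differently.

```perl
use strict;
use warnings;

my $x = 'fou';
$x =~ /.(..)/ or die "no match\n";

# $-[1] and $+[1] are the start and end offsets of capture group 1
my ($start, $end) = ($-[1], $+[1]);

# Rebuild the capture from those offsets; this equals $1 while $x is unchanged
my $captured = substr($x, $start, $end - $start);

print "$start $end $captured\n";   # prints "1 3 ou"
```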
Re: Things you should need to know before using Perl regexes. (Humour, with a serious point) by liz (Monsignor) on Oct 25, 2006 at 10:06 UTC
;-) I guess persiflage is the sincerest form of flattery.

Liz
Re^2: Things you should need to know before using Perl regexes. (Humour, with a serious point) by gaal (Parson) on Oct 25, 2006 at 17:55 UTC
Only be sure always to call it please, "research".
Re: Things you should need to know before using Perl regexes. (Humour, with a serious point) by bennymack (Pilgrim) on Oct 25, 2006 at 12:15 UTC
I'm curious what you were actually trying to link to in:

"Oh. and here is a solution that prevents some of the problems by wrapping each call to the regex engine."

because the link just does a search for the word "here", AFAICT.
Re^2: Things you should need to know before using Perl regexes. (Humour, with a serious point) by BrowserUk (Patriarch) on Oct 25, 2006 at 13:16 UTC
It's a placeholder for a link to the module described, for when I get around to writing it. (You did notice the word "Humour" in the title, didn't you? :)

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Things you should need to know before using Perl regexes. (Humour, with a serious point) by Anonymous Monk on Oct 25, 2006 at 19:41 UTC
Your title to the contrary, I find it hard to see much humor in this post. I read it as more of your relentless bitterness at people who talk down Perl's threads. I think I understand your view -- that threads are good (the future, as you put it) and Perl's threads won't get better unless people use them. But ignoring the reality of Perl's threads is not going to help, either. Also, lest you accuse me of ignoring the superficial "point" of your post, the regex engine is indeed a monster. The amount of work being put into it in bleadperl should be a good indicator of that. I am not going to argue with you on this point, but that is unrelated to the threads issues.
How could it not be a monster? by demerphq (Chancellor) on Oct 26, 2006 at 08:57 UTC
"the regex engine is indeed a monster. The amount of work being put into it in bleadperl should be a good indicator of that."

I agree with the first point. I'm not so sure about the second point. It's hard to calculate how much of my dev time has been related to the complexity and opacity of the engine and how much has gone on other things. I just don't see a necessary relationship between monstrosity and the time spent on development. I wonder what dave_the_m would say.

I kinda wonder how any industrial-strength regular expression engine could be anything but a monster. A program's structure comes to reflect its problem domain, I think, and when the domain is complex, the code will be too. And I think that the problem domain of search and replace with Perl-style not-so-regular regular expressions is quite complex. I've looked at the sources for the latest Tcl engine and PCRE, and they are all large, comprehensive bodies of code. I'll grant that Perl's is not the cleanest implementation, nor the best documented, but IMO all of those packages are monsters. I guess it all comes down to what you consider monster code to be.

---
$world=~s/war/peace/g
Re^2: Things you should need to know before using Perl regexes. (Humour, with a serious point) by BrowserUk (Patriarch) on Oct 25, 2006 at 22:29 UTC
"Your title to the contrary, I find it hard to see much humor in this post. I read it as more of your relentless bitterness at people who talk down Perl's threads."

I'll trade you `s/Humour/Parody/`, if you'll trade me `s/bitterness/frustration/`?

"... the superficial "point" of your post, the regex engine is indeed a monster."

The (more than superficial) point of the post is that despite all the regex engine's flaws, they haven't stopped it from being one of the great strengths of Perl 5. Nor have they prevented a huge number of very useful scripts from being written that utilise it and run day in, day out in production environments all over the world. And through its continual use, and the bug reports and feedback that this generates for the guys who maintain it, it has gone from strength to strength to strength. And it continues to do so.

No one will be pointing people who ask new questions about the regex engine to the OP, despite the truths it holds, because they know it has flaws, but they also know that the benefits of using it, with a modicum of care, far outweigh the risks of the flaws.

Equally, no one, least of all me, is denying the flaws in iThreads. Do a Super Search against my handle for `threads clone copy fork` (you may need to specify an alternate delimiter to get OR functionality), and see how many posts I have devoted to noting and reiterating the problems with the iThreads architecture. Despite that, I have continually promoted the idea that for certain kinds of problems, with a modicum of care, using iThreads results in simpler, cleaner, more maintainable and more reliable solutions than the alternatives.

If no one used the regex engine, there would have been no incentive for its improvement. If no one uses threads, there will be no incentive for them to improve.

But that is still not the biggest point I am trying to make. Many of the limitations of iThreads are so fundamental that it is doubtful whether they can ever be fixed.
These are not bugs, but design and implementation limitations that would require huge changes to the core of Perl 5 to eliminate. They come about through a combination of three main factors:

1. As I pointed out elsewhere, retro-fitting threading to an existing, complex, mature product that was never intended to be used in a threaded environment is not just extremely technically challenging; it is damn nearly impossible without a ground-up rewrite. This is why I have described the work, and the people who achieved it, to give us iThreads as "heroic". I do not, ever, use that term lightly or sarcastically.
2. The API chosen upon which to base Perl threading is the severely limited, strict POSIX pthreads specification. This API is minimal, weak and flawed. If you doubt my opinion on this, look around at all the *nix platforms that have extended it, often in mutually incompatible ways.
3. The emulation of the fork mechanism. Without COW (copy-on-write), this is hugely expensive in memory. Even with COW, it is hugely costly in time. In a program that was never written to be used in a threaded environment, with runtimes that originate from long before threading was ever a consideration (i.e. before reentrancy was ever considered a virtue, so they are littered with non-reentrant APIs like `strtok` and hard-coded limits like `FILE structs`) and that have adapted to reentrancy through the path of least resistance, there is simply too much global and static data littered in isolated pockets at all levels for this to be efficient.

Unless these limitations are explored and exposed, which requires that people use iThreads, there will be nothing to stop the next generation of Perl, Perl 6, from making exactly the same decisions and exactly the same mistakes. And that, I strongly believe, is a point worth making, even at the expense of stepping on a few people's toes.
liz, the author of the post I parodied, was the second person to respond to the OP, and she did so in a far better way than I could ever have hoped for. She saw both the point and the funny side. Despite the implicit and explicit criticisms I have levelled at her with regard to the effect of her post upon the fortunes and reputation of iThreads, she has gone on to more than make up for it by contributing her time to the development of Perl 6 threading.

In summary:

Yes. Threading has a place in any modern language, because it can solve some problems more simply and more efficiently than any other solution.

Yes. In the next few Moore's Law cycles there will be more and more opportunities for application programmers to use threading beneficially, to reduce complexity as well as to improve performance.

No. It is not a total replacement for fork, nor Events, nor state machines, nor clusters. It complements them all. It adds another tool to the programmer's toolkit, one that solves some problems that the other forms of parallelisation either cannot solve at all or cannot solve as easily. It also provides an imperfect solution to traditionally forked problems on those platforms that do not have fork, like my own preferred platform. I wish Win32 had a proper fork. There is absolutely no technical reason why it could not. That Cygwin can do it, albeit rather slowly and laboriously, is one good indication of this. That my own attempts have come close to achieving it is, for me, another. But politics is politics, and I have no expectation that MS is about to have a change of heart.

Yes. iThreads are flawed.

No. iThreads are not unusable, even in production environments, given the appropriate knowledge and care.

Yes. Locking can be a pain. But locking is no more painful or difficult than dealing with file locks, record locks or IPC semaphores.
And, used from Perl 5, with its explicit protection of non-shared data from accidental concurrency, its isolation of its own internal structures from the application programmer (which removes the need for them to concern themselves with the locking of those structures), and its provision of lexically scoped locking primitives, it's a lot easier in Perl than in many other languages.

Yes. iThreads can improve, but people need to discover the limitations and bugs before those improvements can be made.

Yes. All the experience gathered from the development and use of iThreads can be usefully harnessed to ensure that a better design and implementation is used in the underpinnings of Perl 6.
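The lexically scoped locking and the protection of non-shared data described above can be seen in a short sketch using the core `threads` and `threads::shared` modules. It assumes a perl built with ithreads support (it checks `$Config{useithreads}` and exits early otherwise); the `worker` sub and the counts are illustrative only.

```perl
use strict;
use warnings;
use Config;

BEGIN {
    # This sketch assumes a perl built with ithreads; bail out politely otherwise.
    if (!$Config{useithreads}) {
        print "skipped: this perl was built without ithreads\n";
        exit 0;
    }
}

use threads;
use threads::shared;

my $counter :shared = 0;   # explicitly shared between all threads
my $private = 0;           # not shared: each thread works on its own clone

sub worker {
    for (1 .. 1000) {
        {
            lock($counter);    # held only until the end of this block
            $counter++;
        }
        $private++;            # touches this thread's copy only
    }
    return;
}

my @threads = map { threads->create(\&worker) } 1 .. 4;
$_->join for @threads;

print "shared counter: $counter\n";    # 4000
print "private in main: $private\n";   # 0
```

`lock` is released automatically when the enclosing block ends, so there is no explicit unlock call to forget, and `$private` stays 0 in the main thread because each worker incremented its own clone of that unshared variable.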

Perl's regex engine is not lightweight

This is not a serious attack on the perl regex engine!