Re: Regex help

There are a number of replies to your question proposing a variety of solutions, all of them sticking with variations on your original regular expression: they try to match everything in one go and capture the different parts of the match with capturing parentheses.

Eventually someone will hit on the right technique, one that isn't plagued by the lazy regexp engine, greedy matching, etc. But there's another possibility...

You could make it easier on yourself and stop worrying about matching ^(.*?) non-greedily, about the lazy engine, or about (.*?)$ slurping everything up. Do it like this:

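use strict;
use warnings;

my $string  = "xxABABAByy";    # sample input (hypothetical)
my $pattern = "AB";
print "pattern is $pattern\n";
if ( $string =~ /((?:$pattern)+)/ ) {    # (?:...) makes + repeat all of "AB", not just "B"
    my ( $start, $middle, $end ) = ( $`, $1, $' );    # only read these after a successful match
    #.... and so on....
}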

Using $` and $' costs you some performance in every regexp in the program, but as I understand it, capturing parentheses introduce a similar cost, though only for the regexp that contains them. And in non-time-critical code (anything outside of tight loops) you don't really need to worry about performance anyway, right? ...so just do it the easy way.
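
If the program-wide cost does matter, perlvar documents another route that never mentions $`, $& or $': after a successful match, $-[0] and $+[0] hold the offsets where the whole match starts and ends, so substr can rebuild the prematch, match and postmatch. A minimal sketch under that assumption; the sample string and pattern are hypothetical, and the pattern deliberately uses no capturing parentheses:

```perl
use strict;
use warnings;

# Hypothetical sample input and pattern.
my $string  = "xxABABAByy";
my $pattern = "AB";

my ( $start, $middle, $end );
if ( $string =~ /(?:$pattern)+/ ) {
    # $-[0] and $+[0] are the start and end offsets of the whole match.
    $start  = substr( $string, 0, $-[0] );                # same text as $`
    $middle = substr( $string, $-[0], $+[0] - $-[0] );    # same text as $&
    $end    = substr( $string, $+[0] );                   # same text as $'
}
print "start=$start middle=$middle end=$end\n";
```

With this input it prints `start=xx middle=ABABAB end=yy`. It is more typing than the $`/$' version, which is exactly the programmer-efficiency trade-off discussed here.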

If it turns out that you can't live with the speed you give up in exchange for programmer efficiency, you can dig into other solutions. But the fact is that $`, $', and $& are there to be used, as long as you understand the ramifications of using them. To my knowledge their use isn't deprecated, and newer releases of Perl seem to have taken steps to make those special variables less costly.
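
One concrete example of such a step: since Perl 5.10, the /p match modifier makes ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH} available for that particular match, without the program-wide penalty that mentioning $`, $& or $' triggers. (perlvar also notes that from Perl 5.20 on, copy-on-write removes the penalty altogether, and /p is accepted but no longer needed.) A minimal sketch, assuming Perl 5.10 or later; the sample string and pattern are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sample input and pattern; /p needs Perl 5.10 or later.
my $string  = "xxABABAByy";
my $pattern = "AB";

my ( $start, $middle, $end );
if ( $string =~ /(?:$pattern)+/p ) {
    $start  = ${^PREMATCH};     # text before the match, like $`
    $middle = ${^MATCH};        # the match itself, like $&
    $end    = ${^POSTMATCH};    # text after the match, like $'
}
print "start=$start middle=$middle end=$end\n";
```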

When the solution becomes so tricky that a dozen follow-up posts are still debating how to accomplish it, I think it's time to apply Perl's credo, "There's more than one way to do it," and start looking for a simpler solution. To that end, give my example a try.

Hope this helps...

Dave

"If I had my life to do over again, I'd be a plumber." -- Albert Einstein


Re: Re: Regex help by bart (Canon) on Aug 24, 2003 at 16:53 UTC
You are correct, but the performance hit associated with $` and $' affects every regex in your script, not just the one you want to use them in. This is an old problem, and the reason why using $` and $' is frowned upon in larger scripts. Still, I'm almost sure that the perl5-porters will find ways to minimize this problem over time.
Re: Re: Re: Regex help by davido (Cardinal) on Aug 24, 2003 at 17:09 UTC
It is true that the time-performance hit for using $` and $' persists through every regexp in the program: if those special variables are used even once, Perl decides that every regexp in the program must support them, which means keeping a copy of the matched string after each successful match. So the hit falls on all regexps in the program, including those that use neither those special variables nor capturing parentheses.

In the Camel book, one item under "Time Efficiency" is to avoid $`, $&, and $'. However, one item under "Programmer Efficiency" is to use $`, $&, and $'. To me that says: weigh time efficiency against programming simplicity, and choose whichever is best for your situation.

The OP's code section was brief. Solving it with non-greedy matches, non-capturing and capturing parentheses, and a slightly tricky regexp turned into a dozen or so replies in the thread. That tells me the solutions that followed the spirit of the OP's approach were all too complex for the simple problem being solved, which is why I took the simpler, less time-efficient, but much more programmer-efficient approach.

It would be wrong to say that the use of $`, $&, and $' is deprecated; it clearly is not. It just comes with a caveat: use them, but understand that they will cause a time-performance cost for the regexps in your program. It is probably safe to say that this will become less of an issue as Perl continues to grow and develop. Perl's designers clearly intend to keep those special variables, not just for backward compatibility but for continued use. 5.8.0, for example, has found a way to minimize the impact of $&. I wouldn't be surprised to see the impact of $` and $' improved upon in the future, though I can't claim to know what's going on in the minds of Perl's developers. Anyway, sorry to get long-winded.
I just wanted to explain that it is ok to make a conscious decision to use one method over another, as long as you understand the ramifications of each method.

Dave