in reply to Re: Re: Re: Re: Capturing brackets within a repeat group [plus dynamic backreferences]
in thread Capturing brackets within a repeat group

This is where I felt you missed the point of my original post in as much as, not only do I know exactly how many parts I'm trying to capture, I only want to capture if there are exactly that number of parts to be captured. Hence the choice of using an exact repeat count {5}.

Ah, I did get that actually. But I was under the impression that you actually wanted to use a regex hack to get it working. Furthermore, I had a different objective than you, see below.

Using a regex to verify, and then split to extract the parts means parsing the string twice which seems wasteful when it can be done in a single pass.

Define "single pass". If you use capturing groups, the engine stops scanning to save a variable (and does so again for each backtrack, if that occurs). That is not efficient. Your compressed non-capturing pattern /^(?:$part:){5}$part$/ is approximately twice as fast as your long spelled-out capturing pattern. That gives you an idea of how much time you can save and spend on other things if you minimize the work that the regex does. Agreed, regexes are often pretty damned optimized, but when it comes to capturing groups they're still "slow". So I benchmarked some more and found that
    /^(?:$part:){5}$part$/; my @parts = split /:/;
runs slightly faster than
    my @parts = /^($part):($part):($part):($part):($part):($part)$/
In your other posts I found that you wanted to do
    my $re_mac_bUK = '(?: ( [0-9A-Z]{1,2} ) : )' x 5 . '( [0-9A-Z]{1,2} +)';
    sub replace_bUK {
        s[^ $re_mac_bUK $] [ sprintf '%02s' x 6, $1, $2, $3, $4, $5, $6 ]ex;
    }
This can be written more efficiently as
    my $re_mac_ihb = qr/(?:$part:){5}$part/ . '';
    sub replace_ihb {
        /^ $re_mac_ihb $/x;
        $_ = sprintf '%02s' x 6, split /:/;
    }
As you see, scanning the string twice can actually be more efficient.
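To make the two approaches concrete, here is a minimal sketch that runs both on the same input. The definition of $part, the sub names, and the sample string are assumptions for illustration (the field pattern is borrowed from the [0-9A-Z]{1,2} above). Unlike the terse snippets above, the split variant here only splits when the verification match succeeds.

```perl
use strict;
use warnings;

# Assumption: one field is 1-2 characters from [0-9A-Z], as in the
# MAC-style pattern quoted above.
my $part = '[0-9A-Z]{1,2}';

# One regex with six capturing groups.
sub parts_capture {
    my ($str) = @_;
    return $str =~ /^($part):($part):($part):($part):($part):($part)$/;
}

# Verify with a non-capturing pattern, then split on the separator.
sub parts_split {
    my ($str) = @_;
    return unless $str =~ /^(?:$part:){5}$part$/;
    return split /:/, $str;
}

my @a = parts_capture('0:A:1B:2:C3:4');
my @b = parts_split('0:A:1B:2:C3:4');
print "@a\n";    # prints "0 A 1B 2 C3 4"
print "@b\n";    # prints "0 A 1B 2 C3 4"
```

Both subs return an empty list when the string does not have exactly six parts, so either can be used as a validity check in list context.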

One thing that I think deserves to be pointed out is that a qr// object doesn't always optimize. Sometimes it does the opposite, namely when you interpolate the object into another pattern. qr// objects aren't as magical as many seem to think. The reason you can use them in other patterns is that they stringify when interpolated. Thus, if you don't plan to use the pattern by itself you can leave it as a string. Since I like not having to think about special regex escape codes etc., I often use qr// but directly stringify it through concatenation, as shown above. Of course, that technique should be used with care since it forces another pattern compilation. (Constant patterns are only compiled once though: at the surrounding code's compile time.)
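A small sketch of the stringification point, with hypothetical variable names and a sample string chosen for illustration: the qr// object and its stringified copy both interpolate into a larger pattern, which is then compiled as a whole.

```perl
use strict;
use warnings;

my $part_obj = qr/[0-9A-Z]{1,2}/;   # compiled regex object
my $part_str = $part_obj . '';      # plain string, via concatenation

print ref($part_obj), "\n";         # prints "Regexp"
print ref($part_str) eq '' ? "plain string\n" : "reference\n";

# Either form can be interpolated into a larger pattern.
for my $p ($part_obj, $part_str) {
    print "match\n" if '0:A:1B:2:C3:4' =~ /^(?:$p:){5}$p$/;
}
```

The exact text of the stringified form depends on the perl version, so the sketch does not print it.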

While on the topic of regex optimization and capturing groups: always put the opening parenthesis after a leading zero-width assertion and the closing parenthesis after a trailing one. That means that you should place the parentheses like
    /^(...$)/m
It might seem silly to capture something that's zero-width, but this is solely so that perl won't have to do extra variable save/restore work if the assertion fails. In the next example the former is about 10% slower than the latter:
    $_ = 'a ' x 1000;
    s/(\s+)$//;  # Slower
    s/(\s+$)//;  # Faster
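If you want to check the difference on your own perl, a benchmark along these lines should do. This is only a sketch: the sub names are made up for illustration, and the 10% figure above comes from my measurements, so expect the numbers to vary between perl versions and machines.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $text = 'a ' x 1000;

# Closing parenthesis before the end-of-string assertion.
sub strip_outside { my $s = shift; $s =~ s/(\s+)$//; return $s }

# Closing parenthesis after the assertion, as recommended above.
sub strip_inside  { my $s = shift; $s =~ s/(\s+$)//; return $s }

# Both forms must remove the same trailing whitespace.
die "results differ" unless strip_outside($text) eq strip_inside($text);

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    outside => sub { strip_outside($text) },
    inside  => sub { strip_inside($text) },
});
```

Each sub works on a copy of the string, so every iteration does the same amount of work.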
One of my personal favorite definition of "better", is 'more efficient'.

Interesting. My favourite definitions of "better" vary between "more general", "simpler", and "more efficient". And that's often how I choose to code. In brief, this is how I code algorithms: I first write a draft, then I write a very general approach (and shove it into a module :)). Sometimes this can be unnecessary, but at a minimum it teaches me something. After that I start to simplify my general approach. Often that means that I also make it more efficient. Sometimes though you can optimize because you know certain things about the data, and I'm all for that if you know your data well.

I (contrary to popular opinion), use my time and efforts here at the monastery as a learning experience. That is to say, whilst I sincerely hope that any answers I provide assist the OP to whom I provide them, much more significant from my personal perspective is that every single question I have a crack at means that I learn, re-learn or re-enforce some aspect of my knowledge of Perl. And one of the things that I try to learn whilst attempting to answer each question is "Is there a better way of doing it.".

Amen to that! As you can read in my presentation I code purely out of interest. That means I have no deadlines to keep or whiny managers who would like me to code a certain way. That gives me time to explore! I'm not the kind of person that says "OK, it works. NEXT!". I can fiddle around with code all day just to see what happens. This is not to say that I'm especially good at being funky with code; my point is that I definitely agree with your mind-set. :)

Of course there is the argument that if efficiency is a high criteria for your application, then you should probably use a different language, but I eschew this ...

I totally agree. Just because you can code slow perl doesn't mean you have to. If there is a more efficient way of doing something (while not losing generality ;)), I see no reason not to write the more efficient snippet (unless it's quite obfuscated). People tend to think in an "either/or" manner when it comes to Perl optimization. Either you optimize and use another language, or you simply code convenient Perl. But how about convenient optimized Perl? Relatively speaking, almost everything done in Perl is convenient. ;)

Cheers,
ihb