Re: Re: Capturing brackets within a repeat group [plus dynamic backreferences]

Re: Re: Re: Capturing brackets within a repeat group [plus dynamic backreferences] by ihb (Deacon) on Jan 12, 2003 at 00:50 UTC
Besides pointing to documentation, I thought you might also want an explanation of why Perl's current behaviour is sane and to be expected, and a way to achieve what you thought Perl would do for you. Let me set my general reply in the context of MAC address parsing.

First things first, though:

    local $_ = join ':', qw/0 0A 0C B B8 F/;   # $mac
    my $part = qr/[0-9A-Z]{1,2}/;

First you used

    my @parts = /^($part):($part):($part):($part):($part):($part)$/;

which worked. Then you tried to shrink it to

    my @parts = / ^ (?: ($part) : ){5} ($part) $ /x;

but that didn't work. Now, using "my" technique you just need to add three to four lines to achieve what you want.

    use re 'eval';   # Needed due to interpolation of $part
    my @parts;
    /
        (?{ local @_parts })
        ^
        (?: ($part) : (?{ local @_parts = (@_parts, $1) }) ){5}
        ($part)
        $
        (?{ @parts = (@_parts, $2) })
    /x;

The beauty of this technique is that you don't have to know how many times you need/want to match; something that is required if you use the `x` operator.

If you just want to solve this particular problem, why not simply verify with your second, more compact regex and then `split` it up on `/:/`?

Update: Since I got a negative response to this reply, I reworded the beginning to better express what I meant. If it sounded offensive or bad in any way, that wasn't how it was meant, and I apologize.

`ihb`
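For readers following along, here is a minimal sketch (not taken from the posts themselves) of the two behaviours discussed above: a capture group inside a repeat keeps only the text of its last iteration, and the "verify, then split" alternative recovers all six parts. The variable names `@last_only` and `@parts` are illustrative assumptions; `$mac` and `$part` follow the snippet in the post.

```perl
use strict;
use warnings;

my $mac  = join ':', qw/0 0A 0C B B8 F/;
my $part = qr/[0-9A-Z]{1,2}/;

# A capture group inside a repeat is overwritten on every iteration,
# so the match returns only the last repeated part plus the final part.
my @last_only = $mac =~ /^(?:($part):){5}($part)$/;

# Alternative: validate with the compact non-capturing pattern,
# then extract the parts with split.
my @parts;
@parts = split /:/, $mac if $mac =~ /^(?:$part:){5}$part$/;

print "repeat group kept: @last_only\n";
print "verify + split:    @parts\n";
```

With the sample address, the repeated group yields only two values while the split yields all six, which is why the compact pattern alone cannot replace the spelled-out one.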
Re: Re: Re: Re: Capturing brackets within a repeat group [plus dynamic backreferences] by BrowserUk (Patriarch) on Jan 12, 2003 at 03:27 UTC
First: it wasn't me who gave you a negative response, and whilst I didn't see what you said originally, I doubt I would have been offended. My rather terse reply (with smiley) to your original post was simply because I read that post several times and missed its relevance. This post clarifies your intent nicely, thank you.

Now to the contents of this post :).

> The beauty of this technique is that you don't have to know how many times you need/want to match;

This is where I felt you missed the point of my original post, in as much as not only do I know exactly how many parts I'm trying to capture, I only want to capture if there are exactly that number of parts to be captured. Hence the choice of an exact repeat count, {5}.

> ... why not simply verify with your second more compact regex and then split it up on /:/?

Ah! Now that does offend me :^) Or rather, it offends my sense of efficiency. Using a regex to verify and then split to extract the parts means parsing the string twice, which seems wasteful when it can be done in a single pass.

As has been liberally discussed elsewhere, this is almost certainly a micro-optimisation which, in the big scheme of things, is hardly worth the effort in any given answer here, but...

I (contrary to popular opinion) use my time and efforts here at the monastery as a learning experience. That is to say, whilst I sincerely hope that any answers I provide assist the OP to whom I provide them, much more significant from my personal perspective is that every single question I have a crack at means that I learn, re-learn or reinforce some aspect of my knowledge of Perl.

And one of the things that I try to learn whilst attempting to answer each question is "Is there a better way of doing it?". Now the definition of "better" can vary. Often this can mean 'clearer' or 'simpler'. Aristotle has an uncanny knack of taking a piece of my oft tortuous code and simplifying it using idiomatic Perl, rendering a much simpler, clearer solution.
sauoq invariably sees through any quick & dirty regexes and provides graphic demonstrations of my bad assumptions. Too many others to mention have contributed to my learning with similar demonstrations of skill and ingenuity.

One of my personal favorite definitions of "better" is 'more efficient'. Whilst the increased efficiencies shown in short snippets generated as answers to specific SoPWs are often of little consequence, by learning which techniques are more efficient at this level, I hope that as my own projects get more complex, I will be better equipped to write efficient code at levels where it becomes significant.

As an example (with no disrespect to the author intended), I recently attempted to optimise a piece of code that made liberal use of Math::Round. This module has one piece of code that is particularly clever: the mechanism for determining the smallest value greater than 0.5 that perl's floating point representation can support on any given platform. This is apparently (and I am not sufficiently au fait with the vagaries of floating point math and FP processors to argue) quite important to the process of accurate rounding.

If you take a quick look at the code in this module, you may notice several places where it could easily be made slightly more efficient, but nothing in particular stands out as demonstrably inefficient. However, in the context of a graphical application that makes heavy use of the functions in that module whilst processing 2- and 3-dimensional arrays of floating point values representing 2d and 3d coordinate vectors, at the inner levels of loops nested 2 and 3 deep, each of those small inefficiencies mounts up. In this particular case, dramatically so.
So, whilst in the context of the module itself intuitively writing efficient code may seem unnecessary, in the wider context of the applications that use it, the inefficient code that results from not knowing better can have a dramatic effect on the overall performance and usability of those applications. This effect is multiplied if the inefficient code is itself a dependency of other modules that are themselves written without consideration for efficiency, especially if one or more of those levels makes use of perl's OO (or tie) facilities, which are themselves fairly costly.

Of course there is the argument that if efficiency is a high-priority criterion for your application, then you should probably use a different language, but I eschew this on the basis that the loss of convenience and the increased development time that result from moving to C, C++ or Java far outweigh the benefits. Especially as in many (though not all) cases, a little knowledge or experimentation to find the most efficient of the MWTDI can mean that the performance achieved using perl is adequate.

To this end, if I think I see a more efficient method of achieving any particular goal in perl, I tend to explore it. And if it proves to be more efficient, and doesn't require too much sacrifice in terms of clarity, brevity or simplicity, then I tend to prefer that method over any other when writing similar code, on the basis that one day that code may be called by other code from within a loop or recursive process such that the efficiency will become important.

Out of interest, I realised that I have been here before and have made use of your push-to-array-from-regex-nested-code-block technique (to coin a name :). See Efficient run determination for the nitty-gritty.

Examine what is said, not who speaks.
The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.
Re: Re: Re: Re: Re: Capturing brackets within a repeat group [plus dynamic backreferences] by ihb (Deacon) on Jan 12, 2003 at 13:21 UTC
> This is where I felt you missed the point of my original post in as much as, not only do I know exactly how many parts I'm trying to capture, I only want to capture if there are exactly that number of parts to be captured. Hence the choice of using an exact repeat count {5}.

Ah, I did get that, actually. But I was under the impression that you actually wanted to use a regex hack to get it working. Furthermore, I had a different objective than you; see below.

> Using a regex to verify, and then split to extract the parts means parsing the string twice which seems wasteful when it can be done in a single pass.

Define "single pass". If you use capturing groups, you stop the scanning to save a variable (for each backtrack, if that occurs). That is not efficient. Your compressed non-capturing pattern `/^(?:$part:){5}$part$/` is approximately twice as fast as your long spelled-out capturing pattern. That gives you an idea of how much time you can save and put into other things if you minimize the work that the regex does. Agreed, regexes are often pretty damned optimized, but when it comes to capturing groups they're still "slow". So I benchmarked some more and found that

    /^(?:$part:){5}$part$/;
    my @parts = split /:/;

runs slightly faster than

    my @parts = /^($part):($part):($part):($part):($part):($part)$/

In your other posts I found that you wanted to do

    my $re_mac_bUK = '(?: ( [0-9A-Z]{1,2} ) : )' x 5 . '( [0-9A-Z]{1,2} +)';
    sub replace_bUK {
        s[^ $re_mac_bUK $] [ sprintf '%02s' x 6, $1, $2, $3, $4, $5, $6 ]ex;
    }

This can be written more efficiently as

    my $re_mac_ihb = qr/(?:$part:){5}$part/ . '';
    sub replace_ihb {
        /^ $re_mac_ihb $/x;
        $_ = sprintf '%02s' x 6, split /:/;
    }

As you see, scanning the string twice can actually be more efficient.

One thing that I think deserves to be pointed out is that a `qr//` object doesn't always optimize. Sometimes it does the opposite, and that is when you interpolate the object.
`qr//` objects aren't as magical as many seem to think. The reason you can use them in other patterns is that they stringify when interpolated. Thus, if you don't plan to use the pattern by itself, you can leave it as a string. Since I like not having to think about special regex escape codes etc., I often use `qr//` but directly stringify it through concatenation, as shown above. Of course, that technique should be used with care since it forces another pattern compilation. (Constant patterns are only compiled once, though: at the surrounding code's compile time.)

While on the topic of regex optimization and capturing groups: always put the opening or closing bracket after zero-width assertions. That means that you should place the parentheses like `/^(...$)/m`. It might seem silly to capture something that's zero-width, but this is solely so that perl won't have to do extra variable save/restore work if the assertion fails. In the next example the former is about 10% slower than the latter:

    $_ = 'a ' x 1000;
    s/(\s+)$//;    # Slower
    s/(\s+$)//;    # Faster

> One of my personal favorite definitions of "better" is 'more efficient'.

Interesting. My favourite definitions of "better" vary between "more general", "simpler", and "more efficient". And that's often how I choose to code.

In brief, this is how I code algorithms: I first write a draft, then I write a very general approach (and shove it into a module `:)`). Sometimes this can be unnecessary, but at a minimum it teaches me something. After that I start to simplify my general approach. Often that means that I also make it more efficient. Sometimes, though, you can optimize because you know certain things about the data, and I'm all for that if you know your data well.

> I (contrary to popular opinion) use my time and efforts here at the monastery as a learning experience. That is to say, whilst I sincerely hope that any answers I provide assist the OP to whom I provide them, much more significant from my personal perspective is that every single question I have a crack at means that I learn, re-learn or reinforce some aspect of my knowledge of Perl. And one of the things that I try to learn whilst attempting to answer each question is "Is there a better way of doing it?".

Amen to that! As you can read in my presentation, I code purely out of interest. That means I have no deadlines to keep or whiny managers who want me to code a certain way. That gives me time to explore! I'm not the kind of person who says "OK, it works. NEXT!". I can fiddle around with code all day just to see what happens. This is not to say that I'm especially good at being funky with code; my point is that I definitely agree with your mind-set. `:)`

> Of course there is the argument that if efficiency is a high-priority criterion for your application, then you should probably use a different language, but I eschew this ...

I totally agree. Just because you can code slow Perl doesn't mean you have to. If there is a more efficient way of doing something (while not losing generality `;)`), I see no reason not to write the more efficient snippet (unless it's quite obfuscated). People tend to think in an "either/or" manner when it comes to Perl optimization. Either you optimize and use another language, or you simply code convenient Perl. But how about convenient optimized Perl? Relatively speaking, almost everything done in Perl is convenient. `;)`

Cheers,
`ihb`
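The timing claims in this post can be checked locally. The following is a minimal sketch using the core `Benchmark` module; the helper names `by_capture` and `by_split` are hypothetical, and no particular result is implied, since relative speeds depend on the perl version and platform.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $mac  = join ':', qw/0 0A 0C B B8 F/;
my $part = qr/[0-9A-Z]{1,2}/;

# Spelled-out capturing pattern: one match, six capture groups.
sub by_capture {
    my ($s) = @_;
    return $s =~ /^($part):($part):($part):($part):($part):($part)$/;
}

# Compact non-capturing pattern to validate, then split to extract.
sub by_split {
    my ($s) = @_;
    return $s =~ /^(?:$part:){5}$part$/ ? split(/:/, $s) : ();
}

# Both approaches should produce the same list of parts.
my @a = by_capture($mac);
my @b = by_split($mac);
print "same result\n" if "@a" eq "@b";

# Run each candidate for at least one CPU second and compare rates.
cmpthese(-1, {
    capture => sub { my @p = by_capture($mac) },
    split   => sub { my @p = by_split($mac) },
});

# The anchor-placement claim: '$' outside versus inside the group.
my $padded = 'a ' x 1000;
cmpthese(-1, {
    anchor_outside => sub { (my $s = $padded) =~ s/(\s+)$// },
    anchor_inside  => sub { (my $s = $padded) =~ s/(\s+$)// },
});
```

Each `cmpthese` call prints a table of rates and percentage differences. The string copy inside the last two subs is the same in both cases, so it should not bias the comparison.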