in reply to Re: Question on Regex grouping
in thread Question on Regex grouping

First, do not put parens '()' around things that you have no interest in using later. "Capturing" these things consumes time and resources and to no effect.
Most of the cost is paid by the first parenthesis, that is, there's a significant cost difference between not using capturing parens at all, and using capturing parens. Additional parens don't contribute that much.
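To make the capturing versus grouping distinction concrete, here is a minimal sketch. The input string and variable names are made up for illustration. It only shows what gets captured and makes no claim about timing.

```perl
use strict;
use warnings;

my $text = "ababab42";

# Capturing group around "ab": the group is remembered even though only
# the number is of interest.
my @with_capture = $text =~ /(ab)+(\d+)/;

# Non-capturing group (?:...): same grouping for the quantifier, but only
# the number is captured.
my @without_capture = $text =~ /(?:ab)+(\d+)/;

print scalar(@with_capture), " captures: @with_capture\n";
print scalar(@without_capture), " capture: @without_capture\n";
```

With `(?:...)` the quantifier still applies to the whole group, so the pattern matches the same text while returning one value instead of two.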
In general, I avoid using $1, $5 etc. Use Perl list slice instead.
Careful here. Using a list slice (which I find quite ugly), or assigning the match to a list of variables, puts the match in list context, which changes the behaviour if /g is present.
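The context point can be sketched as follows. The sample string and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $text = "a1b2c3";

# List context without /g: the captures of the first match only.
my @first = $text =~ /([a-z])(\d)/;

# List context with /g: the captures of every match, flattened into one list.
my @all = $text =~ /([a-z])(\d)/g;

# Scalar context with /g: each evaluation finds the next match, so the
# captures are read from $1 and $2 inside a loop.
my @pairs;
while ($text =~ /([a-z])(\d)/g) {
    push @pairs, "$1$2";
}

print "first: @first\n";
print "all:   @all\n";
print "pairs: @pairs\n";
```

A slice such as `[1]` therefore picks a different value depending on whether /g is present, because the list being sliced is different.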

But more importantly, in certain cases, when using list slices, you do not know whether there was a match or not:

my $a = rand() < .5 ? "f" : "g";
my $b = rand() < .5 ? "p" : "o";
my $c = ("foo" =~ /($a)*$b/)[0];
Did it match, or didn't it? If $c is defined, it matched. But what if $c isn't? If $a eq "g", and $b eq "o", there is a match, but $c is undefined.
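A small sketch of that ambiguity, with the random choice replaced by fixed pairs so the cases are reproducible. The helper name `probe` is hypothetical, and `$x` and `$y` stand in for `$a` and `$b`.

```perl
use strict;
use warnings;

# Returns [matched?, first capture] for the pattern /($x)*$y/ applied to "foo".
sub probe {
    my ($x, $y) = @_;
    my $matched = "foo" =~ /($x)*$y/ ? 1 : 0;
    my $capture = ("foo" =~ /($x)*$y/)[0];
    return [$matched, $capture];
}

for my $pair (["f", "o"], ["g", "o"], ["f", "p"]) {
    my ($matched, $capture) = @{ probe(@$pair) };
    printf "/(%s)*%s/: matched=%d, capture is %s\n",
        @$pair, $matched, defined $capture ? "defined" : "undefined";
}
```

The second and third pairs both leave the capture undefined, yet only the third one failed to match, so the sliced value alone cannot tell the two apart.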

Re^3: Question on Regex grouping
by Marshall (Canon) on Dec 21, 2010 at 13:17 UTC
    "Additional parens don't contribute that much".

    Fair enough, there is overhead in doing it at all. I am saying "don't overdo it".

    List slice, hash slice, etc. are some of the coolest features in Perl! You are completely correct that list slice does not "play well with match global", because the number of things that can be returned is variable, and therefore there is no way to specify a subset or range of indices that are of interest.

    The classic use of a list slice is splitting a line when you want fields 127, 3..5, 93 and 8 from it. I do work with DB lines like that; it is actually common. A list slice lets me assign those 6 things directly to variables that mean something within the program. I usually list the variables on the left ($x, $y, $z, ...) in the order that the following code will use them, and adjust the slice accordingly.
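A minimal sketch of that split-and-slice pattern. The record layout and the variable names are hypothetical. Each field simply holds its own index so the result is easy to check.

```perl
use strict;
use warnings;

# Hypothetical record: 130 comma-separated fields, each equal to its index.
my $line = join ",", 0 .. 129;

# Pick fields 127, 3..5, 93 and 8, in the order the later code wants them.
my ($id, $x, $y, $z, $status, $count) =
    (split /,/, $line)[127, 3 .. 5, 93, 8];

print "$id $x $y $z $status $count\n";
```

Reordering the variables on the left only requires reordering the indices in the slice.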

    If you are saying that "do not use list slice when doing a match global", I would absolutely agree with that. And I do not think that I have recommended that.

    In your code, my $c = ("foo" =~ /($a)*$b/)[0]; is an improper use of list slice.

    Properly used, list slice is beautiful.

      In your code, my $c = ("foo" =~ /($a)*$b/)[0]; is an improper use of list slice.
      Then enlighten us, what is the "proper use" of list slices to avoid using $1, etc? Note that the OP used the match in an if statement, so whether the pattern matched or not is important.
      If you are saying that "do not use list slice when doing a match global", I would absolutely agree with that. And I do not think that I have recommended that.
      Well, you wrote:
      In general, I avoid using $1, $5 etc. Use Perl list slice instead.
      If you use words like "in general", followed by an unqualified demand about what to use, be prepared for others to point out cases where the "in general" doesn't work.
        In general, I avoid using $1, $5 etc. Use Perl list slice instead.

        I don't know how to make that more clearly stated. My code shows a very clear example of not using $1. My code also has the case that $string8 is undefined, which would happen if the match failed. Often in parsing, it is desired to keep going, not in this case perhaps but that does happen. I showed the place to do that if this is necessary - there is a comment block about that.

        In Perl 5.10, $string8 //= ''; sets $string8 to the empty string if $string8 is undefined. In older Perl, we only had $string8 ||= ''; which is not quite the same thing. The new Perl operator tests for "definedness" instead of "truthfulness". But anyway, there is a place in the code to use that info.
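The difference between the two operators can be sketched like this. The helper names and the 'default' value are made up for illustration.

```perl
use strict;
use warnings;
use 5.010;    # the // and //= operators need Perl 5.10 or later

# Assigns the default only when the value is undefined.
sub with_defined_or {
    my ($value) = @_;
    $value //= 'default';
    return $value;
}

# Assigns the default whenever the value is false (undef, 0, '', '0').
sub with_logical_or {
    my ($value) = @_;
    $value ||= 'default';
    return $value;
}

for my $input (undef, 0, '', 'abc') {
    my $shown = defined $input ? "'$input'" : 'undef';
    printf "%-6s //= gives '%s', ||= gives '%s'\n",
        $shown, with_defined_or($input), with_logical_or($input);
}
```

Only the `undef` input is replaced by `//=`, while `||=` also replaces `0` and the empty string.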

        In your code: "Did it match, or didn't it? If $c is defined, it matched. But what if $c isn't? If $a eq "g", and $b eq "o", there is a match, but $c is undefined."

        You are saying that if $c is undefined, then the match didn't work. Ok. True. What else is there to say about this? I said that this was a misuse of list slice because you were presenting it as a case where list slice didn't work. Perhaps my English prose wasn't quite as well written as it could have been. Ok, if $c is undefined, then there is no information other than "it didn't work". List slice will not "save the day" in this case. That is why I said it was a misuse.

        JavaFan is a very, very high level Monk, and you know, perhaps even better than I do, that what is asked in a post is often not what is really needed. I offered a credible solution to what I thought the OP needed and in addition showed ways to extend that solution. The OP thanked me. So, what problems remain? I think none.

        You think that list slice is "ugly". Ok. It might very well be! I would suggest that we leave this thread and that you start a new thread: re: "what are proper uses of list slice?". And I am sure that this will be one of the most talked about threads in recent history.

Re^3: Question on Regex grouping
by TenThouPerlStudents (Initiate) on Dec 22, 2010 at 03:12 UTC
    People who dislike list slicing should avoid scripting languages, especially Perl. It's FALSE that you don't know whether there was a match with m//g because the created list is simply empty. Also, the GOATSE ( =()= ) recreates the right context if that's an issue.

    Take this code:

    $x = "a123b345c7865d87";
    @L = ($x =~ /[a-z]/g)[1,3];
    print "@L"; ## Prints b d
    @X = ($x =~ /#/g)[1,3];
    print (defined(@X) ? "YES" : "NO");

    It prints NO ... therefore, JavaFan, your assertions are FALSE and FALSE.

    TenThouPerlStudents
        People who dislike list slicing should avoid scripting languages, especially Perl.
        Really? Just because I find list slices to avoid using $1 ugly? What else? People who don't like goto should avoid Perl? People who don't like m?? should avoid Perl? People who don't like code without warnings or strict?

        @X = ($x =~ /#/g)[1,3];
        Ain't working. Sure, for the given pattern, it works. Now, let's change the pattern a little, shall we:
        my $x = "1234";
        my @X = ($x =~ /(#)?/)[1,3];
        say scalar @X; # defined(@X) is deprecated
        say "Matched" if $x =~ /(#)?/;
        __END__
        0
        Matched
        So, @X is empty, yet the pattern matches.
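One way to keep the match result and the captures separate, sketched here under assumptions, is to test the list assignment itself. This is not code from any poster in the thread. The helper name `match_status` is hypothetical, and the pattern is assumed to contain at least one capturing group.

```perl
use strict;
use warnings;

sub match_status {
    my ($string, $pattern) = @_;

    # A list assignment in scalar context yields the number of elements the
    # match returned: 0 on failure, at least 1 on success, even when the
    # captured value itself is undefined.
    if (my ($first) = $string =~ $pattern) {
        return defined $first
            ? "matched, captured '$first'"
            : "matched, capture undefined";
    }
    return "no match";
}

print match_status("foo", qr/(f)*o/), "\n";
print match_status("foo", qr/(g)*o/), "\n";
print match_status("foo", qr/(f)*p/), "\n";
```

The condition depends on how many values the match returned, not on whether the first one is defined, so an optional group that did not participate still counts as a match.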
        It prints NO ... therefore, JavaFan, your assertions are FALSE and FALSE.
        When I say "it doesn't always work", a single example where it does work isn't a contradiction.

        Note also that Marshall wasn't doing it your way anyway, he put the list slice in scalar context.

      Sorry about the format but ... why does one need to use HTML tags to format one's own post???? It's my first post. I guess I'm used to sites more intelligently designed that format as written in the window.

      Nonetheless, the points I made are compelling. JavaFan's assertions are absolutely false. List slicing of //g, if it creates no list, makes the list variable undefined.

      I'd also like to add that Perl nitpickers like to get all hot and bothered about lists vs. arrays yet the goatse is the one stop shop that enables //g to be added if one is COUNTING matches.
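The counting idiom mentioned here can be sketched as below, reusing the sample string from the earlier post. The variable names are assumptions.

```perl
use strict;
use warnings;

my $x = "a123b345c7865d87";

# Assigning to the empty list () forces list context on the match; the
# outer scalar assignment then receives the number of elements matched.
my $letters = () = $x =~ /[a-z]/g;
my $hashes  = () = $x =~ /#/g;

print "letters: $letters, hashes: $hashes\n";
```

This counts matches without storing them. It works for patterns without capturing groups, where each match contributes exactly one element.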

      I've largely avoided this site because as a lurker I've noticed that the "priests" and above are more interested in obscurantism and showing off than in helping newbies correctly and getting jobs done simply. TMTOWTDI is VASTLY abused here.