in reply to Re: Re: Possessive Quantifiers
in thread Possessive Quantifiers

> If there's some particular insight as to why it would be worse in this particular case, I'd be delighted to be enlightened. As it is, in my ignorance, I like the + possessive because I think it looks more readable than the (?>) alternative, and is likely a better mnemonic, since I have to keep looking at the man page to remember which odd character goes into (?>).

I don't know enough about the Java half of this question to have a strong opinion, but I can offer some thoughts.

The construct is mostly good for optimizing the failure case of a specific subset of patterns. Consequently, it is infrequently used in Perl, and presumably that's the case in Java as well. The symbol "+", however, is frequently used for a very common case. Overloading its meaning could be confusing.[1] It might be easily missed or look like a typo, particularly to someone unfamiliar with it.

The (?>) construct allows grouping that Java's syntax doesn't seem to handle. I'm guessing[2] that (?>a*b*) is equivalent to Java's a*+b*+. If so, and (?>a+b?c{3,7}d*e+) looks like a++b?+c{3,7}+d*+e++ in Java, then Java's representation starts to get long and messy. What about (?>a*(b|c)d*)? Can that be expressed in Java at all? Or is Java restricted to modifying quantifiers?

[1] It might be a real + for obfuscation though. :-)
[2] I'm not sure of this. Someone please correct me if I'm wrong.
-sauoq
"My two cents aren't worth a dime.";

Re(4): Possessive Quantifiers
by Arien (Pilgrim) on Aug 18, 2002 at 10:45 UTC
    > I'm guessing that (?>a*b*) is equivalent to Java's a*+b*+.

    The equivalent to Java's a*+b*+ would be (?>a*)(?>b*). In this case that wouldn't make a difference of course, but it would in situations like this one:

    $_ = "aaab";
    print "/(?>[ab]+)(?>b+)/ matches $_\n" if /(?>[ab]+)(?>b+)/;
    print "/(?>[ab]+b+)/ matches $_\n" if /(?>[ab]+b+)/;
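
For readers who want to try the same comparison on the Java side, here is a minimal sketch. It assumes java.util.regex, which documents both possessive quantifiers and (?>X) as an independent, non-capturing group. The class name PossessiveVsAtomic and the helper finds are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class PossessiveVsAtomic {
    // Returns true if the regex matches anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "aaab";
        String[] regexes = {
            "[ab]++b++",        // each quantifier possessive on its own
            "(?>[ab]+)(?>b+)",  // the same idea spelled as two atomic groups
            "(?>[ab]+b+)"       // one atomic group around both quantifiers
        };
        for (String regex : regexes) {
            System.out.println(regex + " finds a match in " + input
                    + ": " + finds(regex, input));
        }
    }
}
```

The first two patterns should report false, because [ab]+ swallows the final b and never gives it back. The third should report true, because backtracking is still allowed inside the single atomic group.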

    > What about (?>a*(b|c)d*)? Can that be expressed in Java at all?

    How about (a*(b|c)d*){1}+? (Yeah, it's ugly...)

    — Arien
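
The {1}+ workaround above can be checked with a small Java sketch. It assumes java.util.regex, and it also tries (?>...) directly, since the Pattern documentation lists (?>X) as an independent, non-capturing group. A trailing d is appended to each pattern so that atomicity makes a visible difference; the class name AtomicGroupWorkaround and the helper finds are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class AtomicGroupWorkaround {
    // Returns true if the regex matches anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "abdd";
        String[] regexes = {
            "(a*(b|c)d*)d",      // ordinary group: d* can give back one d
            "(a*(b|c)d*){1}+d",  // the {1}+ workaround from the post above
            "(?>a*(b|c)d*)d"     // atomic group syntax
        };
        for (String regex : regexes) {
            System.out.println(regex + " finds a match in " + input
                    + ": " + finds(regex, input));
        }
    }
}
```

Only the ordinary group should match here. In the other two, d* keeps both d characters, so nothing is left for the trailing d. One difference worth noting is that the {1}+ form still captures, while (?>...) does not.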

      > How about (a*(b|c)d*){1}+? (Yeah, it's ugly...)

      Yup, it is. But that's why we keep the uglier but more appropriate (?>) around in Perl. Occasionally, it's the right/only operator for the job.

      As best I can figure, Java thinks it's being clever by using possessive quantifiers instead of (?>), which is dumb, since a quantifier-only syntax doesn't cover the simple case of exactly one match. We can do better than that :-)