I just picked up the new second edition of Mastering Regular Expressions and was marveling at how much regex support exists in other languages, at features I didn't know existed, and - boggle of boggles - at something Java did better(?) than Perl.
In particular, I looked with great interest at what Friedl dubs 'atomic grouping'. Perl appears to support it as the (?>) construct. Java also offers a shorter form: add a + to another quantifier (++ *+ ?+ {m,n}+) and you get what it calls a 'possessive quantifier'. For those of you who, like me, had never run into this critter before, the simple explanation is this:
Possessive quantifiers behave like greedy quantifiers that never let go of anything they've grabbed. They can make matches (failing ones especially) faster by eliminating unnecessary backtracking, but because nothing is given back, they can also change whether a pattern matches at all.
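To make that concrete, here is a minimal sketch in Java, since that is the language mentioned above as having the ++ shorthand. The class name PossessiveDemo and the helper matchesAll are made up for illustration; Pattern and Matcher.matches() are the standard java.util.regex API. It tries the same pattern against "12345" in greedy, possessive, and atomic-group form:

```java
import java.util.regex.Pattern;

final class PossessiveDemo {
    // Hypothetical helper: true only if the regex matches the entire input.
    static boolean matchesAll(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String digits = "12345";

        // Greedy: \d+ first takes "12345", then gives back the final 5
        // so the literal 5 can match.
        System.out.println(matchesAll("\\d+5", digits));      // true

        // Possessive: \d++ takes "12345" and never gives anything back,
        // so the literal 5 has nothing left to match.
        System.out.println(matchesAll("\\d++5", digits));     // false

        // Atomic group: same behaviour as the possessive form here.
        System.out.println(matchesAll("(?>\\d+)5", digits));  // false

        // Possessive is harmless when what follows can never be eaten by
        // the quantified part: [^"]*+ stops at the closing quote anyway.
        System.out.println(matchesAll("\"[^\"]*+\"", "\"hello\"")); // true
    }
}
```

The second and third cases show the catch: a possessive quantifier or atomic group can turn a match into a non-match. The last case is the intended use, where the backtracking being skipped could never have helped.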
I lit on atomic grouping as a useful feature, and found it documented logically yet rather quietly as 'match nonbacktracking subpattern' in the Camel 3. The possessive extra '+' seemed like a convenient shorthand for many purposes. Several questions have sprouted in my mind:
- Do Those Who Make The Changes plan on adding the possessive quantifiers (++ *+ ?+ {m,n}+) to the regex engine in 5.10?
- To give this node an additional purpose, does anyone have some particularly good examples of when they have used possessive quantifiers or (?>) in Real Life (tm)?
- Lastly, does anyone know how to cure the desire to treat the above "(tm)?" as zero-or-one occurrences of the string 'tm', captured? I think I've got REs on the brain. :-)
Edit: Fixed broken link. larsen