With the look-ahead and look-behind constructs documented in perlre.html#Extended-Patterns, you can "roll your own" zero-width assertions to fit your needs. You can look forward or backward in the string being processed, and you can require that a pattern match succeed (positive assertion) or fail (negative assertion) there.
The regular expression engine will do its best to match .*foo, starting at the end of the string "foo". If it is able to match that, then the negative look-ahead will fail, which will force the engine to progress through the string to try the next foo./foo(?!.*foo)/
or to put the hyphen in look-ahead:s/(?<=foo)/,/g; # Without lookbehind: s/foo/foo,/g or s/(foo)/$1,/g
This kind of thing is likely to be the bulk of what you use look-arounds for. It is important to remember that look-behind expressions cannot be of variable length. That means you cannot use quantifiers (?, *, +, or {1,5}) or alternation of different-length items inside them.s/(?<=look)(?=ahead)/-/g;
/foo # Match starting at foo ( # Capture (?: # Complex expression: (?!baz) # make sure we're not at the beginning of baz . # accept any character )* # any number of times ) # End capture bar # and ending at bar /x;
The concept is pretty simple, but the notation becomes hairy very quickly, so commented regular expressions are recommended. Let's look at the real example of Regex to add space after punctuation sign. The poster wants to put a space after any comma (punctuation, actually, but for simplicity, let's say comma) that is not nestled between two digits. Building up the s/// expression:
Note that multiple lookarounds can be used to enforce multiple conditions at the same place, like an AND condition that complements the alternation (vertical bar)'s OR. In fact, you can use Boolean algebra ( NOT (a AND b) === (NOT a OR NOT b) ) to convert the expression to use OR:s/(?<=, # after a comma, (?! # but not matching (?<=\d,) # digit-comma before, AND (?=\d) # digit afterward ) )/ /gx; # substitute a space
s/(?<=, # after a comma, but either (?: (?<!\d,) # not matching digit-comma before | # OR (?!\d) # not matching digit afterward ) )/ /gx; # substitute a space
This is most useful for finding overlapping matches in a global pattern match. You can capture substrings without consuming them, so they are available for further matching later. Probably the simplest example is to get all right-substrings of a string:
Note that the pattern technically consumes no characters at all, but Perl knows to advance a character on an empty match, to prevent infinite looping.print "$1\n" while /(?=(.*))/g;
|
---|