    while ($s =~ /abc/g) {
        print "found abc at $-[0]\n";
    }

Now suppose the pattern we are looking for overlaps with itself -- e.g. is aa. The following:

    while ($s =~ /aa/g) {
        print "found aa at $-[0]\n";
    }

does not quite work: if the input string is "baaaad", for example, it would output:

    found aa at 1
    found aa at 3

It leaves out "found aa at 2", which is entirely expected given how /g is defined (the next match starts after the end of the previous one).

Now, it is straightforward enough to tweak the loop so it finds all the matches, using pos as an lvalue:

    while ($s =~ /aa/g) {
        print "found aa at $-[0]\n";
        pos($s) = $-[0] + 1;
    }

This "tricks" the RE engine into resuming the search one character past the start of the previous match, so all matches will be found, even if they overlap.
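For anyone who wants to try the pos trick in isolation, here is a minimal self-contained sketch. The helper name overlapping_offsets is hypothetical (it is not from any module), and the sketch assumes the pattern always matches at least one character.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the start offset of every match of $re in $s,
# overlapping ones included, by rewinding pos() after each match.
# Assumes $re never matches the empty string.
sub overlapping_offsets {
    my ($s, $re) = @_;
    my @offsets;
    while ($s =~ /$re/g) {
        push @offsets, $-[0];     # $-[0] is the start offset of the last match
        pos($s) = $-[0] + 1;      # resume one character past that start
    }
    return @offsets;
}

print join(",", overlapping_offsets("baaaad", qr/aa/)), "\n";   # prints 1,2,3
```

Since $s is a copy inside the sub, the caller's string and its pos are left untouched.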
This works fine, and solves the particular practical problem I am working on, but it got me to thinking: is it possible to get this behavior purely declaratively -- i.e. in the RE itself, not by tweaking pos after the match?
Thanks!
Update: Just as I was about to post this, I figured out my own answer, using a lookahead: /(?=aa)./. The lookahead matches without advancing pos, the "." advances pos by 1. Are there other ways to do this? More efficient ways?
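As a rough sketch of the lookahead idea, the script below runs the pattern from the update next to one variant: the bare lookahead /(?=aa)/g with no trailing ".". The variant relies on Perl's documented handling of zero-length matches under /g (the engine will not match the empty string twice at the same position, so it moves on by itself). The variable names are just for illustration, and no claim is made here about which form is faster.

```perl
use strict;
use warnings;

my $s = "baaaad";

# Form from the update: the zero-width lookahead locates "aa",
# then "." consumes one character so pos moves forward by 1.
my @with_dot;
push @with_dot, $-[0] while $s =~ /(?=aa)./g;

# Variant: lookahead only. The match is zero-length, and under /g Perl
# refuses a second empty match at the same position, so the scan advances.
my @bare;
push @bare, $-[0] while $s =~ /(?=aa)/g;

print "with dot: @with_dot\n";   # prints: with dot: 1 2 3
print "bare:     @bare\n";       # prints: bare:     1 2 3
```

Note that "." does not match a newline unless /s is given, which only matters for the first form when the text being searched can contain newlines. If the matched text itself is needed, a capture inside the lookahead, as in /(?=(aa))/g, makes it available in $1.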
--JAS