in reply to Modifying pos() from within (?{...})

I am also attempting to move the position with the regex by assigning to pos(). While the assignment seems to work, as in there are no errors, it seems to have no effect.

Yes, this is documented in pos:

pos directly accesses the location used by the regexp engine to store the offset, so assigning to pos will change that offset, and so will also influence the \G zero-width assertion in regular expressions. Both of these effects take place for the next match, so you can't affect the position with pos during the current match, such as in (?{pos() = 5}) or s//pos() = 5/e.
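The "next match" behavior in that quote can be seen in a small sketch. The string and variable names here are made up for illustration; the only point is that an assignment to pos made before a match is what \G anchors to when that match runs.

```perl
use strict;
use warnings;

my $s = "aaabbbccc";    # hypothetical sample string

# With pos unset, \G anchors at offset 0, where there is no "b".
pos($s) = undef;
my $matched_at_start = ($s =~ /\G(b+)/gc) ? 1 : 0;

# Assign pos *before* the match: the next match starts at offset 3.
pos($s) = 3;
my $captured = ($s =~ /\G(b+)/gc) ? $1 : undef;
my $end      = pos($s);    # offset just past the matched text

print "matched at start: $matched_at_start\n";
print "captured: $captured, pos now: $end\n";
```

Setting pos outside the regex and then matching again is the supported way to move the starting point; doing the same from inside (?{...}) only takes effect once the current match is over.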

As for the general approach, I think you should probably anchor each match to the end of the previous one with \G, use the /gc modifiers, and check whether the end of the string was reached:

my $string = " foo bar quz ";
pos($string)=undef; # just to play it safe
while ( $string =~ / \G \s* (foo|bar|quz) \s* /xgc) {
    print "<$1>\n";
}
die "match failed at pos ".pos($string)
    unless pos($string)==length($string);

Although of course a regex is probably not the right tool here; unpack is probably better.

Re^2: Modifying pos() from within (?{...})
by mxb (Pilgrim) on Apr 26, 2018 at 13:18 UTC

    Thanks for this, it's clear and explains why the (?{...}) solution didn't (and could never) work.

    I've not used \G and /c before. From reading perlre I now understand that \G essentially 'anchors' that point to index pos() into the string. I'm a little confused by the description for the /c modifier. From perlre v5.24:

    c - keep the current position during repeated matching

    and from the referenced perlretut

    A failed match or changing the target string resets the position. If you don't want the position reset after failure to match, add the "//c", as in "/regexp/gc".

    I assume then, should the data I'm parsing with the regex be well formatted, then pos() will never be reset and so the /c modifier is not needed? But were any match to fail, things would break.

    Is this a correct understanding of \G and /c?

      I assume then, should the data I'm parsing with the regex be well formatted, then pos() will never be reset and so the /c modifier is not needed?

      Note that the condition on the while loop is the regex, so the loop runs for as long as the match succeeds, which means the last match executed will always be a failed one. The question then is why it failed: because it reached the end of the string (= successful overall parse), or because it did not. That is why I compare pos to length, but for pos to still be available at that point, I need /c.
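      To make the difference concrete, here is a minimal sketch (the sample string and variable names are invented for illustration) that runs the same failing match once with /g and once with /gc and records pos afterwards:

```perl
use strict;
use warnings;

my $s = "foo 123";    # hypothetical sample data

# Without /c: the failed second match resets pos to undef.
pos($s) = undef;
my $ok1       = $s =~ /\G[a-z]+/g;    # succeeds, matches "foo"
my $after_ok  = pos($s);
my $ok2       = $s =~ /\G[a-z]+/g;    # fails at the space
my $without_c = pos($s);

# With /c: the failed second match leaves pos where it was.
pos($s) = undef;
my $ok3    = $s =~ /\G[a-z]+/gc;      # succeeds, matches "foo"
my $ok4    = $s =~ /\G[a-z]+/gc;      # fails at the space
my $with_c = pos($s);

print "after success: $after_ok\n";
print "after failure without /c: ", (defined $without_c ? $without_c : "undef"), "\n";
print "after failure with /c: ",    (defined $with_c    ? $with_c    : "undef"), "\n";
```

      In the loop above, the final failed iteration is exactly the second case, so only with /c does pos still say how far the parse got.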

      Another use for /c is described in "\G assertion" in perlop (under "Regexp Quote-Like Operators"): basically, attempting to apply multiple different regexes at the same point in a string until you find one that matches.
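      A rough sketch of that pattern, in the spirit of the perlop example but with a made-up input string and token names: each branch tries a different regex at the current position, and because of /c a failed branch leaves pos alone so the next branch starts from the same spot.

```perl
use strict;
use warnings;

my $input = "x = 42 + y";    # hypothetical input
my @tokens;

pos($input) = 0;
while (pos($input) < length $input) {
    if    ($input =~ /\G\s+/gc)             { next }                       # skip whitespace
    elsif ($input =~ /\G(\d+)/gc)           { push @tokens, "NUM:$1" }
    elsif ($input =~ /\G([A-Za-z_]\w*)/gc)  { push @tokens, "ID:$1" }
    elsif ($input =~ /\G([=+])/gc)          { push @tokens, "OP:$1" }
    else  { die "unexpected character at pos " . pos($input) }
}

print join(" ", @tokens), "\n";
```

      Without /c, the first branch that failed would reset pos, and every later \G would anchor at the start of the string again.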