in reply to Re^6: Progressive matching w/substitutions (pos)
in thread Progressive matching w/substitutions

Common sense. Confirmed by observation.

Actually, I have a module that supports exactly this type of thing, and more, which I hope to finish and release to CPAN this year anyway.

pos doesn't apply to s///, just as it doesn't apply to pack/unpack or anything else. The pos documentation only talks about m//g, and that is the only thing pos applies to. It doesn't list the things it doesn't apply to, and it doesn't apply to s/// or split. The same goes for \G.
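
A minimal sketch of that distinction (the strings and variable names are illustrative only): scalar-context m//g records its progress in pos(), while a plain s/// without \G scans from the start of the string no matter what pos() says.

```perl
use strict;
use warnings;

my $str = "aXbXcX";

# Scalar-context m//g remembers where it stopped, via pos($str).
my @positions;
while ($str =~ /X/g) {
    push @positions, pos($str);    # offset just past each match
}
print "@positions\n";              # 2 4 6

# s/// (without \G) ignores pos() and scans from the start of the string.
my $copy = "aXbXcX";
$copy =~ /X/g;                     # pos($copy) is now 2
$copy =~ s/X/Y/;                   # still replaces the first X, at offset 1
print "$copy\n";                   # aYbXcX
```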

The way you run code between steps of s/// is to use s//.../ge. You can't do that with m//g and, similarly, you can't do while( s///g ) { .... }.
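
A small illustration of the s///ge approach (the data and the @seen array are made up for the example): with /e the replacement is evaluated as Perl code once per match, so side effects can run between the individual substitutions of a single s///g.

```perl
use strict;
use warnings;

my $text = "a1b2c3";
my @seen;

# push records each digit as it is matched; the value of the last
# expression ($1 * 10) becomes the replacement text for that match.
(my $out = $text) =~ s{(\d)}{ push @seen, $1; $1 * 10 }ge;

print "$out\n";     # a10b20c30
print "@seen\n";    # 1 2 3
```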

- tye        

Re^8: Progressive matching w/substitutions (pos)
by argv (Pilgrim) on Aug 11, 2008 at 02:46 UTC
    pos doesn't apply to s/// just like it doesn't apply to pack/unpack or anything else. pos only talks about m//g and that is the only thing that it applies to.
    No question about it. However, the way m// and s/// look, and the fact that they are documented side by side, make them appear much more closely related, so it seems natural that they would also share this particular characteristic, especially in the context of "progressive matching." Remember, the frame of mind that someone is in when they think "progressive" is "match-operate-match-operate, etc." Since m// lets you do that in a while loop, it seems reasonable to at least suspect that s/// would do it too. By contrast, one wouldn't expect pos to apply to pack/unpack "or anything else" (as you put it), because those functions don't look similar at all. They are conceptually different.

    If nothing else, I think the documentation should mention that pos can be reset if used with s///, "or any other operation that may change the string, even if that change occurs later in the string than pos points to."
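
A minimal sketch of that reset, assuming the behavior described above (modifying a string resets its pos(), even when the edit lies past the current position). The second half shows one common workaround for a match-operate-match loop: save pos(), edit the string, then assign pos() back. Variable names are illustrative only.

```perl
use strict;
use warnings;

# 1. Any modification resets pos(), even one past the current position.
my $s = "x-x-x";
$s =~ /x/g;                        # pos($s) is now 1
$s =~ s/x$/y/;                     # edits only the last character...
my $after = pos($s);               # ...yet pos($s) is undef again
print defined $after ? "pos=$after\n" : "pos was reset\n";

# 2. Match-operate-match: save pos(), edit the string, then restore pos().
my $t = "x-x-x";
while ($t =~ /x/g) {
    my $p = pos($t);                   # offset just past this match
    substr($t, $p - 1, 1, "xx");       # replace the match with longer text
    pos($t) = $p + 1;                  # resume after the inserted text;
                                       # without this, /x/g restarts at 0
                                       # and the loop never ends
}
print "$t\n";                          # xx-xx-xx
```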

    FWIW, I look forward to learning about the Perl module you described. It does seem to acknowledge the need for what I suspected to be true (or what "should have been true" :-).

      Indeed, it took experience to realize that \G doesn't work within split, nor with s///. And a quick test shows that \G, and even pos, (now) work within s///g:

      > echo xxxfooxxx | perl -pe's/\Gx/pos($_)/ge'
      012fooxxx

      I can understand a reluctance to update the documentation to proclaim "pos doesn't work with s///", for example (the potential authors of such a documentation patch share your reservations about whether this is an accident of implementation, an intentional design that might still change, or something that is that way "for a very good reason"). But I agree that some improved clarity is called for, with somewhat vague additions such as "Currently \G is not supported with split and using it there can produce surprising results."

      With experience, I also learned to appreciate the drawbacks of implicit, global state such as that provided by pos and each. Just last week I "lost" several hours at work to a pair of bugs: one that left an each iterator unreset, and one that used each without resetting it first.
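
The each gotcha can be reproduced in a few lines. This is only a sketch; count_entries is a hypothetical helper written for the example. Leaving an each loop early with last leaves the hash's hidden iterator part-way through, and calling keys on the hash in void context is the usual way to reset it.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the entries an each-loop actually visits.
sub count_entries {
    my ($href) = @_;
    my $n = 0;
    while (my ($k) = each %$href) { $n++ }
    return $n;
}

my %h = (a => 1, b => 2, c => 3);

# Bug: leave an each-loop early, so the iterator is never reset.
while (my ($k, $v) = each %h) { last }
my $without_reset = count_entries(\%h);    # 2: resumed mid-hash

# Fix: reset the iterator with keys in void context before iterating.
while (my ($k, $v) = each %h) { last }
keys %h;
my $with_reset = count_entries(\%h);       # 3

print "$without_reset vs $with_reset\n";   # 2 vs 3
```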

      Part of the point of my module is that it requires you to declare where you want to start using this state and doesn't store the state globally. Thus it removes some of the common gotchas of pos and scalar m//g.

      So I would certainly be reluctant to change s/// to be sensitive to pos. It would be less troubling if s/// obeyed pos only when \G is used in the regex, but even that isn't ideal. Maybe s///p?

      - tye