in reply to Applying regexes to streams: Perl enhancement idea

I suspect you could do that with a negative look-ahead assertion on $ at the end of your regex -- i.e. assert that the end of the match isn't the end of the string. I don't really understand regexes, though, and that wouldn't set pos(), because the regex would fail. You could combine that technique with a (?{code}) block that sets a variable to pos(), though, no?
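
A minimal, untested-in-anger sketch of that idea. The proposed /z flag does not exist, so `(?!\z)` stands in for "the match must not end at end-of-string", and `$where` is a name made up for this example:

```perl
use strict;
use warnings;

our $where;   # hypothetical name; a package variable keeps the code block simple

# Greedy l+ runs to the end of "mel"; the code block records pos() there,
# then (?!\z) rejects the match because it ends at end-of-string.
my $matched = "mel" =~ /l+(?{ $where = pos() })(?!\z)/ ? 1 : 0;

print "matched=$matched where=$where\n";
```

The match fails as a whole, but the code block has already run, so `$where` still holds the offset the engine reached before the look-ahead turned it away.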


Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

Re^2: Applying regexes to streams: Perl enhancement idea (not that easy)
by tye (Sage) on Jan 07, 2003 at 23:29 UTC

    No. That would prevent the regex from succeeding at end-of-string. What I want is to prevent the regex from backtracking due to end-of-string. This can happen at any point in the regex so there is no one place in the pattern that you can put something to cause it to happen. It would be like putting a special token at the end of the string such that every part of the regex treats that token specially.

    It could/should actually do even more than that. Even "mel" =~ /l+/z should fail, because the search stopped only because it hit end-of-string; the next bytes on my stream might well be "low", and then I'd want that regex to match both "l"s.
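
Two cases where a trailing look-ahead is not enough, sketched with `(?!\z)` standing in for the suggested assertion (this is an illustration, not the proposed /z behaviour). In both, the engine backs away from end-of-string and reports a shorter match that more stream data could have extended:

```perl
use strict;
use warnings;

# Greedy \w+ first takes "abc", the look-ahead rejects that, and the engine
# backtracks to "ab", which no longer ends at end-of-string.
my ($word) = "abc" =~ /(\w+)(?!\z)/;

# The "bcd" branch fails only because the buffer ran out; the engine then
# falls back to "b", even though the stream might continue with "d".
my ($alt) = "abc" =~ /(a(?:bcd|b))(?!\z)/;

print "word=$word alt=$alt\n";
```

Neither result ends at end-of-string, so the look-ahead is satisfied, yet both were produced by backtracking that only happened because the buffer ended.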

                    - tye
      How would I go about telling /z to wrap it up and accept the end of string as end of match? There are really two things you are asking of the engine: to continue where it left off last time, and to fail without forgetting where it was when it hit the end of string. You need a way to ask for the first without the second. Otherwise, as a silly example (but let's pretend it isn't), /.+/z would always fail, even at the end of my input stream where I'd want it to successfully match at end of string.

      Makeshifts last the longest.

        Good point. My original example code didn't handle that case correctly in part because it started out as an example of using a regular expression to match record terminators and in part because I had not fully considered the effect of //z on greedy matches until I replied to theorbtwo's node.

        We already have a separate "continue where it left off last time" feature for regular expressions: //g in a scalar context and pos(). So my example is easy to fix by dropping /z once I've found end-of-stream. I'll update it shortly to reflect this.

        Note that my example fetches pos() in order to strip stuff from the front of the buffer, therefore each match is performed with pos()=0. If, for example, you were instead matching record terminators, then you would instead fetch pos() in order to restore it before the next match (since the sysread updates the contents of the buffer which also resets its pos).
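
A rough sketch of the second approach (fetch pos(), then restore it before the next match), assuming newline-terminated records. The names `take_records` and `$resume` are invented for this example, and the string appends are stand-ins for sysread calls:

```perl
use strict;
use warnings;

# Scan for newline-terminated records starting at offset $resume.
# Returns the offset to resume from, followed by the records found.
sub take_records {
    my ($buf_ref, $resume) = @_;
    my @found;
    pos($$buf_ref) = $resume;                 # restore the saved position
    while ($$buf_ref =~ /\G([^\n]*)\n/gc) {   # scalar //g; /c keeps pos() on failure
        push @found, $1;
        $resume = pos($$buf_ref);             # fetch pos() after each record
    }
    return ($resume, @found);
}

my $buf = "one\ntwo\nthr";                    # stand-in for a first sysread
my ($resume, @first) = take_records(\$buf, 0);

$buf .= "ee\nfo";                             # stand-in for the next sysread
my @second;
($resume, @second) = take_records(\$buf, $resume);

print "first=@first second=@second resume=$resume\n";
```

Because the offset is carried in `$resume` and reassigned to pos() on every call, the scan does not depend on whatever happens to the buffer's own pos() when its contents change between reads.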

        Thanks,
                        - tye

Re^2: Applying regexes to streams: Perl enhancement idea
by diotalevi (Canon) on Jan 12, 2003 at 06:16 UTC

    The regex engine does not allow (?{code}) blocks to alter pos(). I wanted to do that once and dug into the source. Just prior to executing the code it saves a copy of pos() and restores it immediately afterward. tye answered the other concern but didn't address this issue.


    Fun Fun Fun in the Fluffy Chair