
Is there a reason that you cannot simply keep starting from the same location until you receive an end-of-string, or find a match?

Yes: I can't tell whether the regex failed because it reached the end of the string, in which case I'd have to load more data.

Your method seems a bit blunt: it blindly removes characters from the end of the regex, which produces many invalid regexes and a big performance penalty. The idea is quite interesting, though ;-)


Re^5: Did regex match fail because of "end of string"?
by Illuminatus (Curate) on Oct 17, 2007 at 08:08 UTC
    Yes: I can't tell whether the regex failed because it reached the end of the string, in which case I'd have to load more data.

    If the match test fails, the engine has reached the end of the string without finding a match, unless the regex begins with '^' (which stops it from trying later starting positions). If you disallow that anchor, you should be fine:

    my $str = '';
    while (<>) { $str .= $_; last if $str =~ /a\d+b/; }   # re-test the whole buffer after each line
    The end of the buffer could hold a partial match, but you don't care: after the next concatenation the match is retried on the longer string, and the partial match either completes or fails, depending on what the new data turns out to be.
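For illustration, here is a self-contained version of the loop above, with a list of chunks standing in for `<>` so it runs without input files. The sub name `first_match_in_chunks` and the sample chunks are hypothetical. It is only a sketch of the same accumulate-and-retest idea: every read re-runs the unanchored match over the whole buffer, which rescans old data but cannot miss a match that straddles a chunk boundary.

```perl
use strict;
use warnings;

# Hypothetical helper: append chunks to a growing buffer and stop at the
# first chunk after which the (unanchored) pattern matches the buffer.
# Returns the matched text and the number of chunks consumed, or an empty
# list if the input runs out without a match.
sub first_match_in_chunks {
    my ($re, @chunks) = @_;
    my $buf  = '';
    my $used = 0;
    for my $chunk (@chunks) {
        $buf .= $chunk;
        $used++;
        # Re-test the whole buffer: a partial match left at the end of the
        # previous chunk either completes now or keeps failing.
        return ($1, $used) if $buf =~ /($re)/;
    }
    return;    # end of input reached without a match
}

# 'a12' arrives in one chunk and '3b' in the next; the match spans both.
my ($match, $used) = first_match_in_chunks(qr/a\d+b/, "xx", "ya12", "3bzz", "more");
print "$match after $used chunks\n";    # prints "a123b after 3 chunks"
```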

    The previous ugly example is most likely the only other solution, and it may not be as CPU-expensive as you think. Since each truncated regex is anchored at the end of the string, it will in general not match against the whole string. The invalid regexes will bail out immediately (they fail to compile), without matching a thing.
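A rough sketch of the truncation idea, written from the descriptions in this thread rather than from the original code, which is not shown here. The sub name `maybe_partial_at_end` is hypothetical, and it takes the pattern as a source string so it can be cut one character at a time. Each prefix is wrapped in `(?:...)` (so an alternation such as `ab|c` stays grouped) and anchored with `\z`; prefixes that fail to compile are skipped via `eval`. As the follow-up below points out, truncation can change what a pattern means, so a hit should only be treated as a signal to read more data and repeat the full match.

```perl
use strict;
use warnings;

# Hypothetical helper sketching the "truncate the regex" idea: could the end
# of $str be the beginning of a match for $pat that more data might complete?
# $pat is the pattern *source* string, so it can be cut character by character.
sub maybe_partial_at_end {
    my ($str, $pat) = @_;
    for my $len (reverse 1 .. length $pat) {
        my $prefix = substr $pat, 0, $len;
        # Blind truncation often yields an invalid pattern (e.g. "a\" or "[ab");
        # those die at compile time inside eval and are simply skipped.
        my $re = do { no warnings; eval { qr/(?:$prefix)\z/ } } or next;
        # Anchoring with \z restricts each attempt to the tail of the string.
        return 1 if $str =~ $re;
    }
    return 0;
}

# "a12" at the end matches the prefix 'a\d+', so more data could complete it.
print maybe_partial_at_end("xxya12", 'a\d+b') ? "read more\n" : "no partial match\n";
```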

      There is another problem: false positives.

      When the regex ends in a lookbehind or a {2,} quantifier and you strip characters off the end, you get matches that shouldn't succeed at all.

      I know that this can be handled by repeating the match, but this whole business turns out to be so nasty that I'm dropping my original idea and will simply slurp the whole file into memory... ;)
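For completeness, a sketch of the usual Perl slurp idiom: localizing `$/` to `undef` makes a single `<$fh>` return the whole file, so the regex runs once over all the data and the end-of-string question never comes up. The helper name `slurp_file` is hypothetical; the trade-off is that the entire file must fit in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file into one string, so a regex sees
# all of the data at once instead of chunk by chunk.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;              # undef input record separator: <$fh> returns everything
    my $content = <$fh>;
    close $fh;
    return defined $content ? $content : '';
}
```

Typical use (the file name is hypothetical): `my $text = slurp_file('input.txt'); print "found $1\n" if $text =~ /(a\d+b)/;`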