in reply to Re^2: Did regex match fail because of "end of string"?
in thread Did regex match fail because of "end of string"?

The construct you are showing is not 'anchored'. The usual anchor expressions are '^' (beginning of string) and '$' (end of string); Perl also has \A, \z, \Z and \G. If I understand correctly, all you really care about are partial matches at the end of the currently available string. Partial matches in the middle are already discarded as non-matches.

Is there a reason that you cannot simply keep starting from the same location until you receive an end-of-string, or find a match? Could this be more data than you want to hold in memory?

If you can't do this, I can think of one (very ugly) option. Something like this:

sub example {
    $foo = "[&#\$]";
    $regex = "a\\d+[ars]{2,4}(aa|ab|ac)";
    $string="wle;fnaekf;fla;lkcnovnifa ";
    $min = $regex."\$"."foo";
    if ($min !~ /\$$/) { $min .= '$'; }
    $match = 0;
    $tot = length($string);
    $index = $tot;
    print "index is $index\n";
    while (1) {
        print "min is $min\n";
        eval {
            if ($string =~ m/$min/g) {
                $index = pos $string;
                $match = 1;
            }
        };
        # print "err is $@\n";
        last if $match;
        $min =~ s/..$//;
        last if $min eq "";
        if ($min !~ /\$$/) { $min .= '$'; }
    }
    return $index;
}
$ind = example();

You will also have to special-case lines terminated with '\'.

Re^4: Did regex match fail because of "end of string"?
by moritz (Cardinal) on Oct 17, 2007 at 05:45 UTC
    Is there a reason that you cannot simply keep starting from the same location until you receive an end-of-string, or find a match?

    Yes, I don't know if the regex reached the end of the string and failed, in which case I'd have to load more data.

    Your method seems a bit blunt: it blindly removes characters from the end of the regex, which produces many invalid regexes and a big performance penalty. The idea is quite interesting, though ;-)

      Yes, I don't know if the regex reached the end of the string and failed, in which case I'd have to load more data.

      If the match test fails then you have reached the end of the string without a match, unless the regex begins with a '^'. If you disallow this, you should be fine:

      $str = "";
      while (<>) {
          $str .= $_;
          last if $str =~ m/a\d+b/;   # test the accumulated string, not just the new line
      }

      The end of the string may hold a partial match, but you don't care, because the next concatenation will either complete the match or rule it out (depending on what the new data turns out to be).
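For what it's worth, here is a minimal sketch of the same accumulate-and-retest idea reading fixed-size chunks instead of lines. The helper name find_in_stream, the chunk size, and the in-memory filehandle are assumptions made for illustration only; nothing in the thread prescribes them.

```perl
use strict;
use warnings;

# Hypothetical helper: append fixed-size chunks to a buffer and
# re-test the whole buffer after every read.
sub find_in_stream {
    my ($fh, $re, $chunk_size) = @_;
    my $buf = '';
    while (read($fh, my $chunk, $chunk_size)) {
        $buf .= $chunk;
        # @- and @+ hold the start and end offsets of the last match.
        return ($-[0], $+[0]) if $buf =~ $re;
    }
    return;    # input exhausted without a match
}

# An in-memory filehandle stands in for the real input (an assumption).
# With 4-byte chunks the text "a123b" is split across two reads.
my $data = "xxa123b yy";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
my ($start, $end) = find_in_stream($fh, qr/a\d+b/, 4);
print "match at $start..$end\n";
```

Re-scanning the whole buffer on every read is simple but quadratic in the worst case, and the buffer keeps growing until a match turns up, which is exactly the memory concern raised earlier in the thread.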

      The previous ugly example is most likely the only other solution. It may not be as CPU-expensive as you think. Since each iteration is anchored at the end of the string, it will not, in general, be matched against the whole string. The invalid regexes will bail immediately, at compile time, without matching a thing.
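To illustrate the point about invalid regexes failing early, here is a rough sketch that filters truncated patterns by trying to compile them before any matching happens. The helper name compilable_prefixes and the sample pattern are made up for this example, and it drops one character at a time rather than two as the code above does.

```perl
use strict;
use warnings;

# Hypothetical helper: return every prefix of $pattern, longest first,
# that still compiles when anchored at the end of the string.
sub compilable_prefixes {
    my ($pattern) = @_;
    my @ok;
    for my $len (reverse 1 .. length $pattern) {
        my $prefix = substr $pattern, 0, $len;
        # qr// dies on a malformed pattern; eval turns that into undef.
        # Wrapping the prefix in (?:...) also rejects a prefix ending in
        # a backslash, which would otherwise escape the appended '$'.
        my $re = eval { qr/(?:$prefix)$/ };
        push @ok, $prefix if defined $re;
    }
    return @ok;
}

my @prefixes = compilable_prefixes('a\d+(aa|ab)');
print "$_\n" for @prefixes;
```

For this sample pattern only four of the eleven prefixes survive compilation (the full pattern, 'a\d+', 'a\d' and 'a'); the rest are rejected by the regex compiler without ever touching the input string. Prefixes that cut a {m,n} quantifier in half are handled differently by different Perl versions, so treat those as a separate case.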

        There is another problem: false positives.

        When the regex ends in a lookbehind or a {2,} quantifier and you strip characters off the end, you get matches that shouldn't succeed at all.

        I know that this can be handled by repeating the match, but this whole business turns out to be so nasty that I'm discarding my original idea and simply slurping the whole file into memory... ;)
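For completeness, a minimal sketch of the slurp approach, assuming the data fits in memory. The helper name slurp is hypothetical, and the usage line passes a reference to a string (an in-memory file) purely so the example runs on its own; in practice you would pass a file name.

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file into one string.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open $path: $!";
    local $/;              # undefined record separator: read to end-of-file
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Stand-in input: a scalar reference opens as an in-memory file.
my $text    = "xxa1\nyya123b zz\n";
my $content = slurp(\$text);
print "found a match\n" if $content =~ /a\d+b/;
```

With everything in one string, the question of whether a failed match "hit the end" goes away, at the cost of holding the whole input in memory.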