Re: Did regex match fail because of "end of string"?

There's no easy way to do this. You could modify the regex engine, or you could modify your regex to check for the appropriate conditions. Even with a regex parser, it might be very tricky to do the latter automatically.

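Here's the version of /a\d+b/ with the checks added. Wherever the pattern could run out of input, the first alternative sets `$incomplete` if `$` matches at that point and then forces a failure with `(?!)`; the second alternative lets the match continue only while `$incomplete` is still unset: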

# /a\d+b/
while (<DATA>) {
   local our $incomplete;
   my $match = /
      a
      (?:$(?{$incomplete=1})(?!)|(?(?{$incomplete})(?!))
         \d+
         (?:$(?{$incomplete=1})(?!)|(?(?{$incomplete})(?!))
            b
         )
      )
   /x;

   my $rv = $match      ? "match"
          : $incomplete ? "incomplete"
          :               "no match";

   chomp;
   printf("%-10s  %s\n", $_, $rv);
}

__DATA__
a123b
a
a1
a123
a123c
a123ca123b
a123ca123
a123ca123c

a123b       match
a           incomplete
a1          incomplete
a123        incomplete
a123c       no match
a123ca123b  match
a123ca123   incomplete
a123ca123c  no match

I recommend that you write a tokenizer and parser. If your language doesn't allow line breaks in the middle of a token, the only time you need to read more data is when the parser requests a new token and the buffer is empty.

my $ws = qr/\s+/;

sub get_token {
   my ($self) = @_;
   for ($self->{buf}) {   # alias $_ to the buffer
      s/^$ws//;           # skip leading whitespace
      if (length() == 0) {
         # Buffer is empty: read another line, or report EOF.
         my $fh = $self->{fh};
         return [ TOK_EOF ] if eof($fh);
         $_ .= <$fh>;
         redo;
      }

      s/^([a-zA-Z][a-zA-Z0-9_]*)// && return [ TOK_IDENT, $1 ];
      ...
   }
}

If some tokens can contain line breaks, handle those cases specially.
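
As a rough illustration of that special case, here is a minimal sketch of the tokenizer extended with a double-quoted string token that may span lines. The token names (`TOK_EOF`, `TOK_IDENT`, `TOK_STRING`), the `fill_buf` helper, and the string syntax (no escape sequences) are assumptions made up for this example, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical token type names (assumed for this sketch).
use constant {
    TOK_EOF    => 'EOF',
    TOK_IDENT  => 'IDENT',
    TOK_STRING => 'STRING',
};

# Hypothetical helper: append the next input line to the buffer.
# Returns false once the input is exhausted.
sub fill_buf {
    my ($self) = @_;
    my $line = readline($self->{fh});
    return 0 if !defined $line;
    $self->{buf} .= $line;
    return 1;
}

sub get_token {
    my ($self) = @_;
    for ($self->{buf}) {          # alias $_ to the buffer
        s/^\s+//;                 # skip leading whitespace
        if (length() == 0) {
            return [ TOK_EOF ] if !fill_buf($self);
            redo;
        }

        s/^([a-zA-Z][a-zA-Z0-9_]*)// && return [ TOK_IDENT, $1 ];

        # Special case: a string may contain line breaks, so keep
        # reading lines until the closing quote arrives.
        if (/^"/) {
            while (1) {
                return [ TOK_STRING, $1 ] if s/^"([^"]*)"//;
                die "unterminated string\n" if !fill_buf($self);
            }
        }

        die "unexpected character: " . substr($_, 0, 1) . "\n";
    }
}

# Usage: tokenize an in-memory "file" whose string token spans two lines.
my $src = qq{foo "multi\nline" bar\n};
open my $fh, '<', \$src or die "open: $!";
my $lexer = { fh => $fh, buf => '' };
while (1) {
    my $tok = get_token($lexer);
    last if $tok->[0] eq TOK_EOF;
    print join(': ', @$tok), "\n";
}
```

Only the string branch ever reads past a line boundary in the middle of a token; every other token is still matched against a single buffered line.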

Re^2: Did regex match fail because of "end of string"? by moritz (Cardinal) on Oct 16, 2007 at 21:23 UTC
I can't rely on the fact that a token won't contain a newline, because the user of my (not yet existing) module will decide what a "token" looks like. But since the regexes will always be anchored, I can always find out automatically whether a match has started by using `$match = m/\G(?{ $started = 1 })$re/`. Now a way to find the longest submatch that was found (but discarded) would be enough. Or is there any other way to match against a stream?
Re^3: Did regex match fail because of "end of string"? by Illuminatus (Curate) on Oct 16, 2007 at 23:27 UTC
The construct you are showing is not 'anchored'. The only anchor expressions are '^' (beginning of string) and '$' (end of string). If I am understanding correctly, all you really care about are partial matches at the end of the currently available string. Partial matches in the middle are already discarded as non-matches.

Is there a reason that you cannot simply keep starting from the same location until you receive an end-of-string, or find a match? Can this be more data than you want to hold? If you can't do this, I can think of one (very ugly) option. Something like this:

sub example {
   $foo = "[&#\$]";
   $regex = "a\\d+[ars]{2,4}(aa\|ab\|ac)";
   $string="wle;fnaekf;fla;lkcnovnifa ";
   $min = $regex."\$"."foo";
   if ($min !~ /\$$/) { $min .= '$'; }
   $match = 0;
   $tot = length($string);
   $index = $tot;
   print "index is $index\n";
   while (1) {
      print "min is $min\n";
      eval {
         if ($string =~ m/$min/g) {
            $index = pos $string;
            $match = 1;
         }
      };
      # print "err is $@\n";
      last if $match;
      $min =~ s/..$//;
      last if $min eq "";
      if ($min !~ /\$$/) { $min .= '$'; }
   }
   return $index;
}
$ind = example();

You will also have to special-case lines terminated with '\'.
Re^4: Did regex match fail because of "end of string"? by moritz (Cardinal) on Oct 17, 2007 at 05:45 UTC
"Is there a reason that you cannot simply keep starting from the same location until you receive an end-of-string, or find a match?"

Yes, I don't know if the regex reached the end of the string and failed, in which case I'd have to load more data. Your method seems to be a bit blunt, removing a char blindly from the regex, which leads to many non-valid regexes and big performance penalties. The idea is quite interesting, though ;-)
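
For reference, the "keep starting from the same location" loop under discussion might look like the minimal sketch below. The name `match_with_refill` and its interface are made up for this example. It illustrates the limitation described above: a failed match says nothing about whether more data could help, so a non-matching input gets read all the way to EOF and held in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: try $re at the start of the buffer, and after
# every failure append one more line from $fh and try again.
# Returns the matched text, or nothing once the input runs out.
sub match_with_refill {
    my ($fh, $buf_ref, $re) = @_;
    while (1) {
        return $1 if $$buf_ref =~ /\A($re)/;
        my $line = readline($fh);
        return if !defined $line;   # no more data: a definite failure
        $$buf_ref .= $line;         # cannot tell whether this will help
    }
}

# Usage: the token spans two lines, so one refill is needed after the
# first line has been tried on its own.
my $input = "begin\nend tail\n";
open my $in, '<', \$input or die "open: $!";
my $buf = '';
my $got = match_with_refill($in, \$buf, qr/begin\s+end/);
print defined $got ? "matched\n" : "no match\n";
```

A second caveat of this approach: a pattern such as `a\d+` can succeed on a partial buffer even though more digits are still to come, so an early match is not necessarily the longest one.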
Re^5: Did regex match fail because of "end of string"? by Illuminatus (Curate) on Oct 17, 2007 at 08:08 UTC
Re^6: Did regex match fail because of "end of string"? by moritz (Cardinal) on Oct 17, 2007 at 11:35 UTC
Re^3: Did regex match fail because of "end of string"? by ikegami (Patriarch) on Oct 16, 2007 at 23:03 UTC
`$started` is always set to `1` in your example.
Re^4: Did regex match fail because of "end of string"? by moritz (Cardinal) on Oct 17, 2007 at 05:39 UTC
Right. I didn't think enough about that one :(