in reply to Did regex match fail because of "end of string"?
There's no easy way to do this. You could modify the regex engine, or you could modify your regex so that it records when it runs into the end of the string. Even with a regex parser, it might be very tricky to do the latter automatically.
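Here's the version of /a\d+b/ with the checks added. Before each point where the pattern needs another character, the first alternative tests for end of string with $, sets $incomplete, and then fails with (?!). The second alternative starts with a conditional that fails every remaining path once $incomplete is set.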
    # /a\d+b/

    while (<DATA>) {
       local our $incomplete;
       my $match = /
          a
          (?:$(?{$incomplete=1})(?!)|(?(?{$incomplete})(?!))
             \d+
             (?:$(?{$incomplete=1})(?!)|(?(?{$incomplete})(?!))
                b
             )
          )
       /x;

       my $rv = $match ? "match" : $incomplete ? "incomplete" : "no match";
       chomp;
       printf("%-10s %s\n", $_, $rv);
    }

    __DATA__
    a123b
    a
    a1
    a123
    a123c
    a123ca123b
    a123ca123
    a123ca123c
Output:

    a123b      match
    a          incomplete
    a1         incomplete
    a123       incomplete
    a123c      no match
    a123ca123b match
    a123ca123  incomplete
    a123ca123c no match
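To show how the "incomplete" result might be used when input arrives in pieces, here is a minimal sketch. The classify() sub and the @chunks data are hypothetical names introduced for illustration; the pattern itself is the instrumented one from above, unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a string against the instrumented /a\d+b/.
sub classify {
    my ($str) = @_;
    local our $incomplete;    # package variable, reset on every call
    my $match = $str =~ /
        a
        (?:$(?{$incomplete=1})(?!)|(?(?{$incomplete})(?!))
            \d+
            (?:$(?{$incomplete=1})(?!)|(?(?{$incomplete})(?!))
                b
            )
        )
    /x;
    return $match ? "match" : $incomplete ? "incomplete" : "no match";
}

# Hypothetical input arriving in three pieces.
my @chunks = ("a1", "23", "b");
my $buf = "";
for my $chunk (@chunks) {
    $buf .= $chunk;
    my $status = classify($buf);
    print "$buf: $status\n";
    # Only ask for more data while the failure was caused by end of string.
    last if $status ne "incomplete";
}
```

With these chunks the loop should report "incomplete" for a1 and a123, then "match" for a123b. Note that this reruns the whole match on the growing buffer each time, which is simple but not efficient for large inputs.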
I recommend that you write a tokenizer and parser. If your language doesn't allow line breaks in the middle of a token, the only time you need to read more data is when the buffer is empty at the moment the parser requests a new token.
    my $ws = qr/\s+/;

    sub get_token {
       my ($self) = @_;
       for ($self->{buf}) {
          s/^$ws//;

          # Buffer exhausted: read another line, or report end of input.
          if (length() == 0) {
             my $fh = $self->{fh};
             return [ TOK_EOF ] if eof($fh);
             $_ .= <$fh>;
             redo;
          }

          s/^([a-zA-Z][a-zA-Z0-9_]*)// && return [ TOK_IDENT, $1 ];

          ...
       }
    }
If some tokens can contain line breaks, handle those cases specially.
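As one way to handle such a case, here is a minimal runnable sketch that extends the tokenizer above with a double-quoted string token that may span lines. The TOK_* constant values, the TOK_STRING token, the string syntax (no escapes), and the sample input are all assumptions made for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical token type names for this sketch.
use constant { TOK_EOF => 'EOF', TOK_IDENT => 'IDENT', TOK_STRING => 'STRING' };

sub get_token {
    my ($self) = @_;
    for ($self->{buf}) {
        s/^\s+//;

        # Buffer exhausted: read another line, or report end of input.
        if (length($_) == 0) {
            my $fh = $self->{fh};
            return [ TOK_EOF ] if eof($fh);
            $_ .= <$fh>;
            redo;
        }

        s/^([a-zA-Z][a-zA-Z0-9_]*)// && return [ TOK_IDENT, $1 ];

        # Special case: a string may contain line breaks, so keep appending
        # lines until the closing quote is in the buffer.
        if (/^"/) {
            while (1) {
                return [ TOK_STRING, $1 ] if s/^"([^"]*)"//;
                my $fh = $self->{fh};
                die "unterminated string\n" if eof($fh);
                $_ .= <$fh>;
            }
        }

        die "unexpected input: $_";
    }
}

# Read from an in-memory filehandle so the example is self-contained.
my $input = qq{foo "multi\nline" bar\n};
open my $fh, '<', \$input or die "open failed: $!";
my $lexer = { fh => $fh, buf => '' };

my @tokens;
while (1) {
    my $tok = get_token($lexer);
    push @tokens, $tok;
    last if $tok->[0] eq TOK_EOF;
}
print join(' ', map { $_->[0] } @tokens), "\n";
```

For this input the token types should come out as IDENT STRING IDENT EOF, with the string token carrying the embedded newline. Only the string branch ever reads past the current line; every other token still follows the rule that more data is needed only when the buffer is empty.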