Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Here I have a RE which I will call $pattern; it begins with a ^. Over there is a string which I will call $fullstring. I would like to know if $pattern matches $fullstring; that is: $fullstring =~ /$pattern/. The challenge is that I only know $buff, a prefix of $fullstring. I want a programmatic way to answer the following question:
Can I be certain that, no matter what I add to $buff, that $pattern will not match?
Asked again, more formally and in the inverse (I don't care what $remainder is, I just want to know if it's possible for $pattern to match with some oracle-given $remainder.):
Does there exist a string $remainder such that ($buff.$remainder) =~ /$pattern/?
Please ponder the above for a minute before I clutter your mind with the practical bits and my particular ideas for a solution.
Okay, now I'll tell a bit more about the practical side. I am breaking a stream coming over a network into tokens, each token matching one of a set of patterns. I want to be able to eliminate patterns that can't match the next token, so that when I have eliminated all the patterns I recognize that there has been a protocol error. Most of the patterns are quite simple, usually fixed-length and fixed-position:
# simple keyword /^KEYWORD/ # simple keyword with fixed-width integer parameter /^INTVALUE(\d{4})/ # keyword with integer parameter specifying length of binary blob /^CHUNKOFGARBAGE(\d{4})/ and then /^.{$1}/
Ideally, I'd like to get some kind of hook into the RE engine, such that when the engine reaches the end of $buff, rather than backtracking, it should immediately report a (potential) match. Alas, nothing in man pages, my Nutshell books, CPAN, Usenet searches, or Perl Monks has hinted at a way to call my code from within the RE engine (In particular, yes, I've read man overload and "Creating custom RE engines" in man perlre.)
Is there a way to programmatically transform $pattern into another RE that I can apply against $buff and get an answer to my formal question?
One way to transform $pattern would be to replace every node (named $node) of its AST with ( $ | $node ). After such a transformation, the RE engine would claim a match whenever it hit the end of $buff. I can't think of an elegant way to perform such a transformation without making my own RE parser.
Can I do something with $^R, (?{ code }), and/or (??{ code })?
The important thing for me right now is that it be correct. Once I get that, I'll worry about making it work efficiently.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Matching against a partially known string (link)
by tye (Sage) on Sep 19, 2003 at 18:53 UTC | |
by wonkozen (Initiate) on Sep 19, 2003 at 21:30 UTC | |
by wonkozen (Initiate) on Oct 10, 2003 at 01:22 UTC | |
|
Re: Matching against a partially known string
by sandfly (Beadle) on Sep 19, 2003 at 21:58 UTC | |
by wonkozen (Initiate) on Sep 20, 2003 at 15:51 UTC | |
|
Re: Matching against a partially known string
by kvale (Monsignor) on Sep 19, 2003 at 18:33 UTC | |
|
Re: Matching against a partially known string
by wonkozen (Initiate) on Sep 19, 2003 at 18:05 UTC | |
by wonkozen (Initiate) on Sep 19, 2003 at 22:17 UTC |