comment on

Here I have a RE which I will call $pattern; it begins with a ^. Over there is a string which I will call $fullstring. I would like to know if $pattern matches $fullstring; that is: $fullstring =~ /$pattern/. The challenge is that I only know $buff, a prefix of $fullstring. I want a programmatic way to answer the following question:

Can I be certain that, no matter what I add to $buff, that $pattern will not match?

Asked again, more formally and in the inverse (I don't care what $remainder is, I just want to know if it's possible for $pattern to match with some oracle-given $remainder.):

Does there exist a string $remainder such that ($buff.$remainder) =~ /$pattern/?

Please ponder the above for a minute before I clutter your mind with the practical bits and my particular ideas for a solution.

Okay, now I'll tell a bit more about the practical side. I am breaking a stream coming over a network into tokens, each token matching one of a set of patterns. I want to be able to eliminate patterns that can't match the next token, so that when I have eliminated all the patterns I recognize that there has been a protocol error. Most of the patterns are quite simple, usually fixed-length and fixed-position:

# simple keyword
/^KEYWORD/

# simple keyword with fixed-width integer parameter
/^INTVALUE(\d{4})/

# keyword with integer parameter specifying length of binary blob
/^CHUNKOFGARBAGE(\d{4})/ and then /^.{$1}/
[download]

Ideally, I'd like to get some kind of hook into the RE engine, such that when the engine reaches the end of $buff, rather than backtracking, it should immediately report a (potential) match. Alas, nothing in man pages, my Nutshell books, CPAN, Usenet searches, or Perl Monks has hinted at a way to call my code from within the RE engine (In particular, yes, I've read man overload and "Creating custom RE engines" in man perlre.)

Is there a way to programmatically transform $pattern into another RE that I can apply against $buff and get an answer to my formal question?

One way to transform $pattern would be to replace every node (named $node) of its AST with ( $ | $node ). After such a transformation, the RE engine would claim a match whenever it hit the end of $buff. I can't think of an elegant way to perform such a transformation without making my own RE parser.

Can I do something with $^R, (?{ code }), and/or (??{ code })?

The important thing for me right now is that it be correct. Once I get that, I'll worry about making it work efficiently.

In reply to Matching against a partially known string by Anonymous Monk

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.