in reply to Matching against a partially known string

Whoops, this silly new user posted anonymously, even after he created an account just to ask this question. Oh well, maybe he'll learn something else soon.

Re: Re: Matching against a partially known string
by wonkozen (Initiate) on Sep 19, 2003 at 22:17 UTC

    (I am using the terms "parser", "tokenizer / lexer", and "abstract syntax tree (AST)" in the compiler sense.)

    I was pointed to the YAPE::Regex module ("Yet Another Parser/Extractor for Regular Expressions"). From the documentation and my experiments in the debugger, I believe it is only a lexer, not a proper parser. As far as I can tell, it breaks the text of a RE into tokens (strings of characters with particular meanings), but it doesn't assemble those tokens into a hierarchy (an abstract syntax tree).

    For a quick look at the AST of a RE, see the indented text in "Debugging regular expressions" in man perldebguts, or run:

    perl -Mre=debug -e '$re = qr/^a(b(cd?)?)?/;'

    To do one of the transformations that many of us have thought up, I should be operating on the AST, not just on the individual tokens. Also, I would have to break each EXACT node into a sequence of single-character EXACT nodes.
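
    To make the EXACT-splitting step concrete, here is a minimal sketch on a toy tree. Everything in it is an assumption for illustration: the hash layout (type / text / kids), the node names SEQ and OPTIONAL, and the helpers split_exact and show are hypothetical, and none of this is the structure used by perl's regex engine or by YAPE::Regex.

```perl
use strict;
use warnings;

# Hypothetical toy AST: each node is a hash with a 'type' and, depending on
# the type, either a 'text' string or a list of child nodes in 'kids'.
# Returns a new tree in which every multi-character EXACT node has been
# replaced by a SEQ of single-character EXACT nodes.
sub split_exact {
    my ($node) = @_;

    if ($node->{type} eq 'EXACT') {
        my @chars = split //, $node->{text};
        return $node if @chars <= 1;    # already a single character
        return {
            type => 'SEQ',
            kids => [ map { +{ type => 'EXACT', text => $_ } } @chars ],
        };
    }

    # Any other node: shallow-copy it and recurse into its children, if any.
    my %copy = %$node;
    $copy{kids} = [ map { split_exact($_) } @{ $node->{kids} } ]
        if $node->{kids};
    return \%copy;
}

# Render a tree as a compact one-line string so it is easy to inspect.
sub show {
    my ($node) = @_;
    return "$node->{type}<$node->{text}>" if defined $node->{text};
    return $node->{type} unless $node->{kids};
    return "$node->{type}(" . join(',', map { show($_) } @{ $node->{kids} }) . ")";
}

# Hand-built toy tree, roughly what /^ab(cd)?/ might look like.
my $ast = {
    type => 'SEQ',
    kids => [
        { type => 'BOL' },
        { type => 'EXACT', text => 'ab' },
        { type => 'OPTIONAL', kids => [ { type => 'EXACT', text => 'cd' } ] },
    ],
};

my $split = split_exact($ast);
print show($split), "\n";
```

    The point of working on the tree is that the split happens inside each group: the optional group still contains "c" followed by "d", which is information a flat token list would not carry.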