(I am using the terms "parser", "tokenizer / lexer", and "abstract syntax
tree (AST)" in the compiler sense.)
I was pointed to the YAPE::Regex module ("Yet Another Parser/Extractor
for Regular Expressions"). From the documentation and my experiments in
the debugger, I believe it to be only a lexer, not a proper parser.
As far as I can tell, it breaks down the text of a RE into strings of
characters with particular meanings (tokens), but it doesn't assemble
those meanings into a hierarchy (abstract syntax tree).
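
To make the lexer/parser distinction concrete, here is a minimal sketch in Python. It is a toy that only understands literal characters, "^", "?" and parentheses, with no error handling for malformed input. The names tokenize and parse and the node labels (BOL, CHAR, GROUP, OPTIONAL) are made up for illustration; none of this is YAPE::Regex's API or Perl's internal representation.

```python
def tokenize(pattern):
    """Lexer: label each character, but record no structure (toy subset)."""
    kinds = {"^": "BOL", "(": "OPEN", ")": "CLOSE", "?": "QUEST"}
    return [(kinds.get(ch, "CHAR"), ch) for ch in pattern]


def parse(tokens):
    """Parser: fold the flat token list into a nested tree."""
    pos = 0

    def sequence():
        nonlocal pos
        items = []
        while pos < len(tokens) and tokens[pos][0] != "CLOSE":
            kind, text = tokens[pos]
            pos += 1
            if kind == "OPEN":
                node = ("GROUP", sequence())
                pos += 1  # step over the matching CLOSE
            elif kind == "QUEST":
                # "?" applies to whatever was parsed just before it
                node = ("OPTIONAL", items.pop())
            else:
                node = (kind, text)
            items.append(node)
        return items

    return sequence()


tokens = tokenize("^a(b(cd?)?)?")  # flat: 12 (kind, text) pairs
tree = parse(tokens)               # nested: 3 top-level nodes
```

The token list says only that a "?" follows a ")"; the tree says which group that "?" makes optional. That second piece of information is what I don't see YAPE::Regex building.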
For a quick look at the AST of a RE, see the indented text in
"Debugging regular expressions" in man perldebguts, or run

    perl -Mre=debug -e '$re = qr/^a(b(cd?)?)?/;'

To do one of the transformations that many of us have thought up, I
should be operating on the AST, not just on the individual tokens. Also,
I would have to break each EXACT node into a sequence of single-character
EXACT nodes.
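
As a rough sketch of that last step, here is how splitting EXACT nodes might look on a toy tree, written in Python. The dict layout (a "type" key, plus "text" for EXACT nodes and "children" for container nodes) and the function name split_exact are assumptions for illustration only; the only thing borrowed from Perl is the EXACT node name that shows up in the re=debug output.

```python
def split_exact(node):
    """Return a list of nodes equivalent to `node`, in which every
    multi-character EXACT node has become a run of one-character ones."""
    if node["type"] == "EXACT":
        return [{"type": "EXACT", "text": ch} for ch in node["text"]]
    out = dict(node)  # shallow copy so the input tree is left untouched
    if "children" in node:
        out["children"] = [
            piece
            for child in node["children"]
            for piece in split_exact(child)
        ]
    return [out]


# Hypothetical tree for something like (?:abc)? followed by "d"
tree = {
    "type": "SEQUENCE",
    "children": [
        {"type": "OPTIONAL",
         "children": [{"type": "EXACT", "text": "abc"}]},
        {"type": "EXACT", "text": "d"},
    ],
}
(split_tree,) = split_exact(tree)
```

The function returns a list because one EXACT node can turn into several siblings; the caller splices them into the parent's child list. A token-level pass could split the strings just as easily, but it would not know which parent each piece belongs to.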