in reply to Basics of parsing (using RTF as a testbed)
Here's a tokenizer for your language as you've defined it so far:
    my $text = "{{\\escape\\sequences \\more\\sequences{\\yet\\more}\\again\\some\\more\\sequences Some Data}{\\foo\\bar Some Other Data}}";

    printf("%-14s %s\n", 'Token Type', 'Token Value');
    printf("%-14s %s\n", '='x14, '='x40);

    # foreach aliases $_ to $text; each /gc match resumes at \G, the end of
    # the previous match, and redo restarts the block to try for the next token.
    # The literal braces are escaped because newer perls restrict an
    # unescaped "{" in a pattern.
    foreach ($text) {
        m/\G( \{ )/gcx       && do { printf("%-14s %s\n", 'curly, opening', $1); redo; };
        m/\G( \} )/gcx       && do { printf("%-14s %s\n", 'curly, closing', $1); redo; };
        m/\G( \\\w+ )/gcx    && do { printf("%-14s %s\n", 'escape', $1); redo; };
        m/\G( [^{}\\]+ )/gcx && do { printf("%-14s %s\n", 'text', "\"$1\""); redo; };
    }
and I can foresee adding more to further subdivide the tokens.
By definition, a token is something that can't be further subdivided.
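If what you want is to group or refine things later, that belongs in the parser, which works on a list of tokens rather than on printed output. Here is a minimal sketch of that idea, assuming the same four token types as above. The `tokenize` sub and the `[type, value]` pair layout are hypothetical names of my own, not anything from your post:

```perl
use strict;
use warnings;

# Hypothetical helper: the same four patterns as the tokenizer above, but each
# token is collected as a [type, value] pair instead of being printed.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    for ($text) {
        m/\G( \{ )/gcx       && do { push @tokens, [ 'curly, opening', $1 ]; redo; };
        m/\G( \} )/gcx       && do { push @tokens, [ 'curly, closing', $1 ]; redo; };
        m/\G( \\\w+ )/gcx    && do { push @tokens, [ 'escape', $1 ]; redo; };
        m/\G( [^{}\\]+ )/gcx && do { push @tokens, [ 'text', $1 ]; redo; };
    }
    return @tokens;
}

# Usage: tokenize a small sample and print one token per line.
my @tokens = tokenize('{\\foo\\bar Some Other Data}');
printf("%-14s %s\n", @$_) for @tokens;
```

The tokens themselves stay atomic; a later pass can walk the list and build groups from the opening and closing curlies without the tokenizer changing at all.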
Re^2: Basics of parsing (using RTF as a testbed)
by Mugatu (Monk) on Feb 25, 2005 at 19:21 UTC