I have difficulty understanding the article. In my humble opinion it is rather theoretical. Maybe I am not worthy ;-)
No, it probably means that you are not the intended target audience. Or that I did a bad job at writing.
The "slightly" is an understatement. Some of my nightmare examples to illustrate the point:
You're right, I underestimated the complexity. I thought you could just process the input line by line, joining several lines whenever one contained an unclosed quoted string.
Still, you should not give up hope. I wrote a simple lexer that works for the example you gave:
use strict;
use warnings;
use Data::Dumper;
use Math::Expression::Evaluator::Lexer qw(lex);
my $d = do { local $/; <DATA> };
my @tokens = (
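    # 'Comment' must come before 'Operator', because '/' and '*' are operators too.
    ['Comment',       qr{/\*.*?\*/}s, sub { return }],
    # \w* (not \w+), so that one-letter identifiers match as well.
    ['Identifier',    qr{[a-zA-Z_]\w*}],
    ['Number',        qr{\d+}],
    # '-' goes last in the class; written as '+-/' it is a range that also matches '.'.
    ['Operator',      qr{[=(),+/*{}-]}],
    ['Quoted String', qr{"[^"]*"}],
    ['Newline',       qr{\n}],
    ['Whitespace',    qr{\s+}, sub { return }],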
);
print Dumper lex($d, \@tokens);
__DATA__
/* A 2-dimensional sequence as the value is being called in ODL */
KEYWORD = ((1,2) (3,4) (5,8) /* some comment */
9,11))
/* A set as the value is being called in ODL */
KEYWORD = { RED, BLUE, /* some comment */
GREEN, HAZEL }
/* A text string spanning multiple lines */
KEYWORD = "some text /* not a comment but part of the value! */
more text
even more text" /* this is again a comment*/
This is far from ideal, but it tokenizes the data in a meaningful way and strips comments, while leaving comment-like text inside quoted strings untouched as part of the value.
(The lexer in Math::Expression::Evaluator::Lexer is quite simple and not iterator-like. If you don't want to read all input at once, you need to come up with something more sophisticated.)
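For what it's worth, here is a rough sketch of what "something more sophisticated" could look like. It is not part of Math::Expression::Evaluator::Lexer: `make_lexer` is a hypothetical helper of my own that takes the same kind of rule table as above, reads a filehandle one line at a time, and hands back one token per call. It assumes that only comments and quoted strings can span lines, so it reads ahead only when the buffer starts with an unclosed `/*` or `"`. The Whitespace rule here matches horizontal whitespace only, so that Newline tokens are not swallowed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Math::Expression::Evaluator::Lexer.
# Returns a closure that yields one [name, text] token per call and
# undef at end of input, reading the filehandle one line at a time.
sub make_lexer {
    my ($fh, $rules) = @_;
    my $buf = '';
    return sub {
        TOKEN: while (1) {
            if ($buf eq '') {
                my $line = <$fh>;
                return unless defined $line;
                $buf = $line;
            }
            # Comments and quoted strings may span lines: keep reading
            # until the closing delimiter is in the buffer (or at EOF).
            while (   ($buf =~ m{\A/\*} && $buf !~ m{\A/\*.*?\*/}s)
                   || ($buf =~ m{\A"}   && $buf !~ m{\A"[^"]*"})) {
                my $line = <$fh>;
                last unless defined $line;
                $buf .= $line;
            }
            # First rule that matches at the start of the buffer wins.
            for my $rule (@$rules) {
                my ($name, $re, $action) = @$rule;
                next unless $buf =~ s/\A($re)//;
                my $text = $1;
                die "Rule '$name' matched the empty string\n" unless length $text;
                $text = $action->($text) if $action;
                # A callback that returns nothing drops the token.
                next TOKEN unless defined $text && length $text;
                return [$name, $text];
            }
            die "Cannot tokenize input near: " . substr($buf, 0, 20) . "\n";
        }
    };
}

my @rules = (
    ['Comment',       qr{/\*.*?\*/}s, sub { return }],
    ['Identifier',    qr{[a-zA-Z_]\w*}],
    ['Number',        qr{\d+}],
    ['Operator',      qr{[=(),+/*{}-]}],
    ['Quoted String', qr{"[^"]*"}],
    ['Newline',       qr{\n}],
    ['Whitespace',    qr{[^\S\n]+}, sub { return }],   # horizontal whitespace only
);

my $input = <<'END';
KEYWORD = { RED, /* a
comment */ BLUE }
NAME = "text /* kept */
more"
END

# An in-memory filehandle stands in for a real file here.
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $next = make_lexer($fh, \@rules);
my @got;
while (my $tok = $next->()) {
    push @got, $tok;
    printf "%-13s %s\n", @$tok;
}
```

The buffer never holds more than the current line plus whatever is needed to close an open comment or string, so memory use stays small as long as those constructs are short. An unterminated string at end of input ends in the "Cannot tokenize" error, which seemed the least surprising behaviour for a sketch.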