
Another idea I have been toying with is lexing the source into an array of tokens, then compiling the token list into a string that uses a single character to represent each token type. I could then run a regex against that string (effectively matching token patterns) and use the offsets found in @- and @+ to recover the tokens represented by the captured groups. I wrote a nice object-oriented version, complete with operator overloading, but here is the main idea of the code:

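@tokens = ...;
# One character per token, so offsets into $tokens double as indexes into @tokens.
$tokens = join '', map { $tokenToChar{ $_->type } } @tokens;

# The tail is captured as one group; a quantified group keeps only its last repetition.
if ($tokens =~ / ($IDENT) ((?: $DOT $IDENT)*) /x) {
     # $-[$n] is the start offset of group $n; $+[$n] is one past its end.
     my @captured = map { [ @tokens[ $-[$_] .. $+[$_] - 1 ] ] } 1..$#-;
     my ($head, $tail) = @captured;
}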
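For anyone who wants to try this, here is a minimal self-contained sketch of the same idea. The toy lexer, the [type, text] token pairs, and the names %token_to_char and lex are assumptions made up for the example, not the code from my OO version:

```perl
use strict;
use warnings;

# Hypothetical mapping from token type to its one-character stand-in.
my %token_to_char = (IDENT => 'i', DOT => 'd', OP => 'o', NUM => 'n');

# Toy lexer (an assumption for this sketch): returns [type, text] pairs.
sub lex {
    my ($src) = @_;
    my @tokens;
    while ($src =~ /\G\s*(?:([A-Za-z_]\w*)|(\d+)|(\.)|([=<>!]+))/gc) {
        push @tokens, defined $1 ? [IDENT => $1]
                    : defined $2 ? [NUM   => $2]
                    : defined $3 ? [DOT   => $3]
                    :              [OP    => $4];
    }
    return @tokens;
}

my @tokens = lex('person.address.city = 42');

# One character per token, so string offsets are also token indexes.
my $token_string = join '', map { $token_to_char{ $_->[0] } } @tokens;

my ($IDENT, $DOT) = @token_to_char{qw(IDENT DOT)};
my @captured;
if ($token_string =~ / ($IDENT) ((?: $DOT $IDENT)*) /x) {
    # Map each group's [start, end) offsets back to a slice of @tokens.
    @captured = map { [ @tokens[ $-[$_] .. $+[$_] - 1 ] ] } 1 .. $#-;
}
my ($head, $tail) = @captured;

print join('', map { $_->[1] } @$head), "\n";   # prints: person
print join('', map { $_->[1] } @$tail), "\n";   # prints: .address.city
```

The token string here is "ididion", so the regex only ever walks seven characters, however long the identifiers in the source are.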

This worked beautifully. Since a three to four hundred character QL string might be represented by only about 80 tokens, the token string is very small. All of the regexes operate on single characters, so backtracking is minimized. And I could use character classes such as [$TOK1$TOK2] to simulate token classes. All in all, it seems like a very efficient approach. I may even be able to apply this technique to P::RD.
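To illustrate the token-class part, here is a rough sketch in which "literal" means either a number or a string token. The token list is written out by hand, and the token types and the $LITERAL name are hypothetical, chosen only for this example:

```perl
use strict;
use warnings;

# Hypothetical token list, written by hand as [type, text] pairs.
my @tokens = (
    [IDENT => 'age'],  [OP => '>='], [NUM => '21'],
    [AND   => 'and'],
    [IDENT => 'name'], [OP => '='],  [STR => "'Ted'"],
);
my %token_to_char = (IDENT => 'i', OP => 'o', NUM => 'n', STR => 's', AND => 'a');
my $token_string  = join '', map { $token_to_char{ $_->[0] } } @tokens;

my ($IDENT, $OP, $NUM, $STR) = @token_to_char{qw(IDENT OP NUM STR)};

# A character class over token characters acts as a token class.
my $LITERAL = "[$NUM$STR]";

my @comparisons;
while ($token_string =~ / $IDENT $OP $LITERAL /xg) {
    # $-[0] and $+[0] bound the whole match in token positions.
    push @comparisons, join ' ', map { $_->[1] } @tokens[ $-[0] .. $+[0] - 1 ];
}

print "$_\n" for @comparisons;
# prints:
# age >= 21
# name = 'Ted'
```

One caveat: under a plain /x, whitespace inside a bracketed character class is still significant, so the class is built without spaces here.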

Thank you rhesa. I will read through the P::RD FAQ on optimization!

And thanks for all of the feedback so far.

Ted Young

($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)

In reply to Re: Object Query Languages and Parsers by TedYoung
in thread Object Query Languages and Parsers by TedYoung
