Re: Object Query Languages and Parsers

Another idea I have been toying with is lexing the source into an array of tokens, then compiling the token list into a string that uses a single character to represent each token type. I could then run regexes against that string (effectively matching token patterns) and use the offsets found in @- and @+ to return the tokens represented by the captured groups. I wrote a nice object-oriented version, complete with operator overloading, but here is the main idea of the code:

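my @tokens = ...;
# One character per token, so string offsets double as token indexes.
my $tokens = join '', map { $tokenToChar{ $_->type } } @tokens;

# The tail is one group around the repetition; a repeated group keeps only its last pass.
if ($tokens =~ / ($IDENT) ((?:$DOT $IDENT)*) /x) {
     # $-[$_] is the group's first offset, $+[$_] is one past its last.
     my @captured = map { [ @tokens[ $-[$_] .. $+[$_] - 1 ] ] } 1..$#-;
     my ($head, $tail) = @captured;
}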

This worked beautifully. Since a three- to four-hundred-character QL string might be represented by about 80 tokens, the token string is very small. All the regexes operate on single characters, so backtracking is minimized. And I could use character classes such as [$TOK1$TOK2] to simulate token classes. All in all, it seems like a very efficient approach. I may even be able to apply this technique to P::RD.
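
To make the idea concrete, here is a minimal, self-contained sketch of the approach. The token types, the %token_to_char mapping, and the lex() and match_path() helpers are all made up for illustration (they are not the object-oriented version mentioned above), and tokens are plain [type, text] pairs rather than objects:

```perl
use strict;
use warnings;

# Hypothetical token types and their single-character codes.
my %token_to_char = (IDENT => 'i', DOT => '.', NUM => 'n', OP => 'o');

# Hypothetical lexer: turns a query string into a list of [type, text] pairs.
sub lex {
    my ($src) = @_;
    my @tokens;
    while ($src =~ /\G\s*(?:([A-Za-z_]\w*)|(\.)|(\d+)|([=<>!]+))/gc) {
        push @tokens, defined $1 ? [IDENT => $1]
                    : defined $2 ? [DOT   => $2]
                    : defined $3 ? [NUM   => $3]
                    :              [OP    => $4];
    }
    return @tokens;
}

# Match IDENT (DOT IDENT)* against the token string and map the
# captured character ranges back onto the original tokens.
sub match_path {
    my (@tokens) = @_;

    # One character per token, so string offsets are token indexes.
    my $string = join '', map { $token_to_char{ $_->[0] } } @tokens;

    my $I = quotemeta $token_to_char{IDENT};
    my $D = quotemeta $token_to_char{DOT};

    return unless $string =~ / ($I) ((?:$D $I)*) /x;

    # $-[$n] is the start of group $n, $+[$n] is one past its end.
    return map { [ @tokens[ $-[$_] .. $+[$_] - 1 ] ] } 1 .. $#-;
}

my @tokens = lex('user.address.city = 18');
my ($head, $tail) = match_path(@tokens);

print join(' ', map { $_->[1] } @$head), "\n";   # user
print join(' ', map { $_->[1] } @$tail), "\n";   # . address . city
```

The one thing to watch is that the tail has to be a single capturing group wrapped around a non-capturing repetition; a capturing group that is itself repeated only remembers its last iteration.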

Thank you rhesa. I will read through the P::RD FAQ on optimization!

And thanks for all of the feedback so far.

Ted Young

($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)


Replies are listed 'Best First'.

Re^2: Object Query Languages and Parsers
by Gilimanjaro (Hermit) on Apr 23, 2006 at 23:13 UTC

Before optimizing, caching and tokenizing, I'd advise benchmarking to see if it's worth the trouble (and obfuscation)...

You mentioned you're running your request through CGI, which would imply there's a whole bunch of overhead going on elsewhere, which may well slow you down more than P::RD itself...

I'm not saying you shouldn't optimize, cache or tokenize, but I'm saying you may want to make sure it would make a notable difference in performance in your production environment.

As for how to go about optimizing, caching or tokenizing if you do choose to do it: the other monks seem to have covered all that quite sufficiently... ;)
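
For what it's worth, a rough sketch of such a benchmark using the core Benchmark module could look like the one below. The sample query, the 'k' code for the "and" keyword, and both count_paths_* subs are made-up stand-ins, so swap in your real parser calls. It only times the matching step, not the CGI startup overhead, which you would have to measure separately:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input: 20 comparisons joined by "and".
my $query = join ' and ', map { "user.field$_ = $_" } 1 .. 20;

# The same query as a token string, one character per token
# (i = IDENT, . = DOT, o = OP, n = NUM, k = keyword "and").
my $token_string = join 'k', ('i.ion') x 20;

# Strategy A: look for dotted paths in the raw query text.
sub count_paths_raw {
    my $n = () = $query =~ /[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+/g;
    return $n;
}

# Strategy B: look for the same pattern in the token string.
sub count_paths_tokens {
    my $n = () = $token_string =~ /i(?:\.i)+/g;
    return $n;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, {
    raw    => \&count_paths_raw,
    tokens => \&count_paths_tokens,
});
```

Note that this leaves out the cost of lexing and building the token string, so a fair comparison for your case would need to include that step too.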
