Sorry, but I don't see what the problem is.
[KEYWORD => qr/select/i], [WORD => qr/\w+/ ],

What exactly did you expect as the result of the rules above for the string xselectx? Are you expecting the lexer to handle word boundaries, i.e. to match KEYWORD only when it is separated by spaces or something similar, so that it doesn't match inside xselectx?

And according to my explanation, this is the right order: WORD matches everything KEYWORD matches, but KEYWORD is more specific, so it goes first.
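To make the ordering argument concrete, here is a minimal sketch of a split-based lexer that applies each rule, in order, to whatever text earlier rules left untokenized. It only imitates the strategy discussed in this thread: `split_lex` and `show` are hypothetical helpers, not HOP::Lexer's API or source, and the rule regexes are assumed to contain no capturing groups of their own.

```perl
use strict;
use warnings;

# Hypothetical sketch of a split-based lexer (not HOP::Lexer's code):
# each rule is applied, in order, to the text earlier rules left
# untokenized. Tokens are [LABEL, text] array refs; untokenized text
# stays a plain string. Assumes rule regexes have no capturing groups.
sub split_lex {
    my ($text, @rules) = @_;
    my @stream = ($text);
    for my $rule (@rules) {
        my ($label, $re) = @$rule;
        my @next;
        for my $item (@stream) {
            if (ref $item) {            # already a token: leave it alone
                push @next, $item;
                next;
            }
            # With one capturing group, split returns
            # unmatched, matched, unmatched, matched, ...
            my @parts = split /($re)/, $item;
            for my $i (0 .. $#parts) {
                if ($i % 2)               { push @next, [ $label, $parts[$i] ] }
                elsif (length $parts[$i]) { push @next, $parts[$i] }
            }
        }
        @stream = @next;
    }
    return @stream;
}

# Render tokens as [LABEL text] and untokenized text as "text".
sub show {
    return join ' ', map { ref $_ ? "[$_->[0] $_->[1]]" : qq{"$_"} } @_;
}

my @keyword_first = ([KEYWORD => qr/select/i], [WORD => qr/\w+/]);
my @word_first    = ([WORD => qr/\w+/], [KEYWORD => qr/select/i]);

print show(split_lex('xselectx', @keyword_first)), "\n";
print show(split_lex('select foo', @word_first)), "\n";
```

With KEYWORD first, xselectx comes out as [WORD x] [KEYWORD select] [WORD x]. With WORD first, "select foo" gives [WORD select] " " [WORD foo]: WORD swallows select, so KEYWORD never fires. Text that no rule matches (the space here) is simply passed through as a plain string.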

Re^3: HOP::Lexer not doing what I expected
by bart (Canon) on Nov 11, 2006 at 22:23 UTC
    Word boundaries? Hmm... interesting take. It's not something that's been mentioned in the docs, or in the perl.com article.

    Where it really goes wrong, in my opinion, is that it makes no attempt to find a leftmost match. That's what all lexers are supposed to do. So you can rightfully argue that it must find "select" in the string "selectx", but it makes no sense to skip the first "x" in "xselectx". No other lexer or parser in the world would do that, at least not by design.
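For contrast, here is a minimal sketch of the left-to-right approach: scan from the start of the string and, at each position, take the first rule that matches right there (anchored with `\G`). `leftmost_lex` and `show` are hypothetical helpers written for illustration, not part of HOP::Lexer, and the sketch assumes no rule can match the empty string.

```perl
use strict;
use warnings;

# Hypothetical sketch of a conventional left-to-right lexer (not HOP::Lexer).
# At each position it tries the rules in order, anchored with \G, and takes
# the first one that matches there. Assumes no rule can match the empty
# string (otherwise pos() would never advance).
sub leftmost_lex {
    my ($text, @rules) = @_;
    my @tokens;
    pos($text) = 0;
  POSITION:
    while (pos($text) < length $text) {
        for my $rule (@rules) {
            my ($label, $re) = @$rule;
            if ($text =~ /\G($re)/gc) {     # /c keeps pos() on failure
                push @tokens, [ $label, $1 ];
                next POSITION;
            }
        }
        # No rule matches here: consume one character as plain text,
        # merging it with any plain text just before it.
        $text =~ /\G(.)/gcs;
        if (@tokens && !ref $tokens[-1]) { $tokens[-1] .= $1 }
        else                             { push @tokens, $1 }
    }
    return @tokens;
}

# Render tokens as [LABEL text] and unmatched text as "text".
sub show {
    return join ' ', map { ref $_ ? "[$_->[0] $_->[1]]" : qq{"$_"} } @_;
}

my @rules = ([KEYWORD => qr/select/i], [WORD => qr/\w+/]);
print show(leftmost_lex($_, @rules)), "\n" for 'xselectx', 'selectx';
```

With the same two rules, xselectx becomes a single [WORD xselectx], while selectx gives [KEYWORD select] [WORD x]. This is ordered choice at each position; a longest-match lexer in the style of lex would instead return a single WORD for selectx.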

      The question of word boundaries doesn't come up because the example the author uses doesn't need it. So everything works fine (at least in the article), but in your example it makes a difference.

      I know very little about lexers, but I agree that using split causes unexpected behavior (it doesn't find the leftmost match). Still, it has proven useful in the article's example, where rules are created only for what matters (the = symbol is ignored, for example). I don't know how hard or easy it would be to do that with leftmost matching; using split here is convenient.

      And note that I didn't say the x must be skipped (treated as garbage), at least not with the rules I mentioned: the x is matched by WORD, and select by KEYWORD. HOP::Lexer knows nothing about boundaries and gives no special meaning to \s, so you must tell it explicitly if you want to match select only in " select " or " select, " but not in "selectx".
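As an illustration of telling it about boundaries, here is the KEYWORD rule with `\b` assertions added, i.e. `[KEYWORD => qr/\bselect\b/i]`, checked on its own against the strings mentioned above. This is a plain-regex sketch only; keep in mind that a lexer which applies each rule to leftover text fragments treats a fragment's edges as string edges, where `\b` can also match.

```perl
use strict;
use warnings;

# The KEYWORD pattern with \b word-boundary assertions, so "select" only
# matches as a whole word. \b matches between a \w and a non-\w character
# (or at the string's start/end); whitespace gets no other special meaning.
my $keyword = qr/\bselect\b/i;

for my $input (' select ', ' select, ', 'selectx', 'xselectx') {
    printf "%-12s %s\n", qq{"$input"}, ($input =~ $keyword ? 'KEYWORD' : 'no match');
}
```

The first two inputs report KEYWORD; selectx and xselectx report no match, so a following WORD rule would be left to pick up the whole word.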