Re: HOP::Lexer not doing what I expected

OK guys, it's getting to look worse all the time. I found a much simpler example of something that I think is going terribly wrong, and I'd like you to chew it over.

use HOP::Lexer 'string_lexer';
my $text = 'xselectx';
my $lexer = string_lexer( $text,
  [KEYWORD  => qr/select/i],
  [WORD     => qr/\w+/    ]
);

(n.b. string_lexer is just a routine in the module that wraps the input string in an iterator, and then calls make_lexer, so we don't have to do it by hand. The code we have to write just becomes a bit simpler.)

Tell me that the result it parses into is what you think makes sense. Because it doesn't make any sense to me at all:

['WORD','x'],
['KEYWORD','select'],
['WORD','x']

This is just so messed up.

Update: I just read cmarcelo's reply... You want me to swap the rules? OK...

use HOP::Lexer 'string_lexer';
my $text = 'select xselectx';
my $lexer = string_lexer( $text,
  [WORD     => qr/\w+/    ],
  [KEYWORD  => qr/select/i],
);

Outcome:

['WORD','select'],
' ',
['WORD','xselectx']

No good.

Re^2: HOP::Lexer not doing what I expected by cmarcelo (Scribe) on Nov 11, 2006 at 22:11 UTC
Sorry, but I don't see what the problem is. `[KEYWORD => qr/select/i], [WORD => qr/\w+/ ],` What exactly were you expecting as the result of the rules above for the string `xselectx`? Are you expecting word boundaries to matter, so that `KEYWORD` matches only when it's separated by spaces or something, and therefore doesn't match inside `xselectx`? And according to my explanation, this is the right order: `WORD` matches whatever `KEYWORD` matches, but `KEYWORD` is more specific, so it goes first.
Re^3: HOP::Lexer not doing what I expected by bart (Canon) on Nov 11, 2006 at 22:23 UTC
Word boundaries? Hmm... interesting take. It's not something that's mentioned in the docs, or in the perl.com article. Where it really does go wrong, in my opinion, is that it makes no attempt to find a leftmost match. That's what all lexers are supposed to do. So you can rightfully argue that it must find "select" in the string "selectx", but it makes no sense to skip the first "x" in "xselectx". No other lexer or parser in the world would do that, not by design.
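For comparison, here is a minimal sketch of the leftmost-match behavior described above. It does not use HOP::Lexer at all: `leftmost_lex` is a hypothetical helper written for illustration, and it assumes that no rule can match the empty string. At each position it tries the rules in order, anchored with `\G`, and only passes a character through untokenized when no rule matches there.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HOP::Lexer: at each position, try the
# rules in order and take the first one that matches right there (\G).
# Assumes no rule can match the empty string.
sub leftmost_lex {
    my ($text, @rules) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length $text) {
        my $matched = 0;
        for my $rule (@rules) {
            my ($label, $re) = @$rule;
            # /c keeps pos() where it is when the match fails
            if ($text =~ /\G($re)/gc) {
                push @tokens, [$label, $1];
                $matched = 1;
                last;
            }
        }
        # No rule matched here: pass one character through as a plain string.
        unless ($matched) {
            $text =~ /\G(.)/gcs;
            push @tokens, $1;
        }
    }
    return @tokens;
}

my @rules = (
    [KEYWORD => qr/select/i],
    [WORD    => qr/\w+/],
);
my @tokens = leftmost_lex('xselectx', @rules);
print join(' ', map { ref $_ ? "[$_->[0] $_->[1]]" : "'$_'" } @tokens), "\n";
# prints: [WORD xselectx]
```

With the same rules, `'selectx'` would come out as a `KEYWORD` followed by a `WORD`, because `select` is then the leftmost match; whether that is desirable is a separate question from the one about skipping the leading `x`.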
Re^4: HOP::Lexer not doing what I expected by cmarcelo (Scribe) on Nov 11, 2006 at 22:39 UTC
The question of word boundaries doesn't come up because the example the author uses doesn't need them, so everything works fine (at least in the article). In your example it makes a difference. I know very little about lexers, but I agree that using `split` causes unexpected behavior (not finding the leftmost match). It has proven useful in the article's example, though, where rules are created only for what matters (ignoring the `=` symbol, for example). I don't know how hard or easy it would be to do that with leftmost matching; the use of `split` here is convenient. And note, I didn't say that `x` must be skipped (considered garbage), at least not with the rules I mentioned: it's matched by `WORD`, while `KEYWORD` matches `select`. HOP::Lexer knows nothing about boundaries, nor does it give special meaning to `\s`. You must tell it if you want to match `select` only in `" select "` or `" select, "` but not in `"selectx"`.
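A minimal sketch of the point about `split` and boundaries, using only core Perl. It assumes, as described above, that the lexer applies a rule by splitting the input on a capturing pattern; the variable names are made up for illustration. Whether a rule such as `[KEYWORD => qr/\bselect\b/i]` behaves identically inside HOP::Lexer (for instance when the input arrives in chunks) is an assumption that would need checking.

```perl
use strict;
use warnings;

my $text = 'select xselectx';

# A capturing pattern makes split return the separators too, so an
# unanchored rule carves "select" out of the middle of a longer word.
my @loose = grep { length } split /(select)/i, $text;

# With \b on both sides, only the standalone word is split off.
my @strict = grep { length } split /(\bselect\b)/i, $text;

print join('|', @loose),  "\n";   # prints: select| x|select|x
print join('|', @strict), "\n";   # prints: select| xselectx
```

The `grep { length }` only drops the empty leading field that `split` produces when the pattern matches at the very start of the string.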