in reply to Cannot get Marpa::R2 to prioritise one rule over another
Hello again,
maybe a second attempt will go better than the first. I had to specify what a hostname is in an ugly way, but it seems viable.
I'm going mad trying to understand why the dot . is passed in for IPs but not for hostnames! (Is it because the ipv4 rule ends with an action?)
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dump;
use Marpa::R2;
my $rules = <<'END_OF_GRAMMAR';
lexeme default = latm => 1
:default ::= action => ::first
entry ::= op hostaddr4 action => dump_entry
op ::= 'add' action => add_op
| 'remove' action => add_op
hostaddr4 ::= hostname | ipv4
hostname ::= DOMAIN EXT action => add_hostname
| DOMAIN DOMAIN EXT action => add_hostname
| DOMAIN DOMAIN DOMAIN EXT action => add_hostname
DOMAIN ::= NAME '.'
NAME ~ [\d\w]+
EXT ~ 'org' | 'net'
ipv4 ::= NUMBER '.' NUMBER '.' NUMBER '.' NUMBER action => add_ip
NUMBER ~ [\d]+
:discard ~ SP
SP ~ [\s]+
END_OF_GRAMMAR
my $input = <<'END_OF_INPUT';
add example.org
add www.perl.org
add 42.perl.net
add 192.0.2.1
remove 192.0.2.2
END_OF_INPUT
my $grammar = Marpa::R2::Scanless::G->new({source => \$rules});
for (split /^/m, $input) {
    chomp;
    if (length $_) {
        print "\nPARSING: $_\n";
        # parse() builds its own recognizer; 'main' is the package holding the actions
        my $value_ref = $grammar->parse( \$_, 'main');
    }
}
# Every action receives the per-parse object first, then the child values.
sub dump_entry{
    print "dump_entry received: "; dd shift @_;
}
sub add_op{
    my $self = shift @_;
    print "add_op received: "; dd @_;
    $$self{operator} = join '',@_;
    return $self;
}
sub add_ip{
    my $self = shift @_;
    print "add_ip received: "; dd @_;
    $$self{type} = 'IP';
    $$self{value} = join '',@_;
    return $self;
}
sub add_hostname{
    my $self = shift @_;
    print "add_hostname received: "; dd @_;
    $$self{type} = 'hostname';
    $$self{value} = join '.',@_;
    return $self;
}
__DATA__
PARSING: add example.org
add_op received: "add"
add_hostname received: ("example", "org")
dump_entry received: { operator => "add", type => "hostname", value => "example.org" }
PARSING: add www.perl.org
add_op received: "add"
add_hostname received: ("www", "perl", "org")
dump_entry received: { operator => "add", type => "hostname", value => "www.perl.org" }
PARSING: add 42.perl.net
add_op received: "add"
add_hostname received: (42, "perl", "net")
dump_entry received: { operator => "add", type => "hostname", value => "42.perl.net" }
PARSING: add 192.0.2.1
add_op received: "add"
add_ip received: (192, ".", 0, ".", 2, ".", 1)
dump_entry received: { operator => "add", type => "IP", value => "192.0.2.1" }
PARSING: remove 192.0.2.2
add_op received: "remove"
add_ip received: (192, ".", 0, ".", 2, ".", 2)
dump_entry received: { operator => "remove", type => "IP", value => "192.0.2.2" }
L*
There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re^2: Cannot get Marpa::R2 to prioritise one rule over another
by choroba (Cardinal) on Jan 21, 2021 at 17:16 UTC
The dot is ignored because the default action (::first) is used for the DOMAIN rule, and ::first passes on only the first child (NAME). Mixing the lexer and grammar rules is not a good idea; they're very different. Using consistent capitalization for the non-terminals also helps: I usually use one convention for the grammar rules and a different one for the lexer rules.
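To make the difference concrete, here is a minimal sketch in plain Perl (no Marpa involved) that only imitates the calling convention: an action gets the per-parse object first, then one value per right-hand-side symbol. The sub names first_child and all_children are made up for illustration; first_child stands in for what ::first does, while the ipv4 rule in the script above has its own action and therefore sees every child, dots included.

```perl
use strict;
use warnings;

# Stand-in for ::first: keep only the value of the first child.
sub first_child {
    my ($self, @children) = @_;
    return $children[0];
}

# Hypothetical custom action: keep every child, literals included.
sub all_children {
    my ($self, @children) = @_;
    return join '', @children;
}

# DOMAIN ::= NAME '.' has two children: the NAME value and the literal '.'.
my @domain_children = ('www', '.');

print first_child( {}, @domain_children ), "\n";    # prints "www"
print all_children( {}, @domain_children ), "\n";   # prints "www."
```

So giving the DOMAIN rule an explicit action is one way to keep the dot, if that is what you want.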
I usually build the grammar from the top to the bottom, i.e. from the starting symbol to the L0 rules. I start with the default action of [name,values] and replace it with individual actions from the bottom to the top.
The result might be something like
map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
|
Hello choroba,
could you be so kind as to explain this in more detail:
> Mixing the lexer and grammar rules is not a good idea, they're very different.
I'm reading the Marpa-R2 vocabulary and I am not able to define the two kinds of rule strictly. Where does my code mix them?
|
Lines 24 and 25 of your script (the NAME and EXT rules) are lexer rules, easily recognisable by the tilde, but line 27 (ipv4) is a grammar rule again, followed by another lexer rule (NUMBER). But maybe that's how you like it. The more important question is whether you know what the difference between them is; separating them visually helped me to keep them separated in my head, too.
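As a sketch of what that visual separation could look like, here is the grammar from the script above with nothing changed except the order of the rules, the comments, and one rename: DOMAIN becomes lowercase domain, so that uppercase names are used only for lexemes. That capitalization scheme is an assumption of mine, not a Marpa requirement.

```perl
use strict;
use warnings;

my $rules = <<'END_OF_GRAMMAR';
lexeme default = latm => 1
:default ::= action => ::first

# G1 (structural) rules, written with ::=
entry     ::= op hostaddr4              action => dump_entry
op        ::= 'add'                     action => add_op
            | 'remove'                  action => add_op
hostaddr4 ::= hostname | ipv4
hostname  ::= domain EXT                action => add_hostname
            | domain domain EXT         action => add_hostname
            | domain domain domain EXT  action => add_hostname
domain    ::= NAME '.'
ipv4      ::= NUMBER '.' NUMBER '.' NUMBER '.' NUMBER action => add_ip

# L0 (lexical) rules, written with ~
NAME     ~ [\d\w]+
EXT      ~ 'org' | 'net'
NUMBER   ~ [\d]+
:discard ~ SP
SP       ~ [\s]+
END_OF_GRAMMAR

print $rules;
```

The entry rule stays first, so the start symbol is the same as before.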
|
Thanks for demonstrating how to recompose the dotted components of hostnames and IPs, using a custom action. I had been wondering how best to go about that, and you have given me a starting point.
One question, regarding your concat subroutine, if I may: Is it possible to generalise it to return the [rulename,concatted-string] pair, so it conforms to the tokens emitted by the default action [name,values], or would I have to have a separate subroutine for each rule (and return the rulename literally)?
I had originally thought there might be context in the first argument, which you shift off, but that appears to be an empty hashref in all the cases I've seen.
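One way to avoid writing each of those subroutines by hand, sketched in plain Perl: generate them from a small factory and install them under predictable names. Everything here (make_concat, the concat_ prefix, the rule names) is hypothetical, and it assumes the grammar would then refer to the generated subs by name, e.g. action => concat_ipv4.

```perl
use strict;
use warnings;

# Hypothetical factory: builds an action that joins its children and tags
# the result with a fixed rule name, giving a [name, string] pair.
sub make_concat {
    my ($rule_name) = @_;
    return sub {
        my ($self, @children) = @_;    # $self is the per-parse object
        return [ $rule_name, join '', @children ];
    };
}

# Install one named sub per rule: main::concat_hostname, main::concat_ipv4.
{
    no strict 'refs';
    *{"main::concat_$_"} = make_concat($_) for qw(hostname ipv4);
}

# Call a generated action by hand, the way a parser would.
my $node = main->can('concat_ipv4')->( {}, 192, '.', 0, '.', 2, '.', 1 );
print "$node->[0]: $node->[1]\n";    # prints "ipv4: 192.0.2.1"
```

The rule name is still spelled out once per rule, but only in the list passed to the loop.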
|
The first argument is there for you, you can store whatever you want in it. But if you can build the result just by composition, I don't see a reason to use it.
AFAIK, there aren't many predefined actions (::first, [name,values]). Concatenation is definitely not a universal thing, you typically propagate structures, not strings.
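A rough illustration of building the result by composition, again in plain Perl with the parser's calls simulated by hand. The action names are hypothetical; each one ignores the per-parse object and returns a fresh structure made from its children, which the parent action then combines.

```perl
use strict;
use warnings;

# Composition style: the first argument (per-parse object) is unused.
sub op_action {
    my (undef, $word) = @_;
    return { operator => $word };
}

sub ipv4_action {
    my (undef, @children) = @_;
    return { type => 'IP', value => join '', @children };
}

sub entry_action {
    my (undef, $op, $addr) = @_;
    return { %$op, %$addr };    # merge the two child structures
}

# Simulate the bottom-up call order for the input "add 192.0.2.1".
my $op    = op_action( {}, 'add' );
my $addr  = ipv4_action( {}, 192, '.', 0, '.', 2, '.', 1 );
my $entry = entry_action( {}, $op, $addr );

print join( ', ', map { "$_ => $entry->{$_}" } sort keys %$entry ), "\n";
# prints "operator => add, type => IP, value => 192.0.2.1"
```

Compared with the script at the top of the thread, nothing is stored in a shared hash, so each action can be tested on its own.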
|
Re^2: Cannot get Marpa::R2 to prioritise one rule over another
by Anonymous Monk on Jan 21, 2021 at 20:55 UTC
Thanks for this attempt, but I'm not sure that defining hostname as a fixed number of DOMAIN components, or defining a limited set of EXT suffixes, is the right way to go. Hostnames can be arbitrarily long, at least in terms of subdomains, and the list of top-level domains is growing by the day.
I'm probably going to settle for just capturing NAME and handing off the question of whether it is IPv4, (later) IPv6, or neither of those to a custom action. Given the complexity of the problem (esp. IPv6), that is likely the best way forward.
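Roughly what I have in mind for that action, as an untested-against-Marpa sketch: it assumes the NAME lexeme is broad enough to capture the whole address (dots and colons included), and it leans on inet_pton from the core Socket module to do the IPv4/IPv6 checking. The names classify_address and add_address are placeholders.

```perl
use strict;
use warnings;
use Socket qw(inet_pton AF_INET AF_INET6);

# Hypothetical helper: decide what kind of thing a captured token is.
# inet_pton returns undef when the text is not a valid address.
sub classify_address {
    my ($text) = @_;
    return 'ipv4' if defined inet_pton( AF_INET,  $text );
    return 'ipv6' if defined inet_pton( AF_INET6, $text );
    return 'hostname';
}

# Hypothetical action: gets the per-parse object and the NAME token.
sub add_address {
    my ( $self, $name ) = @_;
    return { type => classify_address($name), value => $name };
}

for my $token (qw(192.0.2.1 2001:db8::1 www.perl.org)) {
    my $result = add_address( {}, $token );
    print "$result->{value}: $result->{type}\n";
}
```

Anything that is neither kind of address is simply called a hostname here; checking that it is a well-formed hostname would be a separate step.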
J.