Re^2: Cannot get Marpa::R2 to prioritise one rule over another

The dot is ignored because the default action (::first) is used for the DOMAIN rule, and ::first returns only the value of the first symbol on the right-hand side. Mixing the lexer and grammar rules is not a good idea, they're very different. Using consistent capitalization for the non-terminals also helps; I usually use a different capitalization for the grammar symbols and the lexer ones.
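To make the ::first point concrete, here is a minimal sketch. The mini-grammar, the `parse_hostname` helper and the symbol names are made up for illustration (they are not the original poster's DOMAIN rule); the same rule is parsed once with the default ::first and once with a custom `concat` action.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use Marpa::R2;

# Hypothetical mini-grammar; %s is replaced by an optional action adverb.
my $template = <<'__DSL__';
lexeme default = latm => 1
:default ::= action => ::first

Hostname ::= name '.' ext %s

name       ~ [\w]+
ext        ~ 'org' | 'net'
__DSL__

# Custom action: drop the per-parse argument, glue all children together.
sub concat { shift; join "", @_ }

# Hypothetical helper: build the grammar with the given adverb and parse.
sub parse_hostname {
    my ($adverb, $input) = @_;
    my $dsl     = sprintf $template, $adverb;
    my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$dsl });
    my $recce   = 'Marpa::R2::Scanless::R'->new({
        grammar           => $grammar,
        semantics_package => 'main',
    });
    $recce->read(\$input);
    return ${ $recce->value };
}

# With ::first only the first child (name) survives.
my $first = parse_hostname('', 'example.org');
# With concat all three children (name, '.', ext) are joined.
my $whole = parse_hostname('action => concat', 'example.org');

say "::first: $first";
say "concat:  $whole";
```

With ::first the dot and the extension are simply thrown away, which is why an explicit action is needed wherever more than the first child matters.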

I usually build the grammar from the top to the bottom, i.e. from the starting symbol to the L0 rules. I start with the default action of [name,values] and replace it with individual actions from the bottom to the top.
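As a rough sketch of that starting point (a cut-down, hypothetical grammar that handles only the IP address case), the [name,values] default action makes every rule return an array reference holding the rule name followed by its children, so the shape of the parse is visible before any custom action exists:

```perl
#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;
use Marpa::R2;

# Cut-down grammar for illustration only: no custom actions yet.
my $dsl = <<'__DSL__';
lexeme default = latm => 1
:default ::= action => [name,values]

Entry      ::= op Ipv4
Ipv4       ::= number ('.') number ('.') number ('.') number

op           ~ 'add' | 'remove'
number       ~ [\d]+
:discard     ~ whitespace
whitespace   ~ [\s]+
__DSL__

my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$dsl });
my $recce   = 'Marpa::R2::Scanless::R'->new({ grammar => $grammar });

my $input = 'add 192.0.2.1';
$recce->read(\$input);

# Nested structure: [ 'Entry', 'add', [ 'Ipv4', 192, 0, 2, 1 ] ]
# The parenthesised dots are hidden from the semantics.
my $tree = ${ $recce->value };
print Dumper($tree);
```

From here the lowest rule (Ipv4) would get its own action first, then the rules above it, until the dump has the final shape.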

The result might be something like

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use Data::Dump;
use Marpa::R2;

my $dsl = <<'__DSL__';
lexeme default = latm => 1
:default ::= action => ::first

Entry      ::= op Hostaddr4        action => entry
Hostaddr4  ::= Hostname            action => add_hostname
             | Ipv4
Hostname   ::= Domains '.' ext     action => concat
Domains    ::= domain '.' Domains  action => concat
             | domain
Ipv4       ::= number ('.') number ('.') number ('.') number
                                   action => add_ip

op           ~ 'add' | 'remove'
domain       ~ [\d\w]+
ext          ~ 'org' | 'net'
number       ~ [\d]+
:discard     ~ whitespace
whitespace   ~ [\s]+


__DSL__

my $input = << '__INPUT__';
add example.org
add www.perl.org
add 42.perl.net
add 192.0.2.1
remove 192.0.2.2

__INPUT__

my $grammar = 'Marpa::R2::Scanless::G'->new({source => \$dsl});

open my $in, '<', \$input;
while (<$in>) {
    chomp;
    next unless length;
    say "PARSING: $_";
    my $recce = 'Marpa::R2::Scanless::R'->new({
        grammar           => $grammar,
        semantics_package => 'main',
    });
    # Uncomment for debugging:
    # warn $recce->show_progress(0);
    $recce->read(\$_);
    dd ${ $recce->value };
}

sub add_ip       { { type => 'ip', ip => join '.', @_[1 .. $#_] } }
sub concat       { shift; join "", @_ }
sub add_hostname { { type => 'hostname', hostname => $_[1] } }
sub entry        { { operator => $_[1], %{ $_[2] } } }

__DATA__
PARSING: add example.org
{ hostname => "example.org", operator => "add", type => "hostname" }
PARSING: add www.perl.org
{ hostname => "www.perl.org", operator => "add", type => "hostname" }
PARSING: add 42.perl.net
{ hostname => "42.perl.net", operator => "add", type => "hostname" }
PARSING: add 192.0.2.1
{ ip => "192.0.2.1", operator => "add", type => "ip" }
PARSING: remove 192.0.2.2
{ ip => "192.0.2.2", operator => "remove", type => "ip" }

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]


Re^3: Cannot get Marpa::R2 to prioritise one rule over another by Discipulus (Canon) on Jan 22, 2021 at 09:02 UTC
Hello choroba, could you be so kind as to explain this in more detail:

> Mixing the lexer and grammar rules is not a good idea, they're very different.

I'm reading the Marpa-R2 vocabulary and I am not able to define them strictly. Where does my code mix them?

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re^4: Cannot get Marpa::R2 to prioritise one rule over another by choroba (Cardinal) on Jan 22, 2021 at 09:14 UTC
Lines 24 and 25 contain lexer rules (easily recognisable by the tilde), but line 27 contains a grammar rule again, followed by another lexer rule. But maybe that's how you like it. The more important question is whether you know what the difference between them is; separating them visually helped me to keep them separated in my head, too.
Re^3: Cannot get Marpa::R2 to prioritise one rule over another by Anonymous Monk on Jan 21, 2021 at 21:07 UTC
Thanks for demonstrating how to recompose the dotted components of hostnames and IPs using a custom action. I had been wondering how best to go about that, and you have given me a starting point.

One question regarding your `concat` subroutine, if I may: is it possible to generalise it to return the `[rulename,concatted-string]` pair, so that it conforms to the tokens emitted by the default action `[name,values]`, or would I have to have a separate subroutine for each rule (and return the rulename literally)? I had originally thought there might be context in the first argument, which you `shift` over, but that appears to be an empty hashref in all the cases I've seen.
Re^4: Cannot get Marpa::R2 to prioritise one rule over another by choroba (Cardinal) on Jan 21, 2021 at 21:14 UTC
The first argument is there for you, you can store whatever you want in it. But if you can build the result just by composition, I don't see a reason to use it. AFAIK, there aren't many predefined actions (::first, [name,values]). Concatenation is definitely not a universal thing, you typically propagate structures, not strings.
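A minimal sketch of using that first argument, assuming the per-parse object is supplied as the optional argument to `value` (the grammar, the `%store` hash and the action names below are hypothetical): every action receives the same hash reference and can stash data in it.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use Marpa::R2;

# Hypothetical grammar: a whitespace-separated list of words.
my $dsl = <<'__DSL__';
lexeme default = latm => 1
:default ::= action => ::first

Names      ::= Name+    action => count_names
Name       ::= word     action => remember

word         ~ [\w]+
:discard     ~ whitespace
whitespace   ~ [\s]+
__DSL__

# Each action gets the per-parse object first, then the child values.
sub remember {
    my ($per_parse, $word) = @_;
    push @{ $per_parse->{seen} }, $word;   # side storage, shared by all actions
    return $word;
}

sub count_names {
    my ($per_parse, @names) = @_;
    return scalar @names;
}

my $grammar = 'Marpa::R2::Scanless::G'->new({ source => \$dsl });
my $recce   = 'Marpa::R2::Scanless::R'->new({
    grammar           => $grammar,
    semantics_package => 'main',
});

my $input = 'foo bar baz';
$recce->read(\$input);

my %store;                                  # becomes the per-parse object
my $count = ${ $recce->value(\%store) };

say "count: $count";
say "seen:  ", scalar @{ $store{seen} };
```

When the result can be built purely from the return values, as in the hostname example, none of this is needed and the first argument can just be shifted away.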
Re^5: Cannot get Marpa::R2 to prioritise one rule over another by Anonymous Monk on Jan 21, 2021 at 21:56 UTC
I might not be explaining myself properly... In the contrived example below, the default action is `[name,values]`, as you recommend:

    #!/usr/bin/env perl
    use warnings;
    use strict;
    use Data::Dumper::Concise;
    use Term::ANSIColor qw(:constants);
    use Marpa::R2;

    package main;

    my $rules = <<'END_OF_GRAMMAR';
    lexeme default = latm => 1
    :default ::= action => [name,values]
    :start ::= <entry>

    <entry> ::= 'foo' (SP) <hostaddr4>
              | 'bar' (SP) <hostaddr4>
              | 'baz' (SP) <hostaddr4>
    <ipv4> ::= NUMBER ('.') NUMBER ('.') NUMBER ('.') NUMBER
    <hostname> ::= NAMECH+ separator => DOT
    <hostaddr4> ::= <ipv4> | <hostname>

    SP ~ [\s]+
    DOT ~ '.'
    NAMECH ~ [^\s.:]+
    NUMBER ~ [\d]+
    END_OF_GRAMMAR

    my $input = <<'END_OF_INPUT';
    foo 192.0.2.1
    foo www.example.org
    bar 192.0.2.2
    bar 3.2.0.192.in-arpa.net
    baz 192.0.2
    baz flibble.example.com
    END_OF_INPUT

    my $grammar = Marpa::R2::Scanless::G->new({source => \$rules});

    for (split /^/m, $input) {
        chomp;
        if (length $_) {
            print "\n\n$_\n";
            my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar, ranking_method => 'rule', semantics_package => 'main'});
            eval { $recce->read(\$_ ) };
            print(($@ ? (RED . "$@\n") : GREEN), $recce->show_progress(), "\n", Dumper($recce->value), "\n\n", RESET);
        }
    }

This produces a parse result that makes it clear whether the `hostaddr4` component is an `ipv4` or a `hostname`, by the first element that gets pushed onto the array, but requires that I later recompose both IPv4 addresses and hostnames, e.g. by `shift; join '.', @_`:

    \[ "entry", "foo", [ "hostaddr4", [ "ipv4", 192, 0, 2, 1, ], ], ]
    ...
    \[ "entry", "foo", [ "hostaddr4", [ "hostname", "www", "example", "org", ], ], ]
    ...

If I update the grammar slightly, to use a custom action to recompose these for me:

    <ipv4> ::= NUMBER ('.') NUMBER ('.') NUMBER ('.') NUMBER action => joindot
    <hostname> ::= NAMECH+ separator => DOT action => joindot
    ...
    sub joindot { shift, join '.', @_ }

...the resulting parse structure no longer has the `ipv4` or `hostname` indicators, and there appears not to be enough information in the arguments passed to `joindot` for it to return them:

    \[ "entry", "foo", [ "hostaddr4", "192.0.2.1", ], ]
    ...
    \[ "entry", "foo", [ "hostaddr4", "www.example.org", ], ]

The only approach I can see is to define the grammar with two near-identical functions, one that emits 'ipv4' and one that emits 'hostname':

    <ipv4> ::= NUMBER ('.') NUMBER ('.') NUMBER ('.') NUMBER action => joinipv4
    <hostname> ::= NAMECH+ separator => DOT action => joinhostname
    ...
    sub joindot { join '.', @_ }
    sub joinipv4 { shift, ["ipv4", (joindot @_)] }
    sub joinhostname { shift, ["hostname", (joindot @_)] }

But in a larger grammar that repetition is a pain, especially when the information is already known to the parser (and can be emitted automatically where the action is non-custom). Am I missing a trick here?
Re^6: Cannot get Marpa::R2 to prioritise one rule over another by choroba (Cardinal) on Jan 21, 2021 at 22:17 UTC

