PerlMonks  

Re^2: Cannot get Marpa::R2 to prioritise one rule over another

by choroba (Cardinal)
on Jan 21, 2021 at 17:16 UTC


in reply to Re: Cannot get Marpa::R2 to prioritise one rule over another
in thread Cannot get Marpa::R2 to prioritise one rule over another

The dot is ignored because the default action (::first) is used for the DOMAIN rule, so only the value of its first child is returned. Mixing the lexer and grammar rules is not a good idea; they're very different. Using consistent capitalization for the non-terminals also helps: I usually use one convention for the grammar symbols and a different one for the lexer ones.
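To make the effect of ::first concrete, here is a minimal plain-Perl sketch. It does not call Marpa at all: `first_child` is a hypothetical stand-in for what ::first yields, and the child values are made up for illustration. It only assumes that an action receives the per-parse argument first, followed by the values of the rule's children.

```perl
use strict;
use warnings;

# Stand-ins only: an action is called with the per-parse argument first,
# followed by the values of the rule's right-hand-side children.
sub first_child { return $_[1] }                # roughly what ::first yields
sub concat      { shift; return join '', @_ }   # keeps every child, dot included

my @children = ('example', '.', 'org');         # hypothetical child values

print first_child({}, @children), "\n";         # only the first child survives
print concat({}, @children), "\n";              # all children are joined
```

With the stand-in for ::first, everything after the first child is dropped, which is why the dot disappears unless a custom action such as `concat` is attached to the rule.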

I usually build the grammar from the top to the bottom, i.e. from the starting symbol to the L0 rules. I start with the default action of [name,values] and replace it with individual actions from the bottom to the top.

The result might be something like

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use Data::Dump;
use Marpa::R2;

my $dsl = <<'__DSL__';

lexeme default = latm => 1
:default ::= action => ::first

Entry     ::= op Hostaddr4          action => entry
Hostaddr4 ::= Hostname              action => add_hostname
            | Ipv4
Hostname  ::= Domains '.' ext       action => concat
Domains   ::= domain '.' Domains    action => concat
            | domain
Ipv4      ::= number ('.') number ('.') number ('.') number
                                    action => add_ip

op         ~ 'add' | 'remove'
domain     ~ [\d\w]+
ext        ~ 'org' | 'net'
number     ~ [\d]+
:discard   ~ whitespace
whitespace ~ [\s]+

__DSL__

my $input = << '__INPUT__';
add example.org
add www.perl.org
add 42.perl.net
add 192.0.2.1
remove 192.0.2.2
__INPUT__

my $grammar = 'Marpa::R2::Scanless::G'->new({source => \$dsl});

open my $in, '<', \$input;
while (<$in>) {
    chomp;
    next unless length;

    say "PARSING: $_";
    my $recce = 'Marpa::R2::Scanless::R'->new({
        grammar           => $grammar,
        semantics_package => 'main',
    });
    # Uncomment for debugging:
    # warn $recce->show_progress(0);
    $recce->read(\$_);
    dd ${ $recce->value };
}

sub add_ip { { type => 'ip', ip => join '.', @_[1 .. $#_] } }
sub concat { shift; join "", @_ }
sub add_hostname { { type => 'hostname', hostname => $_[1] } }
sub entry { { operator => $_[1], %{ $_[2] } } }

__DATA__
PARSING: add example.org
{ hostname => "example.org", operator => "add", type => "hostname" }
PARSING: add www.perl.org
{ hostname => "www.perl.org", operator => "add", type => "hostname" }
PARSING: add 42.perl.net
{ hostname => "42.perl.net", operator => "add", type => "hostname" }
PARSING: add 192.0.2.1
{ ip => "192.0.2.1", operator => "add", type => "ip" }
PARSING: remove 192.0.2.2
{ ip => "192.0.2.2", operator => "remove", type => "ip" }

map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

Re^3: Cannot get Marpa::R2 to prioritise one rule over another
by Discipulus (Canon) on Jan 22, 2021 at 09:02 UTC
    Hello choroba,

    could you be so kind as to explain in more detail what you meant by:

    > Mixing the lexer and grammar rules is not a good idea, they're very different.

    I'm reading the Marpa-R2 vocabulary and I am not able to define them strictly. Where does my code mix them?

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      Lines 24 and 25 contain lexer rules (easily recognisable by the tilde), but line 27 contains a grammar rule again, followed by another lexer rule. But maybe that's how you like it. The more important question is whether you know what the difference between them is; separating them visually helped me to keep them separated in my head, too.
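As a rough illustration of "recognisable by the tilde", the plain-Perl sketch below labels each rule line of a grammar source by the operator it uses. `classify_rules` is a hypothetical helper, not part of Marpa, and the check is deliberately naive: it assumes one rule per line and would be fooled by a `~` or `::=` inside a literal.

```perl
use strict;
use warnings;

# Hypothetical helper: label each rule line of a SLIF source as a grammar
# (G1) rule or a lexer (L0) rule, judging only by the operator it uses.
sub classify_rules {
    my ($dsl) = @_;
    my @labelled;
    for my $line (split /\n/, $dsl) {
        if    ($line =~ /::=/)   { push @labelled, [ G1 => $line ] }
        elsif ($line =~ /\s~\s/) { push @labelled, [ L0 => $line ] }
        # Anything else (blank lines, alternatives starting with "|",
        # "lexeme default = ...") is skipped by this naive check.
    }
    return @labelled;
}

# A deliberately interleaved fragment using rules from the grammar above.
my $dsl = <<'DSL';
Hostname ::= Domains '.' ext
domain ~ [\d\w]+
Ipv4 ::= number ('.') number ('.') number ('.') number
number ~ [\d]+
DSL

printf "%s  %s\n", @$_ for classify_rules($dsl);
```

Running it over a grammar shows at a glance where G1 and L0 lines alternate, which is the kind of interleaving being described.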

Re^3: Cannot get Marpa::R2 to prioritise one rule over another
by Anonymous Monk on Jan 21, 2021 at 21:07 UTC

    Thanks for demonstrating how to recompose the dotted components of hostnames and IPs, using a custom action. I had been wondering how best to go about that, and you have given me a starting point.

    One question, regarding your concat subroutine, if I may: Is it possible to generalise it to return the [rulename,concatted-string] pair, so it conforms to the tokens emitted by the default action [name,values], or would I have to have a separate subroutine for each rule (and return the rulename literally)?

    I had originally thought there might be context in the first argument, which you shift over, but that appears to be an empty hashref in all cases I've seen.

      The first argument is there for you; you can store whatever you want in it. But if you can build the result just by composition, I don't see a reason to use it.

      AFAIK, there aren't many predefined actions (::first, [name,values]). Concatenation is definitely not a universal thing: you typically propagate structures, not strings.
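A minimal sketch of storing something in that first argument, written in plain Perl without Marpa. The `seen` key and the counting are hypothetical, and `$per_parse` is a stand-in for the hash that the actions of one parse would share.

```perl
use strict;
use warnings;

# Hypothetical action: the first argument is the per-parse hash shared by
# the actions of one parse, so it can carry state between them.
sub add_hostname {
    my ($per_parse, $hostname) = @_;
    $per_parse->{seen}{$hostname}++;    # remember what has been parsed
    return { type => 'hostname', hostname => $hostname };
}

my $per_parse = {};                     # stand-in for the shared hash
add_hostname($per_parse, $_) for qw(example.org www.perl.org example.org);

print "$_: $per_parse->{seen}{$_}\n" for sort keys %{ $per_parse->{seen} };
```

The return value is still built by composition; the shared hash is only used for bookkeeping on the side.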


        I might not be explaining myself properly...

        In the contrived example below, the default action is [name,values], as you recommend:

        #!/usr/bin/env perl

        use warnings;
        use strict;

        use Data::Dumper::Concise;
        use Term::ANSIColor qw(:constants);
        use Marpa::R2;

        package main;

        my $rules = <<'END_OF_GRAMMAR';
        lexeme default = latm => 1
        :default ::= action => [name,values]

        :start ::= <entry>

        <entry> ::= 'foo' (SP) <hostaddr4>
                  | 'bar' (SP) <hostaddr4>
                  | 'baz' (SP) <hostaddr4>

        <ipv4>      ::= NUMBER ('.') NUMBER ('.') NUMBER ('.') NUMBER
        <hostname>  ::= NAMECH+ separator => DOT
        <hostaddr4> ::= <ipv4> | <hostname>

        SP     ~ [\s]+
        DOT    ~ '.'
        NAMECH ~ [^\s.:]+
        NUMBER ~ [\d]+
        END_OF_GRAMMAR

        my $input = <<'END_OF_INPUT';
        foo 192.0.2.1
        foo www.example.org
        bar 192.0.2.2
        bar 3.2.0.192.in-arpa.net
        baz 192.0.2
        baz flibble.example.com
        END_OF_INPUT

        my $grammar = Marpa::R2::Scanless::G->new({source => \$rules});

        for (split /^/m, $input) {
            chomp;
            if (length $_) {
                print "\n\n$_\n";
                my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar,
                    ranking_method => 'rule', semantics_package => 'main'});
                eval { $recce->read(\$_ ) };
                print(($@ ? (RED . "$@\n") : GREEN), $recce->show_progress(), "\n",
                    Dumper($recce->value), "\n\n", RESET);
            }
        }

        This results in a parse result that makes it clear whether the hostaddr4 component is an ipv4 or a hostname, by the first element that gets pushed onto the array, but requires that I later recompose both IPv4 addresses and hostnames, e.g. by shift; join '.', @_:

        \[ "entry", "foo", [ "hostaddr4", [ "ipv4", 192, 0, 2, 1, ], ], ]
        ...
        \[ "entry", "foo", [ "hostaddr4", [ "hostname", "www", "example", "org", ], ], ]
        ...

        If I update the grammar slightly, to use a custom action to recompose these for me:

        <ipv4>     ::= NUMBER ('.') NUMBER ('.') NUMBER ('.') NUMBER action => joindot
        <hostname> ::= NAMECH+ separator => DOT action => joindot
        ...
        sub joindot { shift, join '.', @_ }

        ...the resulting parse structure no longer has the ipv4 or hostname indicators, and there appears not to be enough information in the arguments passed to joindot for it to return it:

        \[ "entry", "foo", [ "hostaddr4", "192.0.2.1", ], ]
        ...
        \[ "entry", "foo", [ "hostaddr4", "www.example.org", ], ]

        The only approach I can see to do so is to define the grammar with two near-identical functions, one that emits 'ipv4' and one that emits 'hostname':

        <ipv4>     ::= NUMBER ('.') NUMBER ('.') NUMBER ('.') NUMBER action => joinipv4
        <hostname> ::= NAMECH+ separator => DOT action => joinhostname
        ...
        sub joindot { join '.', @_ }
        sub joinipv4 { shift, ["ipv4", (joindot @_)] }
        sub joinhostname { shift, ["hostname", (joindot @_)] }

        But, in a larger grammar, that repetition is a pain, especially when the information is already known to the parser (and can be emitted automatically where the action is non-custom).

        Am I missing a trick here?
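One way to cut the repetition described above, sketched in plain Perl and not confirmed anywhere in this thread, is to generate the near-identical actions from a single factory. `make_joiner` is a hypothetical helper. The sketch assumes Marpa looks up action names as subs in the semantics package when the parse is evaluated, so the generated subs must be installed before parsing.

```perl
use strict;
use warnings;

# Hypothetical factory: returns an action that tags the dot-joined
# children with the given rule name, mirroring the [name,values] shape.
sub make_joiner {
    my ($name) = @_;
    return sub {
        my (undef, @children) = @_;    # drop the per-parse argument
        return [ $name, join '.', @children ];
    };
}

# Install main::joinipv4 and main::joinhostname, the names the grammar
# refers to via "action => joinipv4" and "action => joinhostname".
{
    no strict 'refs';
    *{"main::join$_"} = make_joiner($_) for qw(ipv4 hostname);
}

# Direct calls with made-up arguments, standing in for what Marpa passes.
print "@{ joinipv4({}, 192, 0, 2, 1) }\n";
print "@{ joinhostname({}, qw(www example org)) }\n";
```

The rule name is still written once per rule in the `qw` list, so this shortens the boilerplate without making the action aware of which rule invoked it.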
