PerlMonks  

Re: Cannot get Marpa::R2 to prioritise one rule over another

by Discipulus (Canon)
on Jan 21, 2021 at 12:32 UTC ( [id://11127205]=note )


in reply to Cannot get Marpa::R2 to prioritise one rule over another

Hello again,

Maybe a second attempt is better than the first one. I had to specify what a hostname is in an ugly way, but it seems viable.

I'm going mad trying to understand why the dot . is passed in for IPs and not for hostnames! (Because ipv4 ends with an action?)

#!/usr/bin/env perl
use warnings;
use strict;

use Data::Dump;
use Marpa::R2;

my $rules = <<'END_OF_GRAMMAR';
lexeme default = latm => 1
:default ::= action => ::first

entry ::= op hostaddr4 action => dump_entry

op ::= 'add' action => add_op
    | 'remove' action => add_op

hostaddr4 ::= hostname | ipv4

hostname ::= DOMAIN EXT action => add_hostname
    | DOMAIN DOMAIN EXT action => add_hostname
    | DOMAIN DOMAIN DOMAIN EXT action => add_hostname

DOMAIN ::= NAME '.'
NAME ~ [\d\w]+
EXT ~ 'org' | 'net'

ipv4 ::= NUMBER '.' NUMBER '.' NUMBER '.' NUMBER action => add_ip
NUMBER ~ [\d]+

:discard ~ SP
SP ~ [\s]+
END_OF_GRAMMAR

my $input = <<'END_OF_INPUT';
add example.org
add www.perl.org
add 42.perl.net
add 192.0.2.1
remove 192.0.2.2
END_OF_INPUT

my $grammar = Marpa::R2::Scanless::G->new({source => \$rules});

for (split /^/m, $input) {
    chomp;
    if (length $_) {
        print "\nPARSING: $_\n";
        my $recce = Marpa::R2::Scanless::R->new({
            grammar => $grammar,
        });
        my $value_ref = $grammar->parse( \$_, 'main');
    }
}

sub dump_entry{
    print "dump_entry received: ";
    dd shift @_;
}

sub add_op{
    my $self = shift @_;
    print "add_op received: ";
    dd @_;
    $$self{operator} = join '',@_;
    return $self;
}

sub add_ip{
    my $self = shift @_;
    print "add_ip received: ";
    dd @_;
    $$self{type} = 'IP';
    $$self{value} = join '',@_;
    return $self;
}

sub add_hostname{
    my $self = shift @_;
    print "add_hostname received: ";
    dd @_;
    $$self{type} = 'hostname';
    $$self{value} = join '.',@_;
    return $self;
}

__DATA__

PARSING: add example.org
add_op received: "add"
add_hostname received: ("example", "org")
dump_entry received: { operator => "add", type => "hostname", value => "example.org" }

PARSING: add www.perl.org
add_op received: "add"
add_hostname received: ("www", "perl", "org")
dump_entry received: { operator => "add", type => "hostname", value => "www.perl.org" }

PARSING: add 42.perl.net
add_op received: "add"
add_hostname received: (42, "perl", "net")
dump_entry received: { operator => "add", type => "hostname", value => "42.perl.net" }

PARSING: add 192.0.2.1
add_op received: "add"
add_ip received: (192, ".", 0, ".", 2, ".", 1)
dump_entry received: { operator => "add", type => "IP", value => "192.0.2.1" }

PARSING: remove 192.0.2.2
add_op received: "remove"
add_ip received: (192, ".", 0, ".", 2, ".", 2)
dump_entry received: { operator => "remove", type => "IP", value => "192.0.2.2" }

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Replies are listed 'Best First'.
Re^2: Cannot get Marpa::R2 to prioritise one rule over another
by choroba (Cardinal) on Jan 21, 2021 at 17:16 UTC
    The dot is ignored because the default action (::first) is used for the DOMAIN rule, so only its first child, NAME, is passed up. Mixing the lexer and grammar rules is not a good idea; they're very different. Using consistent capitalization for the non-terminals also helps: I usually use a different capitalization rule for the grammar symbols and the lexer ones.

    I usually build the grammar from the top to the bottom, i.e. from the starting symbol to the L0 rules. I start with the default action of [name,values] and replace it with individual actions from the bottom to the top.

    The result might be something like
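(The code that followed is not preserved in this copy of the thread. What follows is not choroba's original but a minimal sketch of the approach described above: grammar (G1) rules in lower case, lexer (L0) rules in upper case and grouped separately, a default action of [name,values], and custom actions that recompose the dotted parts. The package name My_Actions, the symbol names, and the subs concat and join_dots are assumptions made for illustration; the TLD list is kept as limited as in the grammar being replied to.)

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_OF_GRAMMAR';
:default ::= action => [name,values]
lexeme default = latm => 1

# G1 (grammar) rules: lower-case names, '::='
entry    ::= op address
op       ::= ADD                action => ::first
           | REMOVE             action => ::first
address  ::= hostname           action => ::first
           | ipv4               action => ::first
hostname ::= labels DOT TLD     action => concat
labels   ::= NAME+              separator => DOT action => join_dots
ipv4     ::= NUMBER DOT NUMBER DOT NUMBER DOT NUMBER action => concat

# L0 (lexer) rules: upper-case names, '~'
ADD      ~ 'add'
REMOVE   ~ 'remove'
DOT      ~ '.'
TLD      ~ 'org' | 'net'
NAME     ~ [\w]+
NUMBER   ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_GRAMMAR

# Hypothetical actions. The first argument is the per-parse object, unused here.
# concat receives every child, including the DOT lexemes, and glues them together.
sub My_Actions::concat    { my ( undef, @parts ) = @_; return join q{},  @parts }
# join_dots receives only the NAME values, because a sequence rule
# does not pass its separators to the action.
sub My_Actions::join_dots { my ( undef, @names ) = @_; return join q{.}, @names }

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Parse one line and return the value built for the top rule.
sub parse_line {
    my ($line) = @_;
    return ${ $grammar->parse( \$line, 'My_Actions' ) };
}

for my $line ( 'add www.perl.org', 'remove 192.0.2.2' ) {
    my $entry = parse_line($line);
    print join( ' | ', @{$entry} ), "\n";
}
```

Because DOT is a lexeme in its own right, it reaches the ipv4 and hostname actions as an ordinary child value, which is the behaviour the original post was missing for hostnames.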

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Hello choroba,

      could you be so kind as to explain in more detail your remark:

      > Mixing the lexer and grammar rules is not a good idea, they're very different.

      I ask because I'm reading the Marpa-R2 vocabulary and I am not able to define them strictly. Where does my code mix them?

      L*

      There are no rules, there are no thumbs..
      Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
        Lines 24 and 25 (NAME and EXT) contain lexer rules (easily recognisable by the tilde), but line 27 (ipv4) contains a grammar rule again, followed by another lexer rule (NUMBER). But maybe that's how you like it. The more important question is whether you know what the difference between them is; separating them visually helped me to keep them separated in my head, too.

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

      Thanks for demonstrating how to recompose the dotted components of hostnames and IPs, using a custom action. I had been wondering how best to go about that, and you have given me a starting point.

      One question, regarding your concat subroutine, if I may: Is it possible to generalise it to return the [rulename,concatted-string] pair, so it conforms to the tokens emitted by the default action [name,values], or would I have to have a separate subroutine for each rule (and return the rulename literally)?

      I had originally thought there might be context in the first argument, which you shift off, but that appears to be an empty hashref in all the cases I've seen.

        The first argument is there for you; you can store whatever you want in it. But if you can build the result just by composition, I don't see a reason to use it.

        AFAIK, there aren't many predefined actions (::first, [name,values]). Concatenation is definitely not a universal thing, you typically propagate structures, not strings.

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
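On the question of a single concatenating action that returns a [rulename, string] pair: one possible sketch, assuming the context variables $Marpa::R2::Context::slg and $Marpa::R2::Context::rule described in the Marpa::R2::Semantics documentation, is to look up the LHS symbol of the rule being evaluated. The names My_Actions and named_concat are hypothetical, and the grammar is cut down to IPv4 only to keep the example short.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Marpa::R2;

my $dsl = <<'END_OF_GRAMMAR';
:default ::= action => [name,values]
lexeme default = latm => 1

entry ::= op ipv4
op    ::= ADD | REMOVE
ipv4  ::= NUMBER DOT NUMBER DOT NUMBER DOT NUMBER action => named_concat

ADD      ~ 'add'
REMOVE   ~ 'remove'
DOT      ~ '.'
NUMBER   ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_GRAMMAR

# Hypothetical generic action: returns [ LHS symbol name, concatenated children ].
# Assumption: inside an action, $Marpa::R2::Context::rule holds the ID of the
# current rule and $Marpa::R2::Context::slg holds the SLIF grammar object.
sub My_Actions::named_concat {
    my ( undef, @children ) = @_;
    my $slg = $Marpa::R2::Context::slg;
    # rule_expand returns the LHS symbol ID followed by the RHS symbol IDs.
    my ($lhs_id) = $slg->rule_expand($Marpa::R2::Context::rule);
    return [ $slg->symbol_name($lhs_id), join q{}, @children ];
}

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# Parse one line and return the value built for the top rule.
sub parse_line {
    my ($line) = @_;
    return ${ $grammar->parse( \$line, 'My_Actions' ) };
}

my $tree = parse_line('add 192.0.2.1');
print "$tree->[2][0]: $tree->[2][1]\n";
```

The same sub can then be attached to any rule whose children are plain strings. Note that it reports the LHS symbol name; a rule that carries an explicit name adverb would need the rule name looked up instead.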
Re^2: Cannot get Marpa::R2 to prioritise one rule over another
by Anonymous Monk on Jan 21, 2021 at 20:55 UTC

    Thanks for this attempt, but I'm not sure that defining hostname as a fixed number of DOMAIN components, or defining a limited set of EXT suffixes, is the right way to go. Hostnames can be arbitrarily long, at least in terms of subdomains, and the list of top-level domains is growing by the day.

    I'm probably going to settle for just capturing NAME and leaving the distinction between IPv4, (later) IPv6, and neither of those to a custom action. Given the complexity of the problem (esp. IPv6), that is likely the best way forward.

    J.
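The plan above (capture a generic token in the grammar, then decide in a custom action whether it is IPv4, IPv6, or neither) might look roughly like the sketch below. classify_address is a hypothetical helper, not part of Marpa::R2; the IPv6 check leans on inet_pton from the core Socket module, which is assumed to be available on the platform.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Socket qw(inet_pton AF_INET6);

# Hypothetical helper: classify a captured token as 'ipv4', 'ipv6' or 'name'.
sub classify_address {
    my ($text) = @_;

    # IPv4: exactly four decimal fields, each in the range 0..255.
    if ( my @octets = $text =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/ ) {
        return 'ipv4' if !grep { $_ > 255 } @octets;
    }

    # IPv6: let the C library's parser decide, via the core Socket module.
    return 'ipv6' if $text =~ /:/ && defined inet_pton( AF_INET6, $text );

    # Anything else is treated as a plain name.
    return 'name';
}

print "$_ => ", classify_address($_), "\n"
    for qw(192.0.2.1 2001:db8::1 www.perl.org 999.0.2.1);
```

A Marpa action could call such a helper on the captured text and return the type alongside the value, which keeps the address semantics out of the grammar itself.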
