
I have a Marpa::R2 parser that needs to distinguish IP addresses from hostnames, even though both follow the same leading keyword. The grammar I am actually using is too complicated to replicate here, but the following minimal reproducible example shows the same problem:

#!/usr/bin/env perl
use warnings;
use strict;

use Data::Dumper;
use Term::ANSIColor qw(:constants);
use Marpa::R2;

my $rules = <<'END_OF_GRAMMAR';
    lexeme default  = latm => 1
    :default        ::= action => [name,values]
    :start          ::= <entry>

    <entry>         ::= <op> (SP) <hostaddr4>
    <op>            ::= 'add' | 'remove'
    
    <ipv4>          ::= NUMBER ('.') NUMBER ('.') NUMBER ('.') NUMBER
    <hostname>      ::= NAME
    
    <hostaddr4>     ::= <ipv4> | <hostname>
    
    SP              ~ [\s]+    
    NAME            ~ [\S]+
    NUMBER          ~ [\d]+
END_OF_GRAMMAR

my $input = <<'END_OF_INPUT';
add 192.0.2.1
add www.example.org
remove 192.0.2.2
END_OF_INPUT

my $grammar = Marpa::R2::Scanless::G->new({source => \$rules});

for (split /^/m, $input) {
    chomp;
    if (length $_) {
        print "\n\n$_\n";
        
        my $recce = Marpa::R2::Scanless::R->new({
            grammar => $grammar, 
            ranking_method => 'rule'
        });
        
        eval { $recce->read(\$_ ) };
        print ($@ ? (RED . "$@\n") : GREEN);
        print $recce->show_progress(), "\n";
        print Dumper($recce->value), "\n\n", RESET;
    }
}

From what I can tell, Marpa always picks the <hostname> alternative, even on lines that are clearly IP addresses. I assume this is because the character class [\S]+ also matches every character that makes up an IP address (digits and dots).
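That guess can be checked without Marpa at all. With latm => 1, the SLIF lexer looks for the longest token among the lexemes the parser can accept at the current position. The sketch below imitates only that single step in plain Perl; the pattern table and the longest_match helper are hypothetical, written for illustration, and do not model everything Marpa's lexer does (ties, for instance, are not handled specially). At the position just after "add ", NAME ([\S]+) matches all nine characters of 192.0.2.1, while NUMBER ([\d]+) matches only 192, so NAME wins before any G1 rule is involved. When NAME is not acceptable, NUMBER is read instead, which is consistent with what happens when <hostname> is removed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical pattern table standing in for the question's L0 rules.
my %lexeme = (
    NAME   => qr/\S+/,
    NUMBER => qr/\d+/,
);

# Hypothetical helper imitating one longest-match step: try every
# acceptable lexeme anchored at $pos and keep the longest match.
# Ties are not handled specially here (the first name in sorted order wins).
sub longest_match {
    my ($text, $pos, @acceptable) = @_;
    my $rest = substr $text, $pos;
    my ($best, $best_len) = (undef, 0);
    for my $name (sort @acceptable) {
        my $re = $lexeme{$name};
        next unless $rest =~ /\A($re)/;
        my $len = length $1;
        ($best, $best_len) = ($name, $len) if $len > $best_len;
    }
    return ($best, $best_len);
}

my $line = 'add 192.0.2.1';
my $pos  = length 'add ';    # just after <op> and SP

# Both <hostaddr4> alternatives are live, so NAME and NUMBER are acceptable.
my ($tok, $len) = longest_match($line, $pos, qw(NAME NUMBER));
print "$tok ($len chars)\n";    # prints "NAME (9 chars)"

# If NAME were not acceptable (no <hostname>), only "192" would be read.
($tok, $len) = longest_match($line, $pos, qw(NUMBER));
print "$tok ($len chars)\n";    # prints "NUMBER (3 chars)"
```

If that reading is correct, it would also explain why none of the rank and || variants change anything: ranking in Marpa chooses among alternative complete parses, and once the lexer has produced only a NAME token there is just one parse to choose from.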

So far, I have tried each of the following definitions of <hostaddr4> in my grammar:

<hostaddr4>     ::= <ipv4> | <hostname>

<hostaddr4>     ::= <ipv4> || <hostname>

<hostaddr4>     ::= <hostname> | <ipv4>

<hostaddr4>     ::= <hostname> || <ipv4>

<hostaddr4>     ::= <ipv4>      rank => 2
                  | <hostname>  rank => 1

<hostaddr4>     ::= <ipv4>      rank => 1
                  | <hostname>  rank => 2

<hostaddr4>     ::= <ipv4>      rank => 1
<hostaddr4>     ::= <hostname>  rank => 2

<hostaddr4>     ::= <hostname>  rank => 1
<hostaddr4>     ::= <ipv4>      rank => 2

...and none of them makes any difference. Every variant still yields the ['hostname', '192.0.2.1'] array for the first line.

The only change that works is removing the <hostname> alternative from <hostaddr4> (which no longer matches the grammar of the data I am parsing); the representation then changes to ['ipv4', '192', '0', '2', '1'].

Can anyone advise on the correct approach in this (seemingly) simple case?
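One possible approach, sketched against the minimal example only and assuming the real grammar can tolerate it: lex the whole IPv4 address as a single lexeme (IPV4 and OCTET below are names introduced here) so that it matches exactly as many characters as NAME, then give IPV4 a higher lexeme priority with the SLIF :lexeme pseudo-rule. As I understand the Marpa::R2 DSL documentation, lexeme priority is consulted only when several acceptable lexemes tie for the longest length, which is exactly the tie this creates. The hostaddr_kind helper is hypothetical, and the tree layout it indexes assumes the action => [name,values] default from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Marpa::R2;

# Assumption: an IPv4 address is lexed as ONE token (IPV4) so that it is
# exactly as long as NAME; lexeme priority then breaks the tie.
my $rules = <<'END_OF_GRAMMAR';
    lexeme default  = latm => 1
    :default        ::= action => [name,values]
    :start          ::= <entry>

    <entry>         ::= <op> (SP) <hostaddr4>
    <op>            ::= 'add' | 'remove'

    <hostaddr4>     ::= <ipv4> | <hostname>
    <ipv4>          ::= IPV4
    <hostname>      ::= NAME

    # Prefer IPV4 when IPV4 and NAME match the same number of characters.
    :lexeme         ~ IPV4  priority => 1

    IPV4            ~ OCTET '.' OCTET '.' OCTET '.' OCTET
    OCTET           ~ [\d]+
    SP              ~ [\s]+
    NAME            ~ [\S]+
END_OF_GRAMMAR

my $grammar = Marpa::R2::Scanless::G->new({ source => \$rules });

# Hypothetical helper: parse one line and report which <hostaddr4>
# alternative was chosen ('ipv4' or 'hostname').
sub hostaddr_kind {
    my ($line) = @_;
    my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
    $recce->read(\$line);
    my $value_ref = $recce->value
        or die "No parse for '$line'\n";
    # With action => [name,values] the value is
    # ['entry', ['op', ...], ['hostaddr4', [KIND, ...]]]
    return ${$value_ref}->[2][1][0];
}

for my $line ('add 192.0.2.1', 'add www.example.org', 'remove 192.0.2.2') {
    printf "%-20s => %s\n", $line, hostaddr_kind($line);
}
```

Because NAME is still [\S]+, an input such as 192.0.2.1x is longer than any IPV4 match and would still be read as a hostname, which seems like the desired behaviour. IPV4 as written also accepts out-of-range octets such as 999; range checks would have to be added elsewhere, for example in a semantic action.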

In reply to Cannot get Marpa::R2 to prioritise one rule over another by Anonymous Monk



	PerlMonks