I might not be explaining myself properly...
In the contrived example below, the default action is [name,values], as you recommend:
#!/usr/bin/env perl
use warnings;
use strict;
use Data::Dumper::Concise;
use Term::ANSIColor qw(:constants);
use Marpa::R2;
package main;
my $rules = <<'END_OF_GRAMMAR';
lexeme default = latm => 1
:default ::= action => [name,values]
:start ::= <entry>
<entry> ::= 'foo' (SP) <hostaddr4>
        | 'bar' (SP) <hostaddr4>
        | 'baz' (SP) <hostaddr4>
<ipv4> ::= NUMBER ('.') NUMBER ('.') NUMBER ('.') NUMBER
<hostname> ::= NAMECH+ separator => DOT
<hostaddr4> ::= <ipv4> | <hostname>
SP ~ [\s]+
DOT ~ '.'
NAMECH ~ [^\s.:]+
NUMBER ~ [\d]+
END_OF_GRAMMAR
my $input = <<'END_OF_INPUT';
foo 192.0.2.1
foo www.example.org
bar 192.0.2.2
bar 3.2.0.192.in-arpa.net
baz 192.0.2
baz flibble.example.com
END_OF_INPUT
my $grammar = Marpa::R2::Scanless::G->new({source => \$rules});
for (split /^/m, $input) {
    chomp;
    if (length $_) {
        print "\n\n$_\n";
        my $recce = Marpa::R2::Scanless::R->new({grammar => $grammar,
            ranking_method => 'rule', semantics_package => 'main'});
        eval { $recce->read(\$_ ) };
        print(($@ ? (RED . "$@\n") : GREEN),
              $recce->show_progress(), "\n",
              Dumper($recce->value), "\n\n", RESET);
    }
}
This produces a parse result that makes it clear, via the first element of each array, whether the hostaddr4 component is an ipv4 or a hostname. However, it means I have to reassemble both IPv4 addresses and hostnames myself afterwards, e.g. with shift; join '.', @_:
\[
  "entry",
  "foo",
  [
    "hostaddr4",
    [
      "ipv4",
      192,
      0,
      2,
      1,
    ],
  ],
]
...
\[
  "entry",
  "foo",
  [
    "hostaddr4",
    [
      "hostname",
      "www",
      "example",
      "org",
    ],
  ],
]
...
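For the reassembly step itself, a small recursive walker over the [name,values] tree can do it in one pass. This is a plain-Perl sketch, not anything provided by Marpa::R2: recompose and %JOINABLE are hypothetical names, and the tree is assumed to have already been dereferenced from $recce->value (which returns a reference, hence the leading \[ in the dumps).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of Marpa::R2): walk a [name, values...] tree
# as produced by `action => [name,values]` and collapse any node whose name
# is listed in %JOINABLE into [name, "a.b.c"], keeping the name tag.
my %JOINABLE = map { $_ => 1 } qw(ipv4 hostname);

sub recompose {
    my ($node) = @_;
    return $node unless ref $node eq 'ARRAY';   # leaves (tokens) pass through
    my ($name, @kids) = @$node;
    if ($JOINABLE{$name}) {
        return [ $name, join '.', @kids ];      # e.g. ["ipv4", "192.0.2.1"]
    }
    return [ $name, map { recompose($_) } @kids ];
}

# Tree shaped like the "foo 192.0.2.1" dump above, already dereferenced.
my $tree = [ 'entry', 'foo', [ 'hostaddr4', [ 'ipv4', 192, 0, 2, 1 ] ] ];
my $flat = recompose($tree);
print "$flat->[2][1][0] $flat->[2][1][1]\n";    # prints "ipv4 192.0.2.1"
```

With a real recognizer this would be called as recompose(${ $recce->value }). It keeps the name tags, but it is still a second pass over the tree rather than something the grammar does for me.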
If I update the grammar slightly to use a custom action that reassembles these for me:
<ipv4> ::= NUMBER ('.') NUMBER ('.') NUMBER ('.') NUMBER
    action => joindot
<hostname> ::= NAMECH+ separator => DOT
    action => joindot
...
sub joindot { shift; join '.', @_ }   # drop the per-parse argument, join the rest
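A note on writing the action as shift, join '.', @_ rather than shift; join '.', @_: with the comma, shift becomes part of the returned expression instead of a separate statement. In scalar context that still yields only the joined string, but in list context the sub also returns the discarded first argument. A standalone plain-Perl sketch (no Marpa involved; the sub names and the 'PER_PARSE' placeholder are hypothetical):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# With a comma, `shift` is part of the returned list, so in list context the
# sub returns the discarded first argument *and* the joined string.
sub joindot_comma { shift, join '.', @_ }

# With a semicolon, `shift` is a separate statement; only the join is returned.
sub joindot_semi  { shift; join '.', @_ }

my @comma = joindot_comma('PER_PARSE', 192, 0, 2, 1);
my @semi  = joindot_semi('PER_PARSE', 192, 0, 2, 1);
print scalar(@comma), " vs ", scalar(@semi), "\n";   # prints "2 vs 1"
```

The semicolon form behaves the same whichever context the caller uses, so it is the safer spelling for an action.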
...the resulting parse structure no longer has the ipv4 or hostname indicators, and there does not appear to be enough information in the arguments passed to joindot for it to return them:
\[
  "entry",
  "foo",
  [
    "hostaddr4",
    "192.0.2.1",
  ],
]
...
\[
  "entry",
  "foo",
  [
    "hostaddr4",
    "www.example.org",
  ],
]
The only approach I can see is to give each rule its own near-identical action, one that emits 'ipv4' and one that emits 'hostname':
<ipv4> ::= NUMBER ('.') NUMBER ('.') NUMBER ('.') NUMBER
    action => joinipv4
<hostname> ::= NAMECH+ separator => DOT
    action => joinhostname
...
sub joindot      { join '.', @_ }
sub joinipv4     { shift; ["ipv4", joindot(@_)] }       # drop per-parse arg, tag result
sub joinhostname { shift; ["hostname", joindot(@_)] }
But in a larger grammar that repetition is a pain, especially since the information is already known to the parser (and is emitted automatically when the default [name,values] action is used instead of a custom one).
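The duplicated subs can at least be generated rather than hand-written, using an ordinary Perl closure factory and typeglob assignment. This is a sketch under stated assumptions, not a Marpa feature: make_tagged_join is a hypothetical name, the generated subs are assumed to live in package main (matching semantics_package => 'main'), and each rule still has to name its own action in the grammar.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical factory: build an action that discards the per-parse argument
# and returns [$name, "a.b.c"], so one definition serves every dotted rule
# instead of one hand-written sub per rule.
sub make_tagged_join {
    my ($name) = @_;
    return sub { shift; return [ $name, join '.', @_ ] };
}

# Install the generated subs under the names the grammar's `action =>`
# adverbs refer to (here joinipv4 and joinhostname in package main).
{
    no strict 'refs';
    *{"main::join$_"} = make_tagged_join($_) for qw(ipv4 hostname);
}

# Simulated calls, standing in for what the parser would do.
my $ip   = joinipv4(undef, 192, 0, 2, 1);
my $host = joinhostname(undef, qw(www example org));
print "$ip->[0]=$ip->[1] $host->[0]=$host->[1]\n";
# prints "ipv4=192.0.2.1 hostname=www.example.org"
```

To be safe, the typeglob assignments should run before the recognizer is created, so the subs exist by the time the actions are resolved. This trims the Perl side to one line per tag, but the tag name is still spelled out by hand rather than taken from the rule.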
Am I missing a trick here?