PerlMonks  

Is the skip: directive broken in Parse::RecDescent ? [Solved - PEBKAC]

by Hercynium (Hermit)
on Aug 11, 2008 at 21:21 UTC ( [id://703751]=perlquestion )

Hercynium has asked for the wisdom of the Perl Monks concerning the following question:

So, I've been happily learning how to use grammars for parsing with Parse::RecDescent, and I've been very pleased with its power and flexibility so far... but I'm stumbling over a problem, and for the life of me I can't understand why it's happening!

I highly doubt that this could be a bug in PRD - it's used by too many people... but even the most bare code is demonstrating this frustrating problem:

Basically, it's this: Changing the prefix pattern has NO effect!

If I print out $skip it shows that it is set as expected, but the behavior of PRD does not change from the default.

This happens whether I am using a skip: directive, setting $skip from within an Action, or setting $Parse::RecDescent::skip from outside the grammar code.

Here's a little demonstration of what I'm getting...

Code like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Parse::RecDescent;
use Data::Dumper;

my $grammar = <<'END_GRAMMAR';
file: <skip:qr/#\w+/> 'foo' 'bar'
    { [ @item ] }
END_GRAMMAR

my $data = <<'END_DATA';
#skdjslkdjsakdjadjlksa
foobar
END_DATA

our $parse = new Parse::RecDescent($grammar)
    || die "Couldn't generate parser from grammar: $!";

my $parse_tree = $parse->file($data);
print Dumper $parse_tree;

Outputs this:
$VAR1 = undef;


I'm pretty certain it's not a problem with the regexes I'm using because when I do something like this instead:
#!/usr/bin/env perl
use strict;
use warnings;
use Parse::RecDescent;
use Data::Dumper;

my $grammar = <<'END_GRAMMAR';
file: /#\w+/ 'foo' 'bar'
    { [ @item ] }
END_GRAMMAR

my $data = <<'END_DATA';
#skdjslkdjsakdjadjlksa
foobar
END_DATA

our $parse = new Parse::RecDescent($grammar)
    || die "Couldn't generate parser from grammar: $!";

my $parse_tree = $parse->file($data);
print Dumper $parse_tree;

I get this output:
$VAR1 = [ 'file', '#skdjslkdjsakdjadjlksa', 'foo', 'bar' ];

I've scoured Google, PM, the docs, the FAQ, and RT for more info about this, but it looks like I'm the only soul to have this problem... Is there any advice on how to track down the source of this conundrum?

Update:

As I suspected, the "skip" or "terminal prefix" functionality is *not* broken... but it is not quite as DWIMmy as I was expecting with regard to how the specified regular expression is used.

I still don't think I understand the subtle details, but as far as I can tell, one should keep in mind that the skip regex (aka terminal prefix) is matched ONLY ONCE before each terminal. Therefore, one should probably wrap the whole thing in a group followed by an asterisk, to ensure *everything* one wants to skip is consumed in *one pass*.

To further show what I mean, here is one of the many non-working regexes that brought me here:
/(?: \# .*? \n? | \s* )?/msx
It will match only ONE INSTANCE of a comment or repeated whitespace. My example text has several adjoining instances of comments and whitespace, and only the first match was being consumed!

Here is the regex that does what I want:
/(?: \# .*? \n | \s )*/msx
As you can see, it consumes ALL Comments AND whitespace until nothing matches. SMALL change, BIG difference!

I now have this working the way I want, by assigning it to $skip in the "start-up actions":
$skip = '(?msx: \# .*? \n | \s )*'
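The difference between the two patterns can be checked outside Parse::RecDescent with plain regexes. The following is only a sketch: `strip_prefix` is a hypothetical helper that applies a prefix pattern once at the start of the text (it is not the module's own code), and the sample text is made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of Parse::RecDescent): strip ONE match of
# the prefix pattern from the front of the text and return what is left.
sub strip_prefix {
    my ($skip, $text) = @_;
    $text =~ s/\A(?:$skip)//;
    return $text;
}

# Made-up sample input: two comments and some indentation before a token.
my $text = "# first comment\n# second comment\n   foo\n";

# Optional group: at most one alternative is applied, one time.
my $once   = qr/(?: \# .*? \n? | \s* )?/msx;
# Repeated group: the alternatives are applied until neither matches.
my $repeat = qr/(?: \# .*? \n | \s )*/msx;

my $after_once   = strip_prefix($once,   $text);
my $after_repeat = strip_prefix($repeat, $text);

# Report whether the next token ('foo') is at the front after skipping.
print "optional group, 'foo' reachable: ",
    ($after_once   =~ /\Afoo/ ? "yes" : "no"), "\n";
print "repeated group, 'foo' reachable: ",
    ($after_repeat =~ /\Afoo/ ? "yes" : "no"), "\n";
```

With the repeated group, everything before 'foo' is consumed in a single application of the prefix; with the optional group, some comment text or whitespace is still in front of the token, so the terminal that follows cannot match.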

This has been another fun and edifying expedition, and if anyone reading this has any additional questions, I am happy to share whatever meager knowledge I have gained :)

Replies are listed 'Best First'.
Re: Is the skip: directive broken in Parse::RecDescent ?
by ikegami (Patriarch) on Aug 11, 2008 at 21:44 UTC

    <skip:qr/#\w+/> 'foo' 'bar'
    is equivalent to
    /(?>#\w+)(?>foo)(?>#\w+)(?>bar)/

    /#\w+/ doesn't consume all of the "#skdjslkdjsakdjadjlksa\n" before "foo" (note the newline), so 'foo' can't match.
    /#\w+/ also can't match the "" before "bar".

    I think you want <skip:qr/(?:#\w+\n)?/>

    By the way,
    /#\w+/ 'foo' 'bar'
    is equivalent to
    /(?>\s*)(?>#\w+)(?>\s*)(?>foo)(?>\s*)(?>bar)/
    since the default skip is /\s*/.
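    These equivalences can be tried against the sample data using ordinary regexes. This is a rough sketch rather than what Parse::RecDescent generates internally: the \A anchor and the labels are additions for illustration, and the second case uses the skip pattern suggested above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same data as in the question, newline after the comment included.
my $data = "#skdjslkdjsakdjadjlksa\nfoobar\n";

# Each case: a label and a regex approximating "prefix, terminal, prefix,
# terminal". The \A anchor is an addition for this sketch.
my @cases = (
    [ 'skip /#\w+/, terminals foo bar',
      qr/\A(?>#\w+)(?>foo)(?>#\w+)(?>bar)/ ],
    [ 'skip /(?:#\w+\n)?/, terminals foo bar',
      qr/\A(?>(?:#\w+\n)?)(?>foo)(?>(?:#\w+\n)?)(?>bar)/ ],
    [ 'default skip /\s*/, terminals /#\w+/ foo bar',
      qr/\A(?>\s*)(?>#\w+)(?>\s*)(?>foo)(?>\s*)(?>bar)/ ],
);

my @results;
for my $case (@cases) {
    my ($label, $re) = @$case;
    my $ok = $data =~ $re ? 1 : 0;
    push @results, $ok;
    printf "%-48s %s\n", $label, $ok ? 'matches' : 'fails';
}
```

    The first case fails because the newline after the comment is never consumed and because the prefix is required (not optional) before 'bar'; the other two match.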

    Update: Added "by the way" bit.

      Thanks again, ikegami. You've given me another avenue to search for my solution. I'm curious though - the trace shows that the skip block is being treated like a production... and it's returning a value! [\s*]

      Do all modified prefix matches return this, or just skip directives?
        To quote the docs,

        The <skip> directive evaluates to the previous terminal prefix, so it's easy to reinstate a prefix later in a production

        (Followed by an example)

Node Type: perlquestion [id://703751]
Approved by toolic
Front-paged by Corion