rkg has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I am trying to get up to speed on Parse::RecDescent. The FAQ offers an idiom (credited to Merlyn) to match CSV-like data:
CSVLine: QuotedText(s Comma) { do something }
I'm trying to write a rule to match a term. A term is one or more words, separated by underscores. Neither terms nor words can contain whitespace. These are valid terms: apple, apple_pear, apple_plum_grape. These are not valid terms: _, _apple, apple_, apple__pear. Can anyone give me guidance on writing the rule for term and word? Do I need a <skip> directive to indicate a term cannot contain whitespace? Here's what I have, which does not work. Many thanks!
my $g = q(
    Word: /^[A-Z]+$/i {return $item[1]}
    TermSep: '_'
    Term: Word(s TermSep)
        { use Data::Dumper; Dumper($item[1]) }
    Line: Term {$item[1]}
);
rkg

Replies are listed 'Best First'.
Re: ParseRecDescent and csv-like data
by Elian (Parson) on Aug 21, 2003 at 13:45 UTC
    You don't have to get that fancy. Something like:
    word: /[a-zA-Z]+(_[a-zA-Z]+)*/
    should be sufficient.
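
A minimal core-Perl sketch of how the single-regex idea behaves on the terms from the question. The `\A`/`\z` anchors, the helper name `is_term`, and the sample list are assumptions added for illustration; they are not part of the reply above, and this is plain regex matching rather than a Parse::RecDescent rule.

```perl
use strict;
use warnings;

# The pattern from the reply, anchored so the whole string must be a term.
# (The anchors are an assumption made for this sketch.)
my $term_re = qr/\A[a-zA-Z]+(?:_[a-zA-Z]+)*\z/;

# Hypothetical helper: returns 1 when the string is a valid term, 0 otherwise.
sub is_term {
    my ($s) = @_;
    return $s =~ $term_re ? 1 : 0;
}

# Valid and invalid examples taken from the original question.
for my $candidate (qw(apple apple_pear apple_plum_grape _ _apple apple_ apple__pear)) {
    printf "%-18s %s\n", $candidate, is_term($candidate) ? 'valid' : 'invalid';
}
```

Because every underscore must be followed by at least one letter, leading, trailing, and doubled underscores are all rejected without any lookahead.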
Re: ParseRecDescent and csv-like data
by dreadpiratepeter (Priest) on Aug 21, 2003 at 13:47 UTC
    Wouldn't:
    Term: /[A-Z]|[A-Z][A-Z_]*[A-Z]/i {return \split(/_/,$item[1])}
    give you what you want? I could be wrong, it's off the top of my head, but it should return a term as a list of its parts (which seems to be what you want).

    -pete
    "Worry is like a rocking chair. It gives you something to do, but it doesn't get you anywhere."
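
A hedged sketch of the validate-then-split idea in core Perl, outside any grammar. The anchors, the stricter pattern, and the helper name `split_term` are assumptions for illustration. It also shows that the pattern as posted, once grouped and anchored, still lets a doubled underscore through, which the question wants rejected.

```perl
use strict;
use warnings;

# The pattern as posted, grouped and anchored (grouping and anchors are assumptions).
my $loose_re = qr/\A(?:[A-Z]|[A-Z][A-Z_]*[A-Z])\z/i;

# Stricter variant: letters, then zero or more "_letters" groups.
my $strict_re = qr/\A[A-Z]+(?:_[A-Z]+)*\z/i;

# Hypothetical helper: array ref of words for a valid term, undef otherwise.
sub split_term {
    my ($s) = @_;
    return unless $s =~ $strict_re;
    return [ split /_/, $s ];
}

# The loose pattern allows "__" in the middle of a term.
print "loose pattern accepts apple__pear\n" if 'apple__pear' =~ $loose_re;

# The strict helper returns the parts of a valid term.
my $parts = split_term('apple_plum_grape');
print join(', ', @$parts), "\n" if $parts;
```

Returning an array reference (rather than `\split(...)`, which takes a reference to each element) keeps the parts together as one value.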
Re: ParseRecDescent and csv-like data
by Abigail-II (Bishop) on Aug 21, 2003 at 14:51 UTC
    Instead of using Parse::RecDescent, may I suggest a Regexp::Common solution? Parsing lists is one of its options.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Regexp::Common;

    my $re = $RE{list}{-sep => '_'}{-pat => '(?:(?!_)\w)+'};

    while (<DATA>) {
        chomp;
        print "'$_' ", /^$re$/ ? "matched\n" : "did not match\n";
    }

    __DATA__
    apple_pear
    apple_plum_grape
    _
    apple
    _apple
    apple_
    apple__pear

    Output:

    'apple_pear' matched
    'apple_plum_grape' matched
    '_' did not match
    'apple' did not match
    '_apple' did not match
    'apple_' did not match
    'apple__pear' did not match

    Abigail

Re: ParseRecDescent and csv-like data
by gjb (Vicar) on Aug 21, 2003 at 14:18 UTC

    Although I agree that it might be better to consider 'apple_pear' as one token rather than two (as suggested by Elian and dreadpiratepeter above), the code below should do what you want:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Parse::RecDescent;

    my $text = <<EOI;
    apple_pear
    cherry
    munchy_nice_apricot
    mint_
    _banana
    raspberry__pie
    EOI

    $Parse::RecDescent::skip = '[ \t]*';

    my $grammar = q(
        { use Data::Dumper }
        data: line(s)
        line: term endofline
        term: word(s /_/) ...endofline
            { print Dumper(\%item); }
        word: /[a-z]+/
        endofline: /[\n\r]+/
    );

    my $parser = Parse::RecDescent->new($grammar);
    if ($parser->data($text)) {
        print "ok\n";
    }

    Hope this helps, -gjb-

    Update: changed the code to suit rkg's requirement that '_apple', 'apple__juice' and 'apple_' should not be accepted. The grammar happens to be more elegant now and I learned about separator patterns.

    Update 2: removed an unused token from the grammar.

      Thanks for your help. Your code, I think, accepts apple__pear (two underscores), _apple (leading underscore), and pear_ (trailing underscore) as valid. In my desired grammar, they shouldn't be. Any thoughts?

      rkg

      P.S. And yes, I like your approach of treating a term as multiple tokens rather than one. This is part of a larger grammar, and I need the flexibility of Parse::RecDescent.
Re: ParseRecDescent and csv-like data
by rkg (Hermit) on Aug 21, 2003 at 17:15 UTC
    Well, this ugly setup (below) does what I want.

    If anyone would like to help me clean it, I'd appreciate learning how!

    Word1: /^[A-Z0-9\/:"'.-]+$/i {$item[1]}
    Word2: /[A-Z0-9\/:"'.-]+/i {$item[1]}
    Term: Word1 | (<skip: ''> Word2 ('_' Word2)(s))

    (Why am I going to all this hassle, you may ask? Words are nested into terms, terms into superterms, and so on, each with a unique delimiter. So the simple one-token regexp approach doesn't quite work.)

    rkg
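
The nesting described above (words into terms, terms into superterms, each level with its own delimiter) can be sketched in core Perl by composing one "separated list" pattern per level. This is only an illustration under stated assumptions: the helper name `separated`, the letters-only word pattern, and the `+` delimiter for superterms are all hypothetical, and it is not a substitute for the Parse::RecDescent grammar in a larger parser.

```perl
use strict;
use warnings;

# Hypothetical helper: pattern for one or more $item_re joined by the literal $sep.
sub separated {
    my ($item_re, $sep) = @_;
    return qr/$item_re(?:\Q$sep\E$item_re)*/;
}

my $word      = qr/[A-Za-z]+/;            # simplified word pattern (assumption)
my $term      = separated($word, '_');    # words joined by "_"
my $superterm = separated($term, '+');    # terms joined by "+" (delimiter chosen for illustration)

for my $candidate ('apple_pear+plum', 'apple', 'apple_+plum', '+apple') {
    my $ok = $candidate =~ /\A$superterm\z/;
    printf "%-18s %s\n", $candidate, $ok ? 'valid' : 'invalid';
}

# Once a string is known to be valid, nested splits recover the structure:
# a list of terms, each an array ref of words.
my @tree = map { [ split /_/ ] } split /\+/, 'apple_pear+plum';
```

Each level reuses the same helper, so adding a further level only needs another delimiter, and no whitespace can slip in because none of the patterns allow it.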