Re: ParseRecDescent and csv-like data

Although I agree that it might be better to treat 'apple_pear' as one token rather than two (as suggested by Elian and dreadpiratepeter above), the code below should do what you want:

#!/usr/bin/perl

use strict;
use warnings;

use Parse::RecDescent;

my $text = <<EOI;
apple_pear
cherry
munchy_nice_apricot
mint_
_banana
raspberry__pie
EOI

# Skip only spaces and tabs between tokens, so newlines stay significant.
$Parse::RecDescent::skip = '[ \t]*';

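# A term is one or more words joined by single underscores (the separator
# pattern in word(s /_/)); the ...endofline lookahead checks that the line
# ends there without consuming the line ending.
my $grammar = q(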

  { use Data::Dumper }

  data: line(s)
  line: term endofline
  term: word(s /_/) ...endofline
       { print Dumper(\%item); }
  word: /[a-z]+/
  endofline: /[\n\r]+/

);

my $parser = Parse::RecDescent->new($grammar);
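# Note: data() succeeds once the leading line(s) parse; it does not require
# the whole input to be consumed, so parsing stops at the first bad line.
if ($parser->data($text)) {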
    print "ok\n";
}

Hope this helps, -gjb-

Update: changed the code to suit rkg's requirement that '_apple', 'apple__juice' and 'apple_' should not be accepted. The grammar happens to be more elegant now, and I learned about separator patterns.

Update 2: removed an unused token from the grammar.
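
For comparison, here is a minimal sketch of the same acceptance rule as a plain regex, without Parse::RecDescent. It is not a replacement for the grammar, only a quick way to check which terms should pass. The helper name is_valid_term is made up for illustration, and unlike the grammar it does not allow spaces or tabs around the underscores.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not part of the grammar above): true when $term is
# one or more lowercase words joined by single underscores.
sub is_valid_term {
    my ($term) = @_;
    return $term =~ /\A[a-z]+(?:_[a-z]+)*\z/ ? 1 : 0;
}

# Same sample terms as in the test data above.
for my $term (qw(apple_pear cherry munchy_nice_apricot mint_ _banana raspberry__pie)) {
    printf "%-20s %s\n", $term, is_valid_term($term) ? 'accepted' : 'rejected';
}
```

The first three terms should come out as accepted and the last three as rejected, which is the behaviour the separator pattern is meant to give.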

Re: Re: ParseRecDescent and csv-like data by rkg (Hermit) on Aug 21, 2003 at 16:05 UTC
Thanks for your help. Your code, I think, accepts apple__pear (two underscores), _apple (leading underscore), and pear_ (trailing underscore) as valid; in my desired grammar, they shouldn't be. Any thoughts?

rkg

P.S. And yes, I like your approach of treating a term as multiple tokens rather than one. This is part of a larger grammar, and I need the flexibility of Parse::RecDescent.