Hanken has asked for the wisdom of the Perl Monks concerning the following question:

Hi, guys. I am new to Perl and I have to use the Parse::RecDescent module to parse a complex design log file. I have run into some problems using the parser. Here is a simple example.
Say I have a file like the one below:
======= START OF FILE ==========
No Name Score Prize
1 Pig 100 red flower
2 Han 80 bread
3 Hen 50 ass kicked
======= END OF FILE ==========

I need to parse the data in every row and print the fields one by one. My Perl script is:
############### START OF PERL ###################
#! /usr/local/bin/perl -sw
BEGIN { close STDERR and open STDERR, '>./STDERR' or die $!; }
use Parse::RecDescent;
#============================================
# GRAMMAR DEFINITION HERE
#============================================
$grammar = q{
    Para: List(s) /\Z/
        | { use Data::Dumper 'Dumper';
            print "$_->[0]\n" for @{$thisparser->{errors}};
            exit; }
    List: Order Name Score Prize
    Order: /\d+/ {print "@item\n";}
        | <error: 1>
    Name: /\w+/ {print "@item\n";}
        | <error: Expecting a name!>
    Score: /\d+/ {print "@item\n";}
        | <error: 2>
    Prize: /.*$/ {print "@item\n";}
};
#============================================
# MAIN PROGRAM STARTS HERE
#============================================
$parse = new Parse::RecDescent ($grammar);
while (<DATA>) { chomp; $parse->Para($_); }
__DATA__
1 Pig 100 red flower
2 Han 80 bread
3 Hen 50 ass kicked
############### END OF PERL ###################

This works fine. But when the input data format changes to the following:
======= START OF FILE ========== No Name Score Prize 1 Pig 100 red flower 2 Han 80 bread 3 Hen 50 ass kicked ======= END OF FILE ==========

The script no longer works. I added <skip: qr/[\s\t\n]*/> to the List rule, but it still fails to parse. How can I make the parser handle irregular text layouts like this?

Replies are listed 'Best First'.
Re: RecDescent Parser problem: how to ignore new lines?
by ikegami (Patriarch) on Jun 05, 2008 at 08:51 UTC

    PRD does <skip:'\s*'> by default, and \s includes \n. Your problem is not actually in PRD or your grammar, but in what you feed to your parser.

    Change

    while (<DATA>) { chomp; $parse->Para($_); }

    to

    my $text = do { local $/; <DATA> }; $parse->Para($text);

    and change

    /.*$/

    to

    /[^\n]+/
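The two changes above can be tried out without Parse::RecDescent. The following is only a minimal sketch in core Perl: the in-memory filehandle stands in for DATA, the `$rest` variable is made up for illustration, and the `\A` anchor is an assumption about how PRD applies a regex terminal at the current position in the remaining text. It shows why `/.*$/` fails in the middle of slurped input while `/[^\n]+/` succeeds.

```perl
use strict;
use warnings;

# Sample input taken from the question's __DATA__ section.
my $input = "1 Pig 100 red flower\n2 Han 80 bread\n3 Hen 50 ass kicked\n";

# Read the whole input at once: with $/ undefined, <$fh> returns
# everything up to end of file instead of a single line.
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $text = do { local $/; <$fh> };
close $fh;

# What is left of the text once "1 Pig 100 " has been consumed
# (hypothetical variable, for illustration only).
my $rest = substr($text, length("1 Pig 100 "));

# Without /m, $ matches only at the very end of the string (or just
# before a final newline), and . never matches "\n", so this fails
# on every record except the last one.
my ($old) = $rest =~ /\A(.*$)/;

# [^\n]+ simply stops at the first newline, wherever it occurs.
my ($new) = $rest =~ /\A([^\n]+)/;

print defined $old ? "old: $old\n" : "old: no match\n";
print "new: $new\n";
```

When the script read one chomped line at a time, each record was the entire string, so `$` happened to match at the end of every Prize field. Once the whole file is passed in as one string, that is true only for the last record.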
      Hey, ikegami! That works, and thanks for the informative hint!
      Actually, since I was using 'chomp', I should have noticed that the newline character (\n) had been chopped off. :)