Raw Text and Parse::RecDescent

hsmyers has asked for the wisdom of the Perl Monks concerning the following question:

The code below works as given, but is seriously flawed. There are (at least) two things wrong:

Each 'script_line' requires leading white space.
The closing '}' must not have leading white space.

The root cause of this dis-harmony is the definition of script_line. The /\s.*/ essentially eats any line that begins with white space. But since we are not talking Python here, this is too arbitrary to easily choke down. Likewise the first column requirement for the closing brace. There is a small herd of things that I've tried:

/.*/
/\b.*/
/[~\}]*/

plus numerous incremental versions of this and similar ideas. Loosely put, given balanced braces give me each line of text within regardless of content. Seems simple but I suspect that like many such situations that the semblance is false!

So then-- the question I put to you my brethren is this: How do I re-write the grammar so that I can parse essentially raw text while still allowing for capacious code layout (i.e. the closing brace must be allowed to wander as it will)?

#!/perl/bin/perl
#
# test.pl --
use strict;
use warnings;
use diagnostics;
use Parse::RecDescent;
use Data::Dumper;

$::RD_TRACE = 1 if ( $ARGV[0] );
$::RD_AUTOACTION = q { [@item[0..$#item]] };

my $parser = new Parse::RecDescent( q{
startrule:              
    script
script:
    'script' '[' script_options(s /,/) ']' script_body
script_options:
    script_option '=' qstring
script_option:
    'tag-prefix' |
    'language' |
    'implements-prefix' 
script_body:
    '{' script_line(s?) '}'
script_line:
    <skip: ''> newline(s?) /\s.*/ newline
newline:
    "\n"
is_printable:           
    /[a-zA-Z0-9_&\#;:\$()\'= ,!\@\/.\[\]\-]+/
qstring:                
    '"' is_printable '"'
}
);

#_____________________________________________________________________
+_________

my $data;
my $test;

while (<DATA>) {
    $test .= $_;
}

$data = $parser->startrule($test);
print Dumper $data,"\n";

__DATA__
script [tag-prefix="msxsl" ,language="VBScript", implements-prefix="us
+er"] {

    dim fs
    dim FoldersOnly
    dim FilesOnly
    dim FilesAndFolders
    
    function getFolder(dirName,Mode)
        dim folder
        init
        set fso = CreateObject("Scripting.FileSystemObject")
        set folder = fso.GetFolder(dirName)
        set folderDoc=createObject("MSXML2.FreeThreadedDOMDocument")
        folderDoc.setProperty "SelectionLanguage","XPath"
        folderDoc.async=false
        folderDoc.loadXML "<folder name='"+dirName+"'/>"
        set folderNode=folderDoc.documentElement
        folderNode.setAttribute "name",folder.name
        folderNode.setAttribute "path",folder.path
        folderNode.setAttribute "dateCreated",mapDate(folder.dateCreat
+ed)
        folderNode.setAttribute "dateLastAccessed",mapDate(folder.date
+LastAccessed)
        folderNode.setAttribute "shortName",folder.shortName
        folderNode.setAttribute "shortPath",folder.shortPath
        folderNode.setAttribute "size",folder.size
        folderNode.setAttribute "type",folder.type
        if mode=FoldersOnly or mode=FilesAndFolders then
            getFolders folder,folderNode,dirName,mode
        end if
        if mode=FilesOnly or mode=FilesAndFolders then
            getFiles folder,folderNode,dirName
        end if
        set getFolder=folderDoc
    end function

}
[download]

–hsm

"Never try to teach a pig to sing…it wastes your time and it annoys the pig."

edited: Fri Jun 28 23:05:16 2002 by jeffa - added readmore tag

Comment on Raw Text and Parse::RecDescent Select or Download Code

Replies are listed 'Best First'.
Re: Raw Text and Parse::RecDescent by kvale (Monsignor) on Jun 28, 2002 at 17:31 UTC
It seems to me that what you want to do is differentiate between lines that contain a closing brace and those that do not; whitespace should be irrelevant. In that case, how about something like this modification: `script_body: '{' script_line(s?) /\s/s '}' script_line: <skip: ''> /\s/s /[^}]*/` [download] The idea is to only skip lines that have no closing brace within them. If the line does have a closing brace, `script_line` will fail, and after eating whitesapce, the closing brace will match. (Apologies if this isn't correct syntax; I have never used Parse::RecDescent before.) If script lines can contain closing braces, then you will need to do a bit more parsing (matching braces, etc.) -Mark	[reply] [d/l] [select]
Re: Raw Text and Parse::RecDescent by hsmyers (Canon) on Jun 28, 2002 at 21:11 UTC
To answer my own question-- Read the fine pod brother! Within minutes of posting, the usual Myers-Murphy Law of Public Posting₁ did it's usual magic. The pod in part reads: “ The "<perl_codeblock>" directive can be used to parse curly-brace-delimited block of Perl code, such as: `{ $a = 1; f() =~ m/pat/; }.` It does this by calling `Text::Balanced::extract_codeblock()`. If the directive matches text representing a valid Perl code block, it returns that text. Otherwise it fails with the usual "undef" value.” All of which leads to an improved shorter solution: `my $parser = new Parse::RecDescent( q{ startrule: script script: 'script' '[' script_options(s /,/) ']' codeblock codeblock: <perl_codeblock> script_options: script_option '=' qstring script_option: 'tag-prefix' \| 'language' \| 'implements-prefix' is_printable: /[a-zA-Z0-9_&\#;:\$()\'= ,!\@\/.\[\]\-]+/ qstring: '"' is_printable '"' } );` [download] (1) Embarrassment is directly proportional to how obvious the answer suddenly becomes. –hsm "Never try to teach a pig to sing…it wastes your time and it annoys the pig."	[reply] [d/l] [select]