hsmyers has asked for the wisdom of the Perl Monks concerning the following question:

The code below works as given, but is seriously flawed. There are (at least) two things wrong:

The root cause of this dis-harmony is the definition of script_line. The /\s.*/ essentially eats any line that begins with white space. But since we are not talking Python here, this is too arbitrary to easily choke down. Likewise the first column requirement for the closing brace. There is a small herd of things that I've tried:

plus numerous incremental versions of this and similar ideas. Loosely put, given balanced braces give me each line of text within regardless of content. Seems simple but I suspect that like many such situations that the semblance is false!

So then-- the question I put to you my brethren is this: How do I re-write the grammar so that I can parse essentially raw text while still allowing for capacious code layout (i.e. the closing brace must be allowed to wander as it will)?

#!/perl/bin/perl # # test.pl -- use strict; use warnings; use diagnostics; use Parse::RecDescent; use Data::Dumper; $::RD_TRACE = 1 if ( $ARGV[0] ); $::RD_AUTOACTION = q { [@item[0..$#item]] }; my $parser = new Parse::RecDescent( q{ startrule: script script: 'script' '[' script_options(s /,/) ']' script_body script_options: script_option '=' qstring script_option: 'tag-prefix' | 'language' | 'implements-prefix' script_body: '{' script_line(s?) '}' script_line: <skip: ''> newline(s?) /\s.*/ newline newline: "\n" is_printable: /[a-zA-Z0-9_&\#;:\$()\'= ,!\@\/.\[\]\-]+/ qstring: '"' is_printable '"' } ); #_____________________________________________________________________ +_________ my $data; my $test; while (<DATA>) { $test .= $_; } $data = $parser->startrule($test); print Dumper $data,"\n"; __DATA__ script [tag-prefix="msxsl" ,language="VBScript", implements-prefix="us +er"] { dim fs dim FoldersOnly dim FilesOnly dim FilesAndFolders function getFolder(dirName,Mode) dim folder init set fso = CreateObject("Scripting.FileSystemObject") set folder = fso.GetFolder(dirName) set folderDoc=createObject("MSXML2.FreeThreadedDOMDocument") folderDoc.setProperty "SelectionLanguage","XPath" folderDoc.async=false folderDoc.loadXML "<folder name='"+dirName+"'/>" set folderNode=folderDoc.documentElement folderNode.setAttribute "name",folder.name folderNode.setAttribute "path",folder.path folderNode.setAttribute "dateCreated",mapDate(folder.dateCreat +ed) folderNode.setAttribute "dateLastAccessed",mapDate(folder.date +LastAccessed) folderNode.setAttribute "shortName",folder.shortName folderNode.setAttribute "shortPath",folder.shortPath folderNode.setAttribute "size",folder.size folderNode.setAttribute "type",folder.type if mode=FoldersOnly or mode=FilesAndFolders then getFolders folder,folderNode,dirName,mode end if if mode=FilesOnly or mode=FilesAndFolders then getFiles folder,folderNode,dirName end if set getFolder=folderDoc end function }

–hsm

"Never try to teach a pig to sing…it wastes your time and it annoys the pig."

edited: Fri Jun 28 23:05:16 2002 by jeffa - added readmore tag

Replies are listed 'Best First'.
Re: Raw Text and Parse::RecDescent
by kvale (Monsignor) on Jun 28, 2002 at 17:31 UTC
    It seems to me that what you want to do is differentiate between lines that contain a closing brace and those that do not; whitespace should be irrelevant. In that case, how about something like this modification:
    script_body: '{' script_line(s?) /\s*/s '}' script_line: <skip: ''> /\s*/s /[^}]*/
    The idea is to only skip lines that have no closing brace within them. If the line does have a closing brace,  script_line will fail, and after eating whitesapce, the closing brace will match. (Apologies if this isn't correct syntax; I have never used Parse::RecDescent before.)

    If script lines can contain closing braces, then you will need to do a bit more parsing (matching braces, etc.)

    -Mark

Re: Raw Text and Parse::RecDescent
by hsmyers (Canon) on Jun 28, 2002 at 21:11 UTC

    To answer my own question-- Read the fine pod brother! Within minutes of posting, the usual Myers-Murphy Law of Public Posting1 did it's usual magic. The pod in part reads:

    “ The "<perl_codeblock>" directive can be used to parse
    curly-brace-delimited block of Perl code, such as: { $a = 1; f() =~ m/pat/; }.
    It does this by calling Text::Balanced::extract_codeblock().

    If the directive matches text representing a valid Perl code block,
    it returns that text. Otherwise it fails with the usual "undef"
    value.”

    All of which leads to an improved shorter solution:

    my $parser = new Parse::RecDescent( q{ startrule: script script: 'script' '[' script_options(s /,/) ']' codeblock codeblock: <perl_codeblock> script_options: script_option '=' qstring script_option: 'tag-prefix' | 'language' | 'implements-prefix' is_printable: /[a-zA-Z0-9_&\#;:\$()\'= ,!\@\/.\[\]\-]+/ qstring: '"' is_printable '"' } );
    (1) Embarrassment is directly proportional to how obvious the answer suddenly becomes.

    –hsm

    "Never try to teach a pig to sing…it wastes your time and it annoys the pig."