Parsing HereDocs

JStrom has asked for the wisdom of the Perl Monks concerning the following question:

I have a language that supports a heredoc syntax similar to Perl's:

var = [[END . "other stuff";
heredoc
data
END

I'm trying to find a way to parse this without rolling my own tokenizer, but I'm running into problems with the standard tools: the newline changes its meaning, and the rest of the expression sits between the heredoc terminator and the heredoc content. Has anyone tackled this problem before?

Edit -- found a solution using Eyapp. The language I'm using doesn't have a << operator so munging the lexer works:

sub _Lexer {

    for( $input ) {
        if( @heredoc ) {
            /\A(.*?)\n$heredoc[0][0]/s or die "Unterminated heredoc";
            $strings[ $heredoc[0][1] ] = $1;
            shift @heredoc;
        }

        s/^\s*//;

        return ($1,$1) if s/^([;.])//;
        return ('IDENT',$1) if s/^(\w+)//;

        if( s/^<<(\w+)// ) {
            push @heredoc, [ $1, $id ];
            return ( 'HEREDOC', $id++ );
        }
    }

    return ('',undef);
}

(There should be a flag in the whitespace eater in the above code that switches on the heredoc parsing. I'll upload the corrected code later.)
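
One way the missing flag could work, as a minimal sketch and not the poster's eventual code: let the list of pending heredocs act as the flag, and have the whitespace eater collect heredoc bodies whenever it consumes a newline. The `make_lexer` wrapper, the `STRING` token, and the use of `[[` as the heredoc introducer (taken from the example in the question, not the `<<` in the code above) are assumptions made for illustration. A real Eyapp lexer would keep this state wherever the grammar file keeps it.

```perl
use strict;
use warnings;

# Hypothetical standalone lexer factory; returns the lexer closure and a
# reference to the array that receives the heredoc bodies.
sub make_lexer {
    my ($input) = @_;
    my (@pending, @strings);    # heredocs still waiting for a body; bodies
    my $id = 0;

    my $lexer = sub {
        for ($input) {
            # Eat whitespace piecewise so that every newline is noticed.
            while (s/\A([^\S\n]+|\n)//) {
                next unless $1 eq "\n";
                # End of line reached: read the body of each pending heredoc.
                while (my $h = shift @pending) {
                    my ($tag, $slot) = @$h;
                    s/\A(.*?)^\Q$tag\E[^\S\n]*(?:\n|\z)//ms
                        or die "Unterminated heredoc '$tag'";
                    $strings[$slot] = $1;
                }
            }

            return ($1, $1)       if s/\A([;.=])//;
            return ('STRING', $1) if s/\A"([^"]*)"//;
            return ('IDENT', $1)  if s/\A(\w+)//;

            if (s/\A\[\[(\w+)//) {
                push @pending, [$1, $id];   # body follows the current line
                return ('HEREDOC', $id++);
            }

            die "Unexpected input near '" . substr($_, 0, 10) . "'" if length;
        }

        die "Unterminated heredoc '$pending[0][0]'" if @pending;
        return ('', undef);
    };

    return ($lexer, \@strings);
}

# Usage: tokenize the example from the question.
my ($lex, $strings) =
    make_lexer(qq{var = [[END . "other stuff";\nheredoc\ndata\nEND\n});
while (1) {
    my ($type, $value) = $lex->();
    last if $type eq '';
    print "$type\n";
}
print $strings->[0];
```

Because the body is only read once a newline has been eaten, the rest of the expression (`. "other stuff";`) is tokenized normally before the heredoc text is consumed, and several heredocs on one line are filled in the order they appeared.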


Replies are listed 'Best First'.
Re: Parsing HereDocs by GrandFather (Saint) on Jun 11, 2008 at 00:32 UTC
What are you using to parse the source at present? Are you looking for a chunk of code using regexen and loops, or something you can plug into a Parse::RecDescent rule set? The following sketch code for handling the problem using regexen may help:

use strict;
use warnings;

while (<DATA>) {
    s/\[\[(\w+)/parseHereDoc ("$1")/e if /\[\[\w+/;
    print;
}

sub parseHereDoc {
    my $id = shift;
    my $str = '"';

    while (<DATA>) {
        last if /^$id$/;
        $str .= $_;
    }

    return $str . '"';
}

__DATA__
var = [[END . "other stuff";
heredoc
data
END

Prints:

var = "heredoc
data
" . "other stuff";

Perl is environmentally friendly - it saves trees
Re^2: Parsing HereDocs by JStrom (Pilgrim) on Jun 11, 2008 at 01:47 UTC
I've been working with Parse::RecDescent, but I have no problem switching to YAPP or one of the others as I've invested little time in the parsing code so far.
Re: Parsing HereDocs by jethro (Monsignor) on Jun 11, 2008 at 00:24 UTC
Hopefully others understand better what you mean by standard tools. Shouldn't it be possible to use a different (simple) parser for the heredoc? Parse::RecDescent, for example, has no problem switching between different parsers.