
If you want to avoid PRD (Parse::RecDescent), there are a few things you can do:

1. Write an event parser for your language:
   - Pass events to an event handler object.
   - The two tokens 'page foo' generate an event new_type("page", "foo"), which creates a new element as a child of the element at the top of a stack.
   - The { token pushes the last child of the top of the stack onto the stack.
   - The } token pops an element from the stack.
   - Anything that is not recognized as a "new thing" structure, ((\w+)\s+(?:(\w+)\s+)?\{), is gathered up and passed to the 'character_data' event; in your case, probably one event per line.
   - The event handler has a 'root' element predefined at the top of the stack.
2. Use something like the event parser, but keeping no state, to convert the language into XML or YAML or whatever, and use an existing parser for that.
3. Use (??{ }) in regexes in a manner similar to the event handler. If you go that way, you can nest expressions using (??{ }). See perlre for some devious tricks you can do with this construct. /msg me if you would like me to post an example.
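
To make the first suggestion concrete, here is a minimal sketch of such an event parser with a separate handler object. It assumes the input is line oriented (each opener, each closing brace, and each piece of text sits on its own line), which is an assumption and not something the language requires. The names TreeHandler, open_block, close_block, root and parse_events are made up for illustration; only new_type and character_data come from the description above.

```perl
use strict;
use warnings;

# Hypothetical handler class: builds a tree of hashes from parser events.
package TreeHandler;

sub new {
    my $class = shift;
    # the 'root' element is predefined and starts at the top of the stack
    my $root = { type => 'doc', name => 'root', contains => [] };
    return bless { root => $root, stack => [$root] }, $class;
}

# 'page foo' creates a new child of the element at the top of the stack
sub new_type {
    my ($self, $type, $name) = @_;
    my $elem = { type => $type, contains => [] };
    $elem->{name} = $name if defined $name;
    push @{ $self->{stack}[-1]{contains} }, $elem;
}

# '{' pushes the last child of the top of the stack onto the stack
sub open_block {
    my $self = shift;
    push @{ $self->{stack} }, $self->{stack}[-1]{contains}[-1];
}

# '}' pops an element from the stack (never the root)
sub close_block {
    my $self = shift;
    die "unbalanced '}'\n" if @{ $self->{stack} } < 2;
    pop @{ $self->{stack} };
}

# everything else is plain text belonging to the current element
sub character_data {
    my ($self, $text) = @_;
    push @{ $self->{stack}[-1]{contains} }, $text;
}

sub root { $_[0]{root} }

package main;

# Hypothetical driver: classifies each line and fires the matching events.
sub parse_events {
    my ($text, $handler) = @_;
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;                       # skip blank lines
        if ($line =~ /^\s*(\w+)\s+(?:(\w+)\s+)?\{\s*$/) { # "type [name] {"
            $handler->new_type($1, $2);
            $handler->open_block;
        }
        elsif ($line =~ /^\s*\}\s*$/) {
            $handler->close_block;
        }
        else {
            (my $data = $line) =~ s/^\s+|\s+$//g;        # trim whitespace
            $handler->character_data($data);
        }
    }
    return $handler;
}

my $handler = parse_events(<<'FOO', TreeHandler->new);
page p1 {
    question 4B {
        label {
            Do you like your pie with ice cream?
        }
        single {
            1 Yes
            2 No
        }
    }
}
FOO
```

After this runs, $handler->root holds the whole tree. Because the handler is a separate object, the same parse_events loop could drive a different handler, for instance one that prints XML instead of building a tree.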

Update: it's done. It was fun, but don't use it. Someone below implemented the event parser I was talking about, just not in a decoupled OO kind of way.

use strict;
use warnings;

use re 'eval';

my $str = <<FOO;
page p1 {
        question 4B {
            label {
                Do you like your pie with ice cream?
            }

            single {
                1 Yes
                2 No
            }
        }

        question 4C {
            label {
                Do you like your pie with whipped cream?
            }

            single {
                1 Yes
                2 No
            }
        }
    }
FOO

my $string = qr/
    ^ (?> \s* (.+) ) \s* $
    (?{ add_string($^N) })
/xm;

my $tokens;
my ($type, $name);
my $block = qr/
    # capture a type
    (?: (\w+) \s+ )    (?{ $type = $^N })
    
    (
        # capture an optional name and store it in the name variable
        (?{ $name = undef }) # first unset the name, in case this doesn't match
        ((?: (\w+) \s+ )(?{ $name = $^N }) )?
    )
    
    \{ # if this starts to look like an element, push a new cell on the stack
    (?{ new_elem($type, $name) })

    (
        (
            # this subpattern tries to capture a complete body, with the closing brace

            (??{ $tokens })
            \}
            (?{ close_elem() }) # if we got here it means we have a full body, with tokens and a closing brace
            
        ) | (
             # if we got here, then the body subpattern failed, and we must abort
            (?{ abort_elem() })
            (?!) # this always fails: it is a negative lookahead on the empty pattern, which always matches
        )
    )
/xs;

my $blocks = qr/($block \s*)+/xs;
my $strings = qr/($string \s*)+?/xs;
$tokens = qr/\s* ( $blocks | $strings ) \s*/xs; # tokens is either some strings, or some blocks

my $doc = qr/^$tokens$/s;

my @stack;
new_elem("doc" => "root"); # create the root element
$str =~ $doc;

use Data::Dumper;
warn Dumper(@stack); # should contain just the root element


sub new_elem {
    my $elem = {
        type => $_[0],
        (defined($_[1]) ? (name => $_[1]) : ()),
        contains => [],
    };

    if (@stack){ push @{ $stack[-1]{contains} }, $elem }
    push @stack, $elem;
}

sub abort_elem {
    pop @stack;
    pop @{ $stack[-1]{contains} };
}

sub close_elem { pop @stack }

sub add_string { push @{ $stack[-1]{contains} }, $_[0] }
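
Stripped of the stack handling, the core trick in the code above is a pattern that refers to itself through (??{ }). The following is a minimal sketch of just that part, modelled on the balanced-parentheses idiom in perlre; the names $braces and is_balanced are made up for this example and are not part of the code above.

```perl
use strict;
use warnings;
use re 'eval';    # older perls want this when a qr// with code blocks is interpolated

# Declare the variable first so the code block inside the pattern can see it.
my $braces;
$braces = qr/
    \{
    (?:
        (?> [^{}]+ )        # a run of non-brace characters, no backtracking
      |
        (??{ $braces })     # or a nested group, matched by this same pattern
    )*
    \}
/x;

# True if every brace in the string is part of a properly nested pair.
sub is_balanced {
    my ($text) = @_;
    return $text =~ /^ (?: (?> [^{}]+ ) | $braces )* $/x ? 1 : 0;
}

for my $input ('a { b { c } d } e', '{ { }') {
    printf "%-20s %s\n", $input, is_balanced($input) ? 'balanced' : 'not balanced';
}
```

The pattern only recognizes nesting; it builds nothing. Getting a data structure out of it is what the (?{ }) callbacks and the abort_elem cleanup in the full version are for, and that is also where the approach gets fragile.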

-nuffin
zz zZ Z Z #!perl

In reply to Re: Parsing a macro language by nothingmuch
in thread Parsing a macro language by bluetrust
