As a follow-up, here's a toy compiler that handles balanced parentheses, and demonstrates the basic concepts of lexing input into tokens, parsing the tokens into a syntax tree, and walking the syntax tree to produce output. The lexer uses regular expressions, the parser uses a while() loop and a stack, and the output generator uses recursive functions. I also threw in the ability to handle literals, to make the problem a little more like the original query.

-- edit: moved the code below the fold.

$str = '(one(two(three(four)))(()()))(()(()()))'; ## lexer -- read the input and create a token stream sub lll { return (['L','']); } sub rrr { return (['R','']); } sub ttt { return (['T',$_[0]]); } %LLUT = ( '(' => \&lll, ')' => \&rrr, 'ttt' => \&ttt, ); @tokens = (); for $i ($str =~ /(\(|\)|[^()]+)/g) { my $func = $LLUT{ $i } || $LLUT{'ttt'}; push @tokens, &$func ($i); } ## parser -- read the token stream, and produce a syntax tree $stack = [[]]; sub start_group { my $stack = shift; my $list = ['group']; push @{ $stack->[-1] }, $list; push @$stack, $list; return; } sub end_group { my $stack = shift; pop @$stack; } sub literal { my $stack = shift; my $token = shift; push @{ $stack->[-1] }, ['literal',$token]; } %PLUT = ( 'L' => \&start_group, 'R' => \&end_group, 'T' => \&literal, ); for $t (@tokens) { $PLUT{ $t->[0] }($stack, $t->[1]); } ## walk the syntax tree and produce output sub level_1 { my $list = shift; my $type = shift @$list; if ('literal' eq $type) { print @$list; } else { print "("; for $i (@$list) { &level_2 ($i); } print ")"; } return; } sub level_2 { my $list = shift; my $type = shift @$list; if ('literal' eq $type) { print @$list; } else { print "["; for $i (@$list) { &deeper ($i); } print "]"; } return; } sub deeper { my $list = shift; my $type = shift @$list; if ('literal' eq $type) { print @$list; } else { print "("; for $i (@$list) { &deeper ($i); } print ")"; } return; } for $i (@{ $stack->[0] }) { &level_1 ($i); }

In reply to Re: Re: Re: parsing question by mstone
in thread Re: parsing question by sliles

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.