http://qs1969.pair.com?node_id=1151415

betacentauri has asked for the wisdom of the Perl Monks concerning the following question:

O Monks, I need to parse some text expressions which are a lot like the arithmetic expressions consisting of +, * and constants, so being able to parse arithmetic expressions would be more than enough. What I need is a syntax tree, where every leaf node is a constant, and the interior nodes are "+" and "*" tokens.

Using Tree::Parser seemed like a good deal to me because I need to do some tree traversal after parsing, and the Tree::Simple object returned by Tree::Parser seems to provide suitable methods.

However, the useNestedParensFilter() filter coming with the module is not quite what I need. I'm lost at how to write a proper parser to be given to setParseFilter(), as the iterator->next() function seems to break lexical units only on parentheses or whitespace, while I need to detect "+" and "*" tokens alone as well.

Can someone explain how to build a syntax tree for {+,*,A..Z,(,)} with Tree::Parser?

TIA

Replies are listed 'Best First'.
Re: Parsing syntax trees
by GrandFather (Saint) on Dec 30, 2015 at 07:16 UTC

    I hadn't played with Marpa before so I thought I'd have a look:

    #!/usr/bin/perl use warnings; use strict; use Marpa::R2; package Element; sub newNode { my ($value, $left, $right) = @_; return bless {left => $left, right => $right, value => $value}, 'E +lement'; } sub doOp { my @params = @_; return newNode(@params[2, 1, 3],); } sub doConst { my @params = @_; return newNode($params[1]); } sub dump { my ($node, $indent) = @_; $indent //= ''; $node->{left}->dump("$indent ") if $node->{left}; print "$indent$node->{value}\n"; $node->{right}->dump("$indent ") if $node->{right}; } package main; my $syntax = <<'SYNTAX'; lexeme default = latm => 1 expression ::= expression '+' term action => doOp | term action => ::f +irst term ::= term '*' factor action => doOp | factor action => ::first factor ::= constant action => doConst constant ~ digits digits ~ [\d]+ :discard ~ spaces spaces ~ [\s]+ SYNTAX my $grammar = Marpa::R2::Scanless::G->new({source => \$syntax}); my $input = '1 * 2 + 3 * 4'; my $root = $grammar->parse(\$input, 'Element'); ($$root)->dump();

    Prints:

    1 * 2 + 3 * 4

    No idea if that helps or not, but it was interesting to have a play.

    Premature optimization is the root of all job security
Re: Parsing syntax trees
by GrandFather (Saint) on Dec 30, 2015 at 08:31 UTC

    or using Tree::Binary:

    #!/usr/bin/perl use warnings; use strict; use Marpa::R2; use Tree::Binary; package Construct; sub newNode { my ($value, $left, $right) = @_; my $node = Tree::Binary->new($value); $node->setLeft($left) if $left; $node->setRight($right) if $right; return $node; } sub doOp { my @params = @_; return newNode(@params[2, 1, 3],); } sub doConst { my @params = @_; return newNode($params[1]); } package main; my $syntax = <<'SYNTAX'; lexeme default = latm => 1 expression ::= expression '+' term action => doOp | term action => ::f +irst term ::= term '*' factor action => doOp | factor action => ::first factor ::= constant action => doConst constant ~ digits digits ~ [\d]+ :discard ~ spaces spaces ~ [\s]+ SYNTAX my $grammar = Marpa::R2::Scanless::G->new({source => \$syntax}); my $input = '1 * 2 + 3 * 4'; my $root = $grammar->parse(\$input, 'Construct'); ($$root)->traverse(sub {print " " x $_[0]->{_depth}, $_[0]->getNodeVa +lue(), "\n"});

    Prints:

    + * 1 2 * 3 4
    Premature optimization is the root of all job security
Re: Parsing syntax trees
by GrandFather (Saint) on Dec 30, 2015 at 03:54 UTC

    It doesn't look to me like Tree::Parser is the right tool, but then I have no real idea what it is you are trying to achieve. Try filling us in on the back story somewhat. At least show us the nature of the syntax you are trying to parse and give us a hint about how you intend to use the parse tree.

    Premature optimization is the root of all job security

      Just WOW! Thank you very much for your responses! I had been away from Perl for a while, had not known about Marpa before.

      If you insist, I'll tell something about this. I had not done this before 1) in the interest of keeping my question short and 2) by fear of being told that this is already done, and way better than I am capable of. Which is exactly what's going to ensue now. Trust me. So be it.

      I am a lazy Computer Networks teacher. I want to have a minimalist language to describe a (computer) network topology and diagram layout. My idea is to cruft a file like

      R:router S1:switch S2:switch H1:host H2:host H3:host R-S1 R-S2 H1-S1 H2-S1 H3-S2 R;(S1;(H1,H2)),(S2;H1)

      This would describe a simple network where a router is linked to two switches, S1 linked to hosts H1 and H2, and then S2, linked to H3.

      The first stances declare topology elements in association with some predefined icons. The middle part tells what's linked to what. The final line describes the diagram layout. In my mind "A;B" means A is graphically above B, and "A,B" means A is to the left of B. This final line is what I am trying to parse.

      After having the syntax tree I want to apply some simple algorithms to get the bounding box dimensions for each subtree and finally draw the graph with properly centered icons and links.

      I was not expecting binary trees as a result, I'll have to digest that. Maybe I'll be fine with them.

      Thank you again!

        Reinventing the wheel as a learning exercise is a fine thing. Reinventing a major wheel as part of getting some other primary task done is generally unproductive and a "bad thing". In this case I think "learning exercise" applies so go for it.

        Providing a succinct version of the back story always helps guide any coding decisions that need to be made. Trying to "keep the question short" almost always ends up with a series of questions to elicit the "unimportant details" that were left out meaning much more work for everyone and a lot longer until you get the help you were looking for.

        You could almost as easily use Tree::Simple instead to get an n-ary tree. So, taking something like your example and using Tree::Simple:

        #!/usr/bin/perl use warnings; use strict; use Marpa::R2; use Tree::Simple; package Construct; sub doMap { my @params = @_; my $child = 'ARRAY' eq ref $params[3] ? $params[3][1] : $params[3] +; return $params[1]->addChild($child) if $child->getNodeValue() ne ' +root'; my @children = $child->getAllChildren(); return $params[1]->addChildren(@children); } sub doMapSib { my @params = @_; my ($lhs, $rhs) = map {'ARRAY' eq ref $_ ? $_->[1] : $_} @params[1 +, 3]; my $root; if ($lhs->getNodeValue() eq 'root') { $root = $lhs; $lhs = undef; } elsif ($rhs->getNodeValue() eq 'root') { $root = $rhs; $rhs = undef; } else { $root = Tree::Simple->new('root'); } $root->addChild($lhs) if $lhs; $root->addChild($rhs) if $rhs; return $root; } my $tail = ''; sub doTail { my @params = @_; $tail = $params[1]; return; } sub doCode { my @params = @_; my $node = Tree::Simple->new("$params[1]$tail"); $tail = ''; return $node; } package main; my $syntax = <<'SYNTAX'; lexeme default = latm => 1 network ::= ident ';' subnet action => doMap | subnet action => ::first subnet ::= sibling ',' subnet action => doMapSib | sibling action => ::first sibling ::= '(' network ')' action => [values] | ident action => ::first ident ::= letter tail action => doCode | letter action => doCode tail ::= digits action => doTail letter ~ [A-Z]+ digits ~ [\d]+ :discard ~ spaces spaces ~ [\s]+ SYNTAX my $grammar = Marpa::R2::Scanless::G->new({source => \$syntax}); my $input = 'R;(S1;(H1,H2,H4,H5)),(S2;H3)'; my $tree = ${$grammar->parse(\$input, 'Construct')}; my $root = Tree::Simple->new('root')->addChild($tree); my $depth = $root->getHeight(); ($root)->traverse( sub { print " " x ($depth - $_[0]->getHeight()), $_[0]->getNodeValu +e(), "\n"; } );

        Prints:

        R S1 H4 H5 H2 H1 S2 H3
        Premature optimization is the root of all job security
Re: Parsing syntax trees
by Anonymous Monk on Dec 30, 2015 at 04:00 UTC