a timid approach to parsing arithmetic expressions in perl

I wrote this code out of the interest of implementing an
ideea which I discussed with a former professor,it turned out to be
an interesting.
I like perl for this matter because it allows one to build a rapid prototype of
something that could be built in a langauge where it will run faster,for example C.
So I would be interested when knowing an algorithm to quickly sketch it and make it work
in Perl to be able to see all the special cases that might appear and have all corners
of it visualized before starting to implement in...for instance C.
About the algorithm,I was interested in a way to evaluate an arithmetical expression
which contained or did not contain paranthesis and that the code produced after
developing such a program should be elegant.
I heard about some ways which involved infix to postfix notation conversion and also rpn
(reverse polish notation) and expression trees and I have not went on that road.
Please provide references to articles/books which describe similar techniques if possible
Thank you for any suggestions you might have about the code

Short description of the algorithm:
1) give each operator/operand a priority based on it's depth and on a base priority each has.
2) take the biggest priority operand and the 2 operands adjacent to it
3) evaluate the 3 elements extracted at step 2 and put in the array at the place
where they were the result of the calculation and the results's priority is decreased by
a depth_bonus because it's paranthesis have disappeared.
4) repeat step 2 until the array has just one element(which is the result of the evaluation)

Below is the code:

#!/usr/bin/perl 
use strict;
use warnings;
use Data::Dumper;
# this code can be re-written without any regex/eval,their use is pure
+ly for shortening the code


use constant BASE_PRIORITY => 
{
    NUMBER        => 2,
    OPEN_PARA    => 8,
    CLOSED_PARA => 8,
    ADD            => 4,
    SUB            => 4,
    MUL            => 7,
    DIV            => 7,
    POW            => 9,
};

use constant DEPTH_BONUS => 10;


#my $exp = "5*(12/(32+4))-10";
my $exp = "3**(6-1*4)**2";

#my $exp = -3+(-1-2);
#my $exp = (3-(4+6))/2;

my $depth = 0;
my @terms;



sub delete_at { # delete the term at the index equal to the parameter 
+given to this function
    return shift @terms if $_[0] == 0;
    return pop @terms if $_[0] == (@terms-1);
    my $ret = $terms[ $_[0] ];
    @terms = (
        @terms[0..$_[0]-1],
        @terms[$_[0]+1..@terms-1],
    );
    return $ret;
}

sub insert_at { # inserts a term exactly before the index given as par
+ameter
    @terms = (
        @terms[0..$_[0]-1],
        $_[1],
        @terms[$_[0]..@terms-1],
    );

}


sub firstPass {# this builds up the @terms for later use
    while( $exp =~ s/^(\-?\d+|\*\*|\*|\/|\+|\-|\(|\))// ) {
        my $type=$1;
        my $term=$1;
        if( @terms>0 &&  $terms[@terms - 1 ]->{type} eq 'NUMBER' && $t
+erm =~ /\-\d+/ ) {
            #see if we currently have a negative number,see if before 
+we had a number
            #this means that we're on the wrong track and that - is ac
+tually an operator here
            #and not the sign for a negative number
            $exp=$term.$exp;
            $exp=~s/^-//;
            $type = "SUB";
            $term = '-';
            print "EXP $exp \n";
        } else {
            $type =~ s/\-?\d+/NUMBER/;
        };
        $type =~ s/\(/OPEN_PARA/;
        $type =~ s/\)/CLOSED_PARA/;
        $type =~ s/\+/ADD/;
        $type =~ s/\*\*/POW/;
        $type =~ s/\*/MUL/;
        $type =~ s/\//DIV/;
        $type =~ s/\-$/SUB/;
        my ($is_term_para) = $type =~ /OPEN_PARA|CLOSED_PARA/; # this 
+flag will tell us wether the term is or is not a paranthesis 
        $depth++ if $type eq 'OPEN_PARA';  # if we encounter an open p
+aranthesis we increase depth
        $depth-- if $type eq 'CLOSED_PARA';# closed paranthesis we dec
+rease it
        push @terms,
        {
            type        => $type,
            term        => $term,
            priority    => BASE_PRIORITY->{$type} + $depth*DEPTH_BONUS
        }
        unless $is_term_para; # we leave out the paranthesis because w
+e no longer need them(their purpose
                              # was to provide priority information fo
+r us)
    };
}


sub getPrioritary { # gets most prioritary 3 elements in the current e
+xpression

    my @sIndexes = sort { -1 * ( $terms[$a]->{priority} <=> $terms[$b]
+->{priority} ); } 0..(@terms-1) ;

    my $i = 0; # the index in @sIndexes
    my $middleMaxPrio = $sIndexes[$i];

    while( $terms[$middleMaxPrio]->{type} eq 'NUMBER' ) { # if our sel
+ected maximum priority element is not a number search for the next mo
+st prioritized that is a number
        print "[DEBUG] $terms[$middleMaxPrio]->{type}";
        $middleMaxPrio = $sIndexes[++$i];
    };

    my $leftNearMax      = $middleMaxPrio -1; # we take the left of $m
+iddleMaxPrio
    my $rightNearMax  = $middleMaxPrio +1; # and the right of it , bec
+uase these two are surely operands

    my @selectedTerms = map { delete_at $_  } ( $rightNearMax , $middl
+eMaxPrio , $leftNearMax ); # we delete them in inverse order to not a
+lter the stack badly

    return {
        selected    => [ @selectedTerms ],
        insertIndex    => $leftNearMax,
        maxPriority    => $selectedTerms[1]->{priority}, # the middle 
+element will be surely an operator so it will have maximum priority
    };
}


#---------------------------------------------------------------------
+------------------------------------------------

firstPass;

while( @terms > 1 ) {
    print "DEBUG ".Dumper [@terms];

    my $data = getPrioritary;
    my @elems = map { $_->{term} } @{ $data->{selected} };
    my $expr = sprintf "%s %s %s", reverse @elems;
    my $result = eval($expr); # the eval here has just been used for s
+hortening the code,it could have very well been a simple switch on $e
+lems[1]

    print "DEBUG [$expr]\n";

    insert_at 
    $data->{insertIndex},
    {
        type    => 'NUMBER',
        term    => $result,
        priority=> $data->{maxPriority} - DEPTH_BONUS #we have calcula
+ted what was probably a paranthesis therefore we substrac a depth_bon
+us
    };
    <>;
};

print "RESULT :".$terms[0]->{term};
[download]

UPDATE: the code was updated due to ikegami

Comment on a timid approach to parsing arithmetic expressions in perl Download Code

Replies are listed 'Best First'.
Re: a timid approach to parsing arithmetic expressions in perl by MidLifeXis (Monsignor) on Jul 25, 2008 at 18:18 UTC
Short description of the algorithm: give each operator/operand a priority based on it's depth and on a base priority each has. take the biggest priority operand and the 2 operands adjacent to it evaluate the 3 elements extracted at step 2 and put in the array at the place where they were the result of the calculation and the results's priority is decreased by a depth_bonus because it's paranthesis have disappeared. repeat step 2 until the array has just one element(which is the result of the evaluation) On the surface, it looks like this algorithm should work, however, it seems like you are doing more work than is necessary, and the algorithm reduces basically to an infix-to-postfix conversion combined to a postfix calculation but with a larger big o complexity. Since you are able to parse and calculate postfix with a single pass through the data and a stack, and convert from infix to postfix with a single pass and stack, I am not certain what this buys you. As far as your implementation, for educational purposes, go crazy. On my initial scan, I would have concerns with the element selection routine, but it appears that it should work. For performance, I would be concerned with the repeated sorts. If this is not for educational reasons, then I would seriously consider looking at the algorithms listed above, a parse tree, and a postorder traversal of that tree, as those are probably able to be implemented with less code and less complexity. --MidLifeXis	[reply]
Re: a timid approach to parsing arithmetic expressions in perl by AltBlue (Chaplain) on Jul 26, 2008 at 17:17 UTC
Looking and your code and algorithm description, I got the same impressions as MidLifeXis: the algorithm is simple (yet limited), correct, but slow. This is a classic problem that computer science students meet many times during their classes, each time using a different algorithm / approach to solve it, and the fellow monks already provided you many pointers (CPAN could be another). Implementation wise, I'm sorry to say that yours looks like a mechanical line-to-line translation from that other language. Here are a few suggestions: don't interpolate when it's not necessary, e.g. `print "DEBUG ".Dumper [@terms];` vs. `print 'DEBUG '.Dumper [@terms];` don't concatenate when it's not necessary, just use the list e.g. `print "DEBUG ".Dumper [@terms];` vs. `print "DEBUG ",Dumper [@terms];` don't clone (duplicate data) when it's not necessary, just pass it by reference e.g. `print "DEBUG ".Dumper [@terms];` vs. `print "DEBUG ".Dumper \@terms;` rewriting regular expression in a more readable manner (using `/x` modifier) could help you see easier through the code, e.g. `s/^(\d+\|\\|\/\|\+\|\-\|$\|$)//` vs. `s{ ^ ( \d+ \| \ \| / \| \+ \| - \| $ \| $ ) }{}x` leading to `s{ ^ ( \d+ \| [ ( ) * / + - ] ) }{}x` use Perl's built-in functions unless you have a real reason to avoid them, e.g. `sub delete_at {...}` vs. `sub delete_at { return splice @terms, $_[0], 1 }` or `sub insert_at {...}` vs. `sub insert_at { return splice @terms, $_[0], 0, $_[1] }` don't comment obvious statements, let code express itself e.g. `$depth++ if $type eq 'OPEN_PARA'; # if we encounter an open paranthesis we increase depth` use the index of the last element of an array when that's what you need e.g. `0..(@terms-1)` vs `0 .. $#terms` sort knows how to sort in reversed order, there's not need to clutter your code e.g. `sort { -1 * ( $terms[$a]->{priority} <=> $terms[$b]->{priority} ); }` vs. `sort { $terms[$b]->{priority} <=> $terms[$a]->{priority} }` or - if you are following Damian Conway's advices (from Perl Best Practices) - plug a verbose reverse: `reverse sort { $terms[$a]->{priority} <=> $terms[$b]->{priority} }` avoid misleading your readers/maintainers: in `getPrioritary` you "needed to" reverse the results (`@selectedTerms`), which then you have to reverse them back in your "main" loop to make them usable. Better keep your workarounds as self contained as possible, i.e. `my @selectedTerms = reverse map { delete_at $_ } ...` (Of course, in that case there was no need for such workaround, a splice would have sufficed) Finally, here's a possible rewrite of your code: #!/usr/bin/perl use strict; use warnings; { my $OP = { # operator => weight '+' => 4, '-' => 4, '' => 7, '/' => 7, '' => 8, '(' => 9, ')' => 9, }; my $depth_bonus = 1 + ( sort { $b <=> $a } values %{$OP} )[0]; my $ops_re = join q{\|}, map { quotemeta } sort { length $b <=> length $a } keys %{$OP}; sub eval_expr { my $expr = "@_"; my @terms; my $depth = 0; while ( $expr =~ s{^ \s ( \d+ \| $ops_re ) }{}xo ) { if ( $1 eq '(' ) { $depth++; } elsif ( $1 eq ')' ) { $depth--; } else { push @terms, [ $1, exists $OP->{$1} ? $OP->{$1} + $depth * $depth +_bonus : 0 ]; } } while ( @terms > 1 ) { my ($op_idx) = sort { $terms[$b]->[1] <=> $terms[$a]->[1] } 0 .. $#term +s; my @elems = map { $_->[0] } @terms[ $op_idx - 1 .. $op_idx + + 1 ]; splice @terms, $op_idx - 1, 3, [ eval("@elems"), 0 ]; } return $terms[0]->[0]; } } for ( '3 - ( 4 + 5 )', '1 + 2 * 3', '( 1 + 2 ) * 3', ' 5 2 - 2 3', ) { my $evaled = eval("$_"); my $expred = eval_expr($_); printf "%40s = %-20s %s\n", $_, $expred, $evaled; } [download] As you may notice, while it preserves you original approach / algorithm / didacticism, it also tries to be more flexible and simplifies some steps, observing that operands do not need priority/weight. 'HTH	[reply] [d/l] [select]
Re: a timid approach to parsing arithmetic expressions in perl by FunkyMonk (Bishop) on Jul 25, 2008 at 19:03 UTC
It's been many, many years since I did anything with expression evaluation and compiling, but yours does seem rather complicated. For clarity, I always liked recursive descent. As a simple example here's an evaluator for expressions with integer add, subtract, multiply and divide. update: Now does unary minus, real numbers and parentheses Read more... (3 kB)	[reply] [d/l]
Re: a timid approach to parsing arithmetic expressions in perl by ikegami (Patriarch) on Jul 26, 2008 at 12:37 UTC
I was testing something else, when I discovered an expression that generates tons of warnings: `my $exp = 3-(4+5);` [download] I'd also be interested in seeing a right-associative operator added. `my $exp = 432; # = 4**9 = 262144` [download]	[reply] [d/l] [select]
Re^2: a timid approach to parsing arithmetic expressions in perl by spx2 (Deacon) on Jul 26, 2008 at 15:25 UTC
Hi ikegami and thank you very much for the interest in this, Using minor modifictions I've implemented also POW for this evaluator I've also added as a test the expression "3*(6-14)**2" which should eval to 81. About the warnings , the compiler already evaluated the expression to -6 and the bug was caused by the fact that negative numbers were not yet recognised. Now both of these are fixed.	[reply]
Re^3: a timid approach to parsing arithmetic expressions in perl by ikegami (Patriarch) on Jul 27, 2008 at 07:57 UTC
`3*(6-14)2` is a bad test since `(32)2` happens to equal `3(22)` When I use the `342` I had previously mentioned, your code returns `6561` (`(34)2`) instead of `43046721` (`3(42)`). `>perl -le"print 34**2" 43046721` [download]	[reply] [d/l] [select]
Re^3: a timid approach to parsing arithmetic expressions in perl by spx2 (Deacon) on Jul 26, 2008 at 16:02 UTC
hmm,seems that the right associative operator must be parsed in a different way... the problem about evaluating "342" still persists ... however if paranthesis are employed and we re-write like this "3(42)" it will be evaluated correctly. uhm,the only way I can see this as beeing done is by using a stack and putting all 3,4,2 on it if the operator ** is found and afterwards start emptying the stack by taking two of them,evaling putting them back and so on.	[reply]
Re: a timid approach to parsing arithmetic expressions in perl by TGI (Parson) on Aug 01, 2008 at 17:02 UTC
I think if you space out your priorities a bit more, right associative operators could be handled by adding a bonus to their weight: `As written: String: 3 2 4 Weights: 2 9 2 9 2` [download] Now change the algorith as follows: Multiply normal weights by some factor, say 10. Multiply level bonus by the same value. Add 1 to priority for each occurrence of a right associative operator. Swap priorities of parens and POW. `String: 3 2 4 Weights: 20 80 20 81 20 More complex string: String: 3 ** ( 6 - 1 * 4 ) 2 Weights: 20 80 90 120 140 120 170 120 90 81 20 String: 3 ( 6 - 4 ) 2 Weights: 20 80 90 120 140 120 90 81 20 String: 3 ( 2 ) 2 Weights: 20 80 90 120 90 81 20 String: 3 2 2 Weights: 20 80 20 81 20 String: 3 4 Weights: 20 80 20 String: 81 Weights: 20` [download] Obviously, the weakness of this approach is that if you have too many right associative operators at a level, your priorities collide. You can make "too many" arbitrarily large by using a large multiplier for the original priority scheme. I'll leave actual implementation up to you. TGI says moo	[reply] [d/l] [select]