suaveant has asked for the wisdom of the Perl Monks concerning the following question:

I am working on replacing a format generator for financial reports that builds reports dynamically from format definitions.

In the past this was pure Perl/MySQL, with mappings in the format definitions pointing either to MySQL table items or to special Perl functions to handle the data. Much of the calculation/logic for report items was done by building MySQL statements with things such as +-*/ IFNULL IF CONCAT etc... Anything done in Perl was a special case that I coded by hand rather than defined in the format editor.
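On the Perl side, the MySQL functions mentioned above could be mirrored with a dispatch table. A minimal sketch, assuming SQL NULL is represented as undef and covering only IF, IFNULL and CONCAT; %FUNCS and call_func are hypothetical names, not part of any existing module.

```perl
use strict;
use warnings;

# Hypothetical dispatch table mapping MySQL function names to Perl code.
# SQL NULL is represented as undef (an assumption of this sketch).
my %FUNCS = (
    # IF(cond, then, else): cond counts as true when non-NULL and non-zero
    IF => sub {
        my ($cond, $then, $else) = @_;
        no warnings 'numeric';    # MySQL also converts non-numeric strings silently
        my $true = defined $cond && $cond != 0;
        return $true ? $then : $else;
    },
    # IFNULL(a, b): a unless a is NULL
    IFNULL => sub {
        my ($x, $fallback) = @_;
        return defined $x ? $x : $fallback;
    },
    # CONCAT(...): NULL if any argument is NULL, as in MySQL
    CONCAT => sub {
        return undef if grep { !defined } @_;
        return join '', @_;
    },
);

sub call_func {
    my ($name, @args) = @_;
    my $code = $FUNCS{ uc $name } or die "Unknown function '$name'\n";
    return $code->(@args);
}

print call_func('IFNULL', undef, 0), "\n";            # 0
print call_func('CONCAT', 'Q', 2, '-', 2005), "\n";   # Q2-2005
```

Adding another function means adding one entry to the table; the evaluator itself does not change.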

The new format generator is to be more flexible. I am implementing an abstraction layer for the database so that I can just change a single mapping if a table changes rather than every format (really should have done that the first time around, sigh). I am also adding the possibility of other data sources, such as a system in another department we are able to call into.

In doing this, the old way of building everything into MySQL queries just isn't going to cut it, since some of this data won't be in MySQL and I don't want to put it into a temp table each time I run a report.

My thought, since the people making the format definitions are already familiar with the MySQL statement layout and the basic functions they were using, is to parse this syntax into something that I can translate and use to do calculations in Perl OR MySQL, depending on where the data is coming from.
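A minimal sketch of the "Perl OR MySQL" idea, under assumptions: each formula is parsed once into a small tree (the hash-based node layout here is hypothetical), and two walkers consume the same tree. to_sql() renders a MySQL expression through a caller-supplied column lookup standing in for the abstraction layer, and evaluate() computes the value in Perl from an already-fetched row. The item_N column naming is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical AST node shapes:
#   { type => 'num',  value => 5 }
#   { type => 'item', id => 3 }                      # the {3} data item
#   { type => 'op',   op => '*', left => ..., right => ... }

# Render the tree as a MySQL expression; {n} becomes whatever column
# name the caller's lookup returns.
sub to_sql {
    my ($node, $column_for) = @_;
    return $node->{value}             if $node->{type} eq 'num';
    return $column_for->($node->{id}) if $node->{type} eq 'item';
    return '(' . to_sql($node->{left}, $column_for) . " $node->{op} "
               . to_sql($node->{right}, $column_for) . ')';
}

# Evaluate the same tree in Perl against one row of data.
my %OPS = (
    '+' => sub { $_[0] + $_[1] },
    '-' => sub { $_[0] - $_[1] },
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[1] == 0 ? undef : $_[0] / $_[1] },   # MySQL returns NULL here
);

sub evaluate {
    my ($node, $row) = @_;
    return $node->{value}        if $node->{type} eq 'num';
    return $row->{ $node->{id} } if $node->{type} eq 'item';
    my ($l, $r) = (evaluate($node->{left}, $row), evaluate($node->{right}, $row));
    return undef unless defined $l && defined $r;        # NULL propagates
    return $OPS{ $node->{op} }->($l, $r);
}

# 1 + {3} * 5, already parsed
my $tree = { type => 'op', op => '+',
    left  => { type => 'num', value => 1 },
    right => { type => 'op', op => '*',
        left  => { type => 'item', id => 3 },
        right => { type => 'num', value => 5 } } };

print to_sql($tree, sub { "item_$_[0]" }), "\n";   # (1 + (item_3 * 5))
print evaluate($tree, { 3 => 4 }), "\n";           # 21
```

The tree is the single source of truth; which walker runs can be decided per formula depending on where its data items live.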

I need to build this into a logic engine that can be passed the data items and quickly generate the calculated value on a row by row basis.
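For the row-by-row requirement, one common Perl approach is to turn each parsed formula into a closure once (for example at FastCGI start-up) and then call that closure for every row, so the per-node type checks are not repeated per row. A minimal sketch; the tree layout and compile_node are hypothetical, only numbers, {n} items and + - * / are handled, and NULL handling is left out.

```perl
use strict;
use warnings;

# Turn a formula tree into a code ref. All the type checks happen here,
# once per formula, rather than once per row. NULL handling is omitted.
sub compile_node {
    my ($node) = @_;
    if ($node->{type} eq 'num') {
        my $v = $node->{value};
        return sub { $v };
    }
    if ($node->{type} eq 'item') {
        my $id = $node->{id};
        return sub { $_[0]{$id} };            # $_[0] is the row hashref
    }
    my $l = compile_node($node->{left});
    my $r = compile_node($node->{right});
    my %ops = (
        '+' => sub { $l->($_[0]) + $r->($_[0]) },
        '-' => sub { $l->($_[0]) - $r->($_[0]) },
        '*' => sub { $l->($_[0]) * $r->($_[0]) },
        '/' => sub { $l->($_[0]) / $r->($_[0]) },
    );
    my $code = $ops{ $node->{op} } or die "Unknown operator '$node->{op}'\n";
    return $code;
}

# {3} * 5 + 1, compiled once...
my $formula = compile_node({ type => 'op', op => '+',
    left  => { type => 'op', op => '*',
               left  => { type => 'item', id => 3 },
               right => { type => 'num',  value => 5 } },
    right => { type => 'num', value => 1 } });

# ...then applied to each row as it is fetched.
for my $row ({ 3 => 2 }, { 3 => 10 }) {
    print $formula->($row), "\n";    # 11, then 51
}
```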

So... that was the background... here is the real question. How would some of you go about this? I want to get it right this time around. Any help is appreciated.

My thoughts so far: I could write my own parser, but I was thinking it would be a good time to learn something like Parse::RecDescent or YAPP or something similar. I have never used any of these before, and while I was looking I had trouble figuring out how to handle order of precedence.

Here are examples of what a statement might look like... Anything in the form of {n} maps back to a data item in the abstraction. Some abstractions will have arguments (this is necessary since things like 1-n tables need some sort of wrapper to get the proper single field); they would look like {n(args)}.

1 + {3} * 5

#IF(test,then,else)
IF({3} > NOW()-INTERVAL 1 DAY, {3}, 'Some text')
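Whatever parser ends up being used, statements like the two above first need to be split into tokens, and in Perl that can be a single regex loop using \G and /gc. A minimal sketch; the token type names are hypothetical, {n(args)} is kept as one ITEM token with its argument text left unparsed, and MySQL keywords such as INTERVAL and DAY simply come out as IDENT tokens.

```perl
use strict;
use warnings;

# Split a formula into [TYPE, text] pairs; ITEM tokens carry [ITEM, n, args].
sub tokenize {
    my ($src) = @_;
    my @tokens;
    for ($src) {    # alias $_ so the m//gc matches and pos() apply to it
        while (1) {
            if    (/\G\s+/gc)                         { next }
            elsif (/\G\{(\d+)(?:\(([^)]*)\))?\}/gc)   { push @tokens, [ ITEM => $1, $2 ] }
            elsif (/\G(\d+(?:\.\d+)?)/gc)             { push @tokens, [ NUM => $1 ] }
            elsif (/\G'((?:[^'\\]|\\.)*)'/gc)         { push @tokens, [ STR => $1 ] }
            elsif (/\G([A-Za-z_]\w*)/gc)              { push @tokens, [ IDENT => $1 ] }
            elsif (/\G(<=|>=|<>|!=|[-+*\/(),<>=])/gc) { push @tokens, [ OP => $1 ] }
            elsif (/\G\z/gc)                          { last }
            else { die 'Unexpected input at position ' . (pos($_) // 0) . "\n" }
        }
    }
    return @tokens;
}

my $src = q{IF({3} > NOW()-INTERVAL 1 DAY, {3}, 'Some text')};
print "$_->[0] $_->[1]\n" for tokenize($src);
```

Because /gc keeps pos() on a failed match, each alternative is tried at the same position until one succeeds, and anything unrecognised is reported with its offset.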

                - Ant
                - Some of my best work - (1 2 3)

Re: Parsing and executing a pseudo-language
by samtregar (Abbot) on May 25, 2005 at 17:21 UTC
    I recommend you give Parse::RecDescent a try, as long as parsing performance isn't likely to be a bottleneck (it's frickin' slow). If you're completely new to parsing you might pick up a copy of O'Reilly's lex & yacc. That was my source-book on a few parsing missions and it served me well.

    -sam

      How would one introduce order of operations? In the language definition, or somewhere else?

      Evaluated strictly left to right (no order of operations): 1+2*3 = 9

      Proper order of operations: 1+2*3 = 7

      Others may call it order of precedence... I learned it as PEMDAS in school: parens, exponents, multiplication and division, then addition and subtraction (each of those pairs evaluated left to right).
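In a recursive-descent parser (hand-written or generated with Parse::RecDescent), precedence usually lives in the grammar itself: one rule per precedence level, where each lower rule binds tighter (expr: term (('+'|'-') term)*, term: factor (('*'|'/') factor)*, factor: number | '(' expr ')'). A minimal hand-written sketch of that layering, assuming only numbers, parentheses and + - * /; it is not Parse::RecDescent syntax, and the sub names are hypothetical.

```perl
use strict;
use warnings;

# One sub per precedence level, all working on a shared token list.
my @tok;

sub parse_expr {            # lowest precedence: + and -
    my $v = parse_term();
    while (@tok && $tok[0] =~ /^[-+]$/) {
        my $op  = shift @tok;
        my $rhs = parse_term();
        $v = $op eq '+' ? $v + $rhs : $v - $rhs;
    }
    return $v;
}

sub parse_term {            # binds tighter: * and /
    my $v = parse_factor();
    while (@tok && $tok[0] =~ m{^[*/]$}) {
        my $op  = shift @tok;
        my $rhs = parse_factor();
        $v = $op eq '*' ? $v * $rhs : $v / $rhs;
    }
    return $v;
}

sub parse_factor {          # tightest: numbers and parenthesised groups
    my $t = shift @tok;
    die "Unexpected end of input\n" unless defined $t;
    if ($t eq '(') {
        my $v = parse_expr();
        my $close = shift @tok;
        die "Expected )\n" unless defined $close && $close eq ')';
        return $v;
    }
    die "Expected a number, got '$t'\n" unless $t =~ /^\d+(?:\.\d+)?$/;
    return $t;
}

sub calc {
    my ($src) = @_;
    @tok = $src =~ m{\d+(?:\.\d+)?|[-+*/()]}g;   # crude tokenizer for the sketch
    my $v = parse_expr();
    die "Trailing input: @tok\n" if @tok;
    return $v;
}

print calc('1+2*3'), "\n";     # 7, not 9
print calc('(1+2)*3'), "\n";   # 9
print calc('8-2-1'), "\n";     # 5 (left-associative)
```

The while loops are what make - and / left-associative; a right-recursive rule such as expr: term '-' expr would instead compute 8-(2-1).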

                      - Ant

        This is running under FastCGI, so if the module takes a little while to initialize, that is fine... the language rules could be loaded at initialization too, I assume... If it takes a second or so to parse after all that, that is fine...

        I think Parse::RecDescent should be fine then.

        how would one introduce order of operations? In the language definition, or somewhere else?

        I don't know what you mean by "order of operations".

        -sam

Re: Parsing and executing a pseudo-language
by djohnston (Monk) on May 25, 2005 at 17:53 UTC
    I've been rather successful using a simple and efficient Polish notation (aka: prefix notation) parsing algorithm under which precedence is not an issue. Operators precede their operands, so 1 + {3} * 5 is expressed as + 1 * {3} 5.

    I've written a relatively light-weight user-extensible token parser based on Polish notation (complete with if/elsif/else, switch/case, try/catch, etc...) which, if nothing else, demonstrates its potential as the basis for a scripting language.
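Evaluating prefix expressions needs no precedence table because every operator knows how many operands follow it, so a single recursive read is enough. A minimal sketch, not djohnston's parser: it assumes whitespace-separated tokens, {n} data items looked up in a row hash, and only the four binary arithmetic operators; eval_prefix and %BINARY are hypothetical names.

```perl
use strict;
use warnings;

my %BINARY = (
    '+' => sub { $_[0] + $_[1] },
    '-' => sub { $_[0] - $_[1] },
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[0] / $_[1] },
);

# Consume one expression from the front of @$tokens and return its value.
sub eval_prefix {
    my ($tokens, $row) = @_;
    my $t = shift @$tokens;
    die "Unexpected end of expression\n" unless defined $t;
    if (my $op = $BINARY{$t}) {
        my $left  = eval_prefix($tokens, $row);   # operands follow the operator
        my $right = eval_prefix($tokens, $row);
        return $op->($left, $right);
    }
    if ($t =~ /^\{(\d+)\}$/) {
        die "No value for {$1}\n" unless exists $row->{$1};
        return $row->{$1};
    }
    return $t;    # a literal number
}

my @tokens = split ' ', '+ 1 * {3} 5';
print eval_prefix(\@tokens, { 3 => 4 }), "\n";   # 21
```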

      This is more or less a "me too" type post, but I've also written a simple token based interpreter. Note that it's really very easy to execute prefix notation code (have an operator stack and a data stack, push operator, push data, pop operator, pop data, push result) and it's really fairly easy to go from infix to prefix.
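The same operator-stack/data-stack idea also handles the infix-to-prefix step. A minimal sketch (a variant of Dijkstra's shunting-yard algorithm, not the poster's interpreter), assuming pre-split tokens, binary + - * / with the usual precedence and left associativity, and parentheses; infix_to_prefix and %PREC are hypothetical names.

```perl
use strict;
use warnings;

my %PREC = ('+' => 1, '-' => 1, '*' => 2, '/' => 2);

sub infix_to_prefix {
    my @tokens = @_;
    my (@ops, @data);

    # Pop one operator and two operands; push the combined prefix string.
    my $reduce = sub {
        my $op    = pop @ops;
        my $right = pop @data;
        my $left  = pop @data;
        die "Missing operand for '$op'\n" unless defined $left && defined $right;
        push @data, "$op $left $right";
    };

    for my $t (@tokens) {
        if ($t eq '(') {
            push @ops, $t;
        }
        elsif ($t eq ')') {
            $reduce->() while @ops && $ops[-1] ne '(';
            die "Unbalanced )\n" unless @ops;
            pop @ops;    # discard the '('
        }
        elsif (exists $PREC{$t}) {
            # Left-associative: reduce while the stacked operator binds at least as tightly.
            $reduce->() while @ops && $ops[-1] ne '(' && $PREC{ $ops[-1] } >= $PREC{$t};
            push @ops, $t;
        }
        else {
            push @data, $t;    # number or {n} item
        }
    }
    while (@ops) {
        die "Unbalanced (\n" if $ops[-1] eq '(';
        $reduce->();
    }
    die "Malformed expression\n" unless @data == 1;
    return $data[0];
}

print infix_to_prefix(qw/ 1 + {3} * 5 /), "\n";     # + 1 * {3} 5
print infix_to_prefix(qw/ ( 1 + 2 ) * 3 /), "\n";   # * + 1 2 3
```

Replacing the string concatenation in $reduce with the actual arithmetic turns the same loop into a direct evaluator.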
Re: Parsing and executing a pseudo-language
by terce (Friar) on May 25, 2005 at 16:24 UTC
    Depending on the platforms the other datasource(s) run on, and on how up to date these reports need to be, my instinct would be to replicate all the data into some sort of warehouse (on MySQL if appropriate) and let the users carry on using the syntax they know.

    This would seem to be far simpler than the parser you're planning (which seems kind of like re-inventing the ANSI SQL wheel), although I appreciate it may not be appropriate for your environment.

    Anyway, good luck with it.
      Well... the parser isn't nearly as complex as SQL... it only parses a single statement. In the end they want us to move all our data retrieval to the other department's system. Data warehousing is basically what we have been doing.

      That and sometimes the data is used so infrequently that it isn't really worth storing it locally and wasting all the space...

      So, in the long run this is the best way: as the other system takes on more of the data we need and gets faster, more and more will be retrieved from them directly.

      Edit: by "single statement" I mean, of course, the part that makes up a single column in an SQL query's result... so really it is just simple math and function calls; there is a lot more to SQL than that :)

                      - Ant

Re: Parsing and executing a pseudo-language
by phaylon (Curate) on May 25, 2005 at 16:13 UTC
    I have often used finite state machine models for this kind of thing, and they worked pretty well. Sorry for not giving more help, but I'm very, very (, very) sure there are plenty of monks here who can help you better. So I'll just go with that tip :D
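One way a finite state machine shows up in this kind of work is the lexing stage: the scanner is always in exactly one state, and each character either extends the current token or ends it and moves the machine to another state. A minimal sketch under assumptions, not phaylon's code: the states START, NUMBER, IDENT and STRING are hypothetical, a trailing "\0" is used as an end-of-input marker, and {n} items and multi-character operators are left out for brevity.

```perl
use strict;
use warnings;

sub fsm_tokens {
    my ($src) = @_;
    my ($state, $buf, @out) = ('START', '');
    for my $c (split(//, $src), "\0") {    # "\0" marks end of input
        if ($state eq 'NUMBER') {
            if ($c =~ /[\d.]/) { $buf .= $c; next }
            push @out, $buf; $state = 'START';    # then handle $c as START below
        }
        elsif ($state eq 'IDENT') {
            if ($c =~ /\w/) { $buf .= $c; next }
            push @out, $buf; $state = 'START';
        }
        elsif ($state eq 'STRING') {
            die "Unterminated string\n" if $c eq "\0";
            if ($c eq "'") { push @out, "'$buf'"; $state = 'START' }
            else           { $buf .= $c }
            next;
        }
        # START state: decide what kind of token begins at this character
        if    ($c eq "\0")        { last }
        elsif ($c =~ /\s/)        { }
        elsif ($c =~ /\d/)        { ($state, $buf) = ('NUMBER', $c) }
        elsif ($c =~ /[A-Za-z_]/) { ($state, $buf) = ('IDENT', $c) }
        elsif ($c eq "'")         { ($state, $buf) = ('STRING', '') }
        else                      { push @out, $c }    # single-character operator
    }
    return @out;
}

print join('|', fsm_tokens(q{IF(x > 10, 'ok', 2.5)})), "\n";
# IF|(|x|>|10|,|'ok'|,|2.5|)
```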

    Good luck and have phun.

    Ordinary morality is for ordinary people. -- Aleister Crowley