Fellow Monks of great Perlness -

I'm in the middle of reducing some SPICE-superset models to run on normal SPICE3, and I've chosen to do it by a series of find-and-replace operations, i.e., at the string level. When I was first given the assignment, I thought I was going to build a symbol-table-driven model, but the raw string-processing version is coming along quite well.

Given that data representation is so central to all we do, how do we choose whether to parse and build recursive models or to s/construct/simplified/g?

In my case, besides simple variable substitution, the definitions have some of the grotiest inconsistent if/then/else and embedded conditionals I've ever seen. It's based on neolithic FORTRAN, but the superset parser seems to accept a lot of things that would offend both Backus and Naur greatly. I found that it was much easier to code regex / eval pairs to identify these cases line by line than it would have been to store all the data in a tree after identifying them. Below is the code that handles the && and || clauses of the conditionals.
sub testcondition {
    my $work = trim($_[0]);
    if ($work =~ /^\((.+)\)$/) {
        $work = trim($1);
    }
    if ($work =~ /^(.+)\|\|(.+)$/) {
        my ($or1, $or2) = ($1, $2);
        if ((testcondition($or1) eq 'T') || (testcondition($or2) eq 'T')) {
            return 'T';
        }
    }
    elsif ($work =~ /^(.+)\&\&(.+)$/) {
        my ($and1, $and2) = ($1, $2);
        if ((testcondition($and1) eq 'T') && (testcondition($and2) eq 'T')) {
            return 'T';
        }
    }
    elsif ($work =~ /^(.*[^=<>])([=<>]+)([^=<>].*)$/)    # updated
    {
        my ($r1, $r2, $r3) = ($1, $2, $3);
        if (isanumber($r1) && isanumber($r3)) {
            if (eval("$r1 $r2 $r3")) {
                return 'T';
            }
            else {
                return 'F';
            }
        }
    }
    return '?';
}
UPDATED: added not-operator 'stops' to conditional breakup match
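For the curious, here's a self-contained way to smoke-test the sub. The trim() and isanumber() below are simplified stand-ins I wrote for this example, not my real helpers, and I've condensed the sub itself, so treat it as a sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simplified stand-ins for the real helpers (assumptions for this example):
sub trim      { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s; }
sub isanumber { return $_[0] =~ /^-?\d+(?:\.\d+)?$/; }

sub testcondition {
    my $work = trim($_[0]);
    if ($work =~ /^\((.+)\)$/) { $work = trim($1); }
    if ($work =~ /^(.+)\|\|(.+)$/) {
        my ($or1, $or2) = ($1, $2);    # copy before recursing clobbers $1/$2
        return 'T' if testcondition($or1) eq 'T' || testcondition($or2) eq 'T';
    }
    elsif ($work =~ /^(.+)\&\&(.+)$/) {
        my ($and1, $and2) = ($1, $2);
        return 'T' if testcondition($and1) eq 'T' && testcondition($and2) eq 'T';
    }
    elsif ($work =~ /^(.*[^=<>])([=<>]+)([^=<>].*)$/) {
        my ($r1, $r2, $r3) = (trim($1), $2, trim($3));
        if (isanumber($r1) && isanumber($r3)) {
            return eval("$r1 $r2 $r3") ? 'T' : 'F';
        }
    }
    return '?';
}

print testcondition("3 > 2"), "\n";              # T
print testcondition("1 >= 5"), "\n";             # F
print testcondition("3 > 2 && 10 > 5"), "\n";    # T
print testcondition("1 > 2 || 3 > 2"), "\n";     # T
print testcondition("x > 2"), "\n";              # ? (non-numeric operand)
```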

Re: To model or not to model
by kvale (Monsignor) on Apr 14, 2005 at 18:08 UTC
    I think that in both cases (full parser vs. search and replace) you are creating a model of the language that you intend to transform. The search/replace model is just a lot more simple-minded than full parsing :)

    Despite its simplicity, search and replace can work well in some situations. If one has a text file that is, e.g., line oriented, with each line independent of its context, there is a chance that search-and-replace operations on each line are all that is needed.

    But most computer languages have hierarchy and some have recursion. This means that small bits of code are in effect context-sensitive: how they are interpreted depends on the surrounding text. So '&&' in a text string "$a && $b" means something very different in perl than a bare $a && $b. In this case, the best thing to do is to parse the whole file into an abstract syntax tree and compile that tree into new code according to your needs. For instance, many folks try to do search and replace on HTML code, which fails in all but the simplest cases. The answer is to use a parser like HTML::Parser to get what you need.
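    To make the string-vs-operator point concrete, here is a tiny example (the variable names are just for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($x, $y) = (1, 0);

my $as_string = "$x && $y";   # interpolation: builds the string "1 && 0"
my $as_bool   = $x && $y;     # Boolean evaluation: yields 0

print "$as_string\n";   # prints: 1 && 0
print "$as_bool\n";     # prints: 0
```

    A flat search and replace on '&&' cannot tell these two apart; a parser can.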

    In your code above, it looks like you are translating Boolean expressions and using perl to 'eval' them. That may work with simple expressions of the form 'a && b', but what about nested expressions? Does Super Spice have the same operator precedence as Perl? Can you tell '&&' embedded in a comment from '&&' as an operator? It gets complicated.
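    A sketch of the nesting problem: a greedy top-level split (same pattern as in the node above) will fire on an '||' that really belongs inside parentheses, carving the condition at the wrong seam and leaving unbalanced pieces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The '||' here is nested inside the first parenthesized group, but a
# flat regex split grabs it anyway.
my $work = "(1 > 5 || 2 > 1) && (3 > 2)";
if ($work =~ /^(.+)\|\|(.+)$/) {
    print "left:  '$1'\n";   # left:  '(1 > 5 '
    print "right: '$2'\n";   # right: ' 2 > 1) && (3 > 2)'
}
```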

    So for all but the simplest grammars and transformations, it is best to parse.

    -Mark

      Thoughtful commentary, Mark.

      I am working with strings in a hash table, which effectively makes this a 'line by line' process. I've already stripped out the comments and combined multiple lines.

      There are nested conditionals, but they are fairly limited in usage and paren-delimited, so they're easy to handle.

      In this case, simple-minded is better. I'm not building a universal translator, just handling a set of files once. You've given me food for thought, though, and if the usage of this code does grow like Topsy, I'll reconsider building a full parse grammar and parse tree.
Re: To model or not to model
by neniro (Priest) on Apr 15, 2005 at 06:50 UTC
    Following kvale's post, I'd like to add that you should write a bunch of tests to ensure that the code you produce works the same way as the original code. It's a hell of its own if you generate a side effect you haven't considered and only notice it (too) late.
      Agreed. Part of the problem is that the reason for the task is that my employer does not want to spend the $$$ for the fancy analog design workstation software that the process models were created for. A rare instance where lab eggs don't want to spend the people's money.
Re: To model or not to model
by Velaki (Chaplain) on Apr 15, 2005 at 08:19 UTC

    When I read your question, one thought hit me, which was

    DOM or SAX?

    Basically, it appeared as if you were asking whether to construct a parse tree and subsequently evaluate it, or whether you should shift/reduce tokens on the fly.

    I think the answer is

    It depends.

    As to the level of complexity, I would agree that in all but the simplest cases, you might want to take a look into CPAN, and see if your parsing job can be made easier by one of the many modules there. However, if it is simple enough, then parsing with regexes might be the appropriate solution to your needs. Sometimes it's just easier to use a regex in a while loop.

    Again, it's a matter of the particular problem, the nature of the data, and your requirements for processing it.

    Just some thoughts,
    -v.
    "Perl. There is no substitute."
      Actually, I'm not even tokenizing it. Each line is a separate equation, and the worst line is a couple hundred characters. Perl-only regex sledgehammers work fine. ;-D

      Normally, I'd be jumping into CPAN as you suggest. I'm learning one thing here in this multipolyglot hodgepodge of systems, though. This particular project isn't needed on many architectures, but many projects are, and keeping a module tree up to date is not a trivial thing here. For example, a co-worker has just spent literally more than two weeks, full time, building gcc up to 3.4 on SPARCs running various Solarii. Much as I love CPAN and use it frequently, if I can do it native, I do.

      Besides, I learn more that way. ;-D