Re: On modules
by merlyn (Sage) on Aug 18, 2005 at 14:06 UTC
TheDamian once explained to me why my attempt to parse all 22 levels of Perl's precedence (at that time) with a recursive descent parser was a really bad idea. I don't fully remember why at this point, but you're obviously stumbling over the same situation. Like everything else, recursive descent has its pluses and minuses.
I think your meditation is misnamed, then. The problem is not using a module; the problem is using the wrong module simply because you're familiar with it. For example, Parse::Yapp builds a different kind of parser (LALR, or something like that) that might have solved your issues lickety-split.
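For anyone who hasn't seen the LALR style merlyn is pointing at: a table-driven generator like Parse::Yapp takes precedence as a stack of declarations, one line per level, rather than one nested rule per level. A rough sketch in yacc-style notation (token names invented, and only a fragment, not a complete grammar file):

    %left  OR                  /* lowest precedence */
    %left  AND
    %left  '+' '-'
    %left  '*' '/'
    %right POW                 /* ** binds right-to-left */
    %%
    expr : expr OR expr
         | expr AND expr
         | expr '+' expr | expr '-' expr
         | expr '*' expr | expr '/' expr
         | expr POW expr
         | NUMBER
         ;

Twenty-two levels of precedence become twenty-two declaration lines, and the generated tables resolve them without deep recursion at parse time.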
[reply]
That could be true as well, merlyn. Recursive descent (the methodology) is the right way to do this sort of thing, but what TheDamian's module does isn't exactly classic RD. This is hard to do in P::RD, where it might be better done in a yacc-style parser, because of the left-recursion issue.
I was as frustrated with using the module itself as I was with the slowness of the program. Part of the reason I rewrote it was so that I could build my own trace functions and examine my own variables and stacks. I felt that understanding TheDamian's code would have taken me as long as the approach I took, and might not have given me the same results. That's the main reason I focused on the use of modules; it's a broader concern than just P::RD. As you say, any module and/or approach has its pluses and minuses, and these must be weighed for each problem one approaches.
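To make the left-recursion issue concrete: the natural grammar rule for a left-associative operator refers to itself on the left, and a recursive-descent parser that follows such a rule literally calls itself forever without consuming a token. The standard fix is to flatten the recursion into a loop. A minimal Perl sketch (token handling reduced to a bare array purely for illustration):

    use strict;
    use warnings;

    # grammar:    expr : expr '-' NUM | NUM     <-- left-recursive
    # rewritten:  expr : NUM ( '-' NUM )*       <-- a loop, same language
    my @tokens = (5, '-', 2, '-', 1);           # should mean (5 - 2) - 1

    sub parse_expr {
        my $value = shift @tokens;              # leading NUM
        while (@tokens && $tokens[0] eq '-') {
            shift @tokens;                      # consume '-'
            $value -= shift @tokens;            # fold left-associatively
        }
        return $value;
    }

    print parse_expr(), "\n";                   # prints 2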
[reply]
"Recursive descent (the methodology) is the right way to do this sort of thing"
No, this is apparently the problem. RD is great for some things, but not for this. There's a reason that gcc is not RD, but LALR. Again, I'm a bear of very little brain about this, but when TheDamian explained it to me, it made sense at the time. It doesn't matter how much you speed it up: something with 22 levels of precedence should be parsed with LALR (if it can be), not RD.
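To put a number on the objection: a straight RD parser usually encodes precedence as a chain of rules, each deferring to the next-tighter level, so even a lone constant is parsed by walking the whole chain. A toy sketch, collapsed from 22 levels to 4 (rule names invented, operator handling omitted):

    use strict;
    use warnings;

    # every subexpression parse starts at the bottom of the precedence
    # chain, even when no operator at that level is present
    sub parse_expr    { parse_sum($_[0])     }   # level 1 (lowest)
    sub parse_sum     { parse_product($_[0]) }   # level 2: no '+' here
    sub parse_product { parse_primary($_[0]) }   # level 3: no '*' here
    sub parse_primary { return 0 + $_[0]     }   # level 4: the number

    print parse_expr("42"), "\n";   # four calls to recognize "42"; with
                                    # 22 levels it is 22 calls, at every
                                    # node of every expression

A table-driven LALR parser collapses those empty steps away, which is (roughly) the argument for yacc-style parsing here.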
[reply]
Re: On modules
by 5mi11er (Deacon) on Aug 18, 2005 at 14:02 UTC
Good write-up; but I'm wondering if you aren't selling modules a little bit short. I would contend that during the various iterations of solving your problem, you came to understand the problem better and more deeply. So perhaps a better way of viewing your recent experience is that the P::RD version of your solution was a stepping stone on the journey to your current one?
-Scott
[reply]
Absolutely true. The module worked properly and did solve the problem. I redid it because they want to be able to iterate parameter changes quickly and examine boundary conditions in their simulations.
[reply]
Re: On modules
by Courage (Parson) on Aug 18, 2005 at 14:26 UTC
I always considered P::RD a very good exercise for the brain, and for understanding parsing techniques in general: excellent documentation, an excellent tutorial, very deep.
I never thought of it as practical, though, precisely because of its inefficiency.
One more suggestion: as long as you're deep in parsers, you may consider looking into Haskell. As autrijus showed us all with his marvellous Pugs Perl 6 implementation, Haskell is extremely good at this. But be warned: Haskell is very hard to start playing with :)
[reply]
Actually, this latest version is written (shhh!) in Ruby. My group is collectively evaluating Ruby and looking for situations in which to test it. Since the SPICE parser is basically an exercise in logical abstraction, it was ideal. Haskell is free with FreeBSD, so I'll take a gander there, too. :) More tools and understanding are always Good Things.
As 5mi11er suggested, my understanding of the problem space has evolved, and P::RD was an important step along that way.
[reply]
When I was a much younger programmer and had the time to delve into the details, I spent a lot of time reading books about LISP implementation and its underlying theory. Alonzo Church's lambda calculus was part of that research. I drove my math profs nuts: none of the rest of the class was prepared to go there, but the profs wanted to, and did. We had a lot of interesting conversations, but the rest of the class was left very confused and the syllabus didn't get completed. ;)
The nice thing about Perl (or Ruby, for that matter) is that one can do functional programming in it as well as imperative programming. One can look at objects, or one can look at the transition methods and data transformation and information flow. It's all a matter of the level of data abstraction that you model, and the actual language used is immaterial to the understanding embedded in your model.
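As a small illustration of that point, here is function composition in plain Perl: closures and higher-order subs, no objects anywhere (the little pipeline is invented for illustration):

    use strict;
    use warnings;

    # compose(f, g, ...) returns a sub that applies the functions
    # right-to-left, the way mathematicians write composition
    sub compose {
        my @fns = reverse @_;
        return sub {
            my @args = @_;
            @args = $_->(@args) for @fns;
            return @args;
        };
    }

    my $clip  = sub { map { $_ < 0 ? 0 : $_ } @_ };   # clamp negatives
    my $scale = sub { map { $_ * 1e-6 } @_ };         # apply a 'u' unit

    my $pipeline = compose($scale, $clip);            # clip, then scale
    print join(', ', $pipeline->(3, -1, 7)), "\n";    # 3e-06, 0, 7e-06

The same shape works in Ruby with lambdas and Procs; the abstraction, not the language, is what carries the model.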
Haskell does seem to be a very clean way of expressing abstractions, and perhaps it would be very suitable for my parser/equation solver. The next stage of our project, though, is to use my solver in conjunction with the analog simulators and statistical correlation software, and that's an iterative situation more suited to the imperative, input-driven nature of a general-purpose language.
As for Haskell's statelessness, I can see that it would help if we were looking at clustering the solution. But there are more equations to solve than there would likely be CPU nodes, so any real cluster I could run this on would be happy solving the equations separately.
[reply]
Re: On modules
by tilly (Archbishop) on Aug 19, 2005 at 03:05 UTC
I am willing to make several bets.
First of all, that the experience of having tried to do it with Parse::RecDescent made the rewrite easier to do.
Secondly, I'll bet that if Parse::RecDescent used /\G.../gc matches to tokenize rather than nibbling strings, its performance would have been good enough (the idiom is sketched below). Maybe not as good as your final version, but good enough to make do.
Thirdly, I'll bet that there are more Perl programmers inclined not to use modules that would have helped them than there are Perl programmers inclined to use modules that are not worthwhile. (Though it is certainly possible to err both ways.)
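For anyone who hasn't met the idiom in that second bet: matching with /\G.../gc advances pos() through the string in place, so the lexer never copies or substr()-nibbles its input, which is where much of P::RD's tokenizing time goes. A minimal sketch of such a tokenizer (the token set is an invented sliver of the SPICE expression grammar recapped further down the thread):

    use strict;
    use warnings;

    sub tokenize {
        my ($src) = @_;    # private copy; pos() walks through it
        my @tokens;
        pos($src) = 0;
        while (pos($src) < length $src) {
            next if $src =~ /\G\s+/gc;   # skip whitespace, no copying
            if    ($src =~ /\G(\d+(?:\.\d+)?[a-zA-Z]?)/gc) { push @tokens, [NUM => $1] }
            elsif ($src =~ /\G(\*\*|[-+*\/(),<>=])/gc)     { push @tokens, [OP  => $1] }
            elsif ($src =~ /\G(\w+)/gc)                    { push @tokens, [ID  => $1] }
            else  { die 'lexer stuck at: ' . substr($src, pos($src)) . "\n" }
        }
        return \@tokens;
    }

    my $toks = tokenize('1.5u + statistic_p(1, bar/3, 0)');
    print "@$_\n" for @$toks;   # NUM 1.5u, OP +, ID statistic_p, OP (, ...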
[reply]
I was swayed by reports I'd seen from people who swore that P::RD was the cat's meow. You're absolutely right that I gained insight from working with P::RD. Seeing those trace stacks going endlessly through the same fetches and recalculations was mortifying, not edifying! The key to a true RD algorithm is that your early results are saved in the recursive context as you call it again to go deeper. As you come back up the call stack, those results are available.
As for the tokenizer, I actually did a two-pass system: tokenizing the whole string first, then working with tokens saved as (TTYPE, TVALUE) pairs. It would have been faster had I done only one-token lookahead, but the maintenance advantage of separating the tokenizer from the solver outweighed the need for speed. I suspect the tokenizer is far less a part of the problem than the recalculations: even with shortcuts, not having access to the actual token stack to replace substrings meant that everything was re-fetched again and again.
On your third point, it's a tribute to the languages we have now, especially Perl, that one CAN do so much with the base language. I'm soooo glad to have the choice, to have the open-source OSen available to run them, AND to have these multi-gigahertz PCs that labor so under Doze but just fly when set free with open source!
[reply]
Re: On modules
by samizdat (Vicar) on Aug 19, 2005 at 14:05 UTC
tlm has asked me to recap the problem space for everyone's benefit. These equations are extracted from files of SPICE model parameters produced for either Cadence Spectre or HSPICE, electronic chip simulators that produce moment-by-moment values for how chips behave. Each file contains model parameters describing how transistors, capacitors, and such function in a given semiconductor process. The original SPICE simulator on which all of these are based was one of the very first non-OS applications released as open source, at UC Berkeley in the 1970s.
The grammar for these is both deep and recursively defined. Equations can have operators ranging from ** down to && and ||, plus unit-scaling factors, and any given non-operator slot might hold a number, a variable, a library function, a user-defined function, an if-then-else, or another whole equation, with or without parentheses. Most operators are left-associative, but ** and units are right-associative.
An example might be something like this:
    foo = if bar >= 0
          then if statistic_p(1, bar/3, 0) > 4 then 1.5u
               elseif statistic_p(1, bar/3, 0) > 2 then 1.2u
               else 1.0u
               endif
          else 0.5u
          endif
On top of all of that, == might be written as =, and elseif and elsif are equally valid. It's a hairy mess of a grammar, and our work is complicated by the fact that these values are not guaranteed to be provably correct, only to work on a certain high-dollar workstation we don't have the budget to buy. The code is littered with divide-by-zeros, parameters that are never defined in the code, and undocumented assumptions made by that workstation. Top it off with the fact that the vendor is changing the target process on the fly, and it becomes quite probable that the values I'm given won't be useful two weeks later, so we really need this parser to fly!
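The node doesn't say how the hand-rolled parser is structured, but one common way to handle a tower of mostly-left-associative operators with a right-associative ** on top, without writing one rule per level, is precedence climbing: a single recursive routine driven by a table of binding powers. A stripped-down sketch over an already-tokenized stream (the operator subset, names, and token format are all invented for illustration):

    use strict;
    use warnings;

    # binding powers: higher binds tighter; ** is the right-associative one
    my %prec  = ('||' => 1, '&&' => 2, '+' => 3, '-' => 3,
                 '*'  => 4, '/'  => 4, '**' => 5);
    my %right = ('**' => 1);

    my @toks;   # numbers and operator strings, filled before parsing

    sub parse_expr {
        my ($min) = @_;
        my $lhs = shift @toks;                  # assume a bare number
        while (@toks && exists $prec{$toks[0]} && $prec{$toks[0]} >= $min) {
            my $op   = shift @toks;
            my $next = $right{$op} ? $prec{$op} : $prec{$op} + 1;
            my $rhs  = parse_expr($next);       # climb for the right side
            $lhs = $op eq '**' ? $lhs ** $rhs
                 : $op eq '*'  ? $lhs *  $rhs
                 : $op eq '/'  ? $lhs /  $rhs
                 : $op eq '+'  ? $lhs +  $rhs
                 : $op eq '-'  ? $lhs -  $rhs
                 : $op eq '&&' ? ($lhs && $rhs)
                 :               ($lhs || $rhs);
        }
        return $lhs;
    }

    @toks = (2, '**', 3, '**', 2, '+', 1);  # right-assoc: 2**(3**2) + 1
    print parse_expr(1), "\n";              # prints 513

One routine and one table; adding a precedence level is one line in %prec, which recovers much of the declarative convenience of LALR inside a hand-written parser.
[reply]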
Re: On modules
by anonymized user 468275 (Curate) on Aug 22, 2005 at 16:25 UTC
I agree with the OP: the (most reasonable) design process consists of first identifying the functional (i.e., language-independent) algorithm that solves the problem in the theoretically most appropriate way.
After that, modules may well exist to fill in some of the coding requirements, but it is often wrong to use a module that misses the mark even slightly. For example, if a module can do everything with a set of files except just one of the things you need, it might be hard to justify doubling the number of passes through all the files just to use the module for most of the requirements, especially if there are hundreds of files adding up to several gigabytes of data to be processed.
[reply]