locked_user sundialsvc4 has asked for the wisdom of the Perl Monks concerning the following question:
“Gleaning” is the best word to think of to describe what I want to do here. I’ll be combing through large SAS programs (and SQL queries and yah-yah) looking for very specific needles in the haystacks. The rest I want to ignore. (And, I want the approach to be pretty flexible because some of these programs are “old and nas-s-s-ty.”)
These “needles” are not simply string-patterns. It seems to me that the best way to describe them is in terms of a grammar (or sub-grammar). But it will have a large number of “here be dragons” gaps in it, and these gaps are okay. If I decide that I don’t care about it, I don’t want the program to be tripped-up by it. And I don’t want to have to describe in any sort of detail what it is that I don’t care about. (I know of about 9,000 input files so-far, and there may be many more.)
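To make the "grammar with gaps" idea concrete, here is a minimal sketch of an island-style scanner in plain Perl. It is a hand-rolled stand-in, not Parse::RecDescent, and the two needle patterns (`libname` and `macro_def`) are hypothetical examples rather than anything taken from the real programs. Needles are tried first at each position. Anything else is skipped as "water" without being described, except that comments and quoted strings are skipped whole so that needles inside them are not reported.

```perl
use strict;
use warnings;

# Hypothetical needles: a name plus a pattern with named captures.
my @needles = (
    [ libname   => qr/libname\s+(?<name>\w+)\s+(?<path>"[^"]*"|'[^']*')\s*;/i ],
    [ macro_def => qr/%macro\s+(?<name>\w+)/i ],
);

# Scan $src left to right; return an array ref of hashes, one per needle found.
sub glean {
    my ($src) = @_;
    my @found;
    pos($src) = 0;
    TOKEN: while (pos($src) < length $src) {
        for my $needle (@needles) {
            my ($type, $re) = @$needle;
            if ($src =~ /\G$re/gc) {
                my %caps = %+;    # copy the named captures of this match
                push @found, { type => $type, %caps };
                next TOKEN;
            }
        }
        # "Here be dragons": skip a comment, a quoted string, a word,
        # or failing all of those, a single character.
        $src =~ m{\G(?: /\*.*?\*/ | "[^"]*" | '[^']*' | \w+ | . )}gcsx;
    }
    return \@found;
}
```

Adding a new thing to glean for then means adding one entry to `@needles`. The same separation of needles from water could presumably be carried over to a Parse::RecDescent grammar, with one catch-all rule playing the role of the water pattern.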
I can predict that, as time goes on with this project, we’ll find new things that we want to “glean” for. Looking for specific macros, for example, and picking certain things out of them. So, we’ll be re-mining this same vein again and again and again.
Speaking of macros, macro expansion is a whole ’nuther kettle of fish here. I’m going to need to be able to find the various %let statements in the code (e.g. in a “first pass”), then expand the &varname entries as best I can, and then re-parse. (And these guys put macros in their macro-expansions, sometimes six or seven levels deep.)
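The two-pass idea might look roughly like the sketch below. It is only an illustration under loud assumptions: it handles simple, unconditional `%let name = value;` statements, it ignores SAS subtleties such as `&&` indirection and macro scoping, and the function names `collect_lets` and `expand_refs` are made up for this example. Unknown references are left untouched, and expansion repeats until nothing changes or a depth limit is hit, which covers nested definitions.

```perl
use strict;
use warnings;

# First pass: collect "%let name = value;" definitions into a hash ref.
# Assumption: later definitions of the same name simply overwrite earlier ones.
sub collect_lets {
    my ($src) = @_;
    my %vars;
    while ($src =~ /%let\s+(\w+)\s*=\s*(.*?)\s*;/gis) {
        $vars{lc $1} = $2;
    }
    return \%vars;
}

# Second pass: replace &name (optionally followed by the "." delimiter)
# with its value, repeating up to $max_depth times for nested definitions.
sub expand_refs {
    my ($text, $vars, $max_depth) = @_;
    $max_depth //= 10;
    for (1 .. $max_depth) {
        my $changed = 0;
        $text =~ s{(&(\w+)\.?)}{
            exists $vars->{lc $2} ? do { $changed = 1; $vars->{lc $2} } : $1
        }ge;
        last unless $changed;
    }
    return $text;
}
```

A caller would run `collect_lets` over the whole file, pass each statement of interest through `expand_refs`, and then hand the expanded text back to the parser. The depth limit also guards against a definition that refers to itself.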
What I emphatically don’t want to find myself doing is ... effectively, starting over, or playing whack-a-mole on a big ugly wad of (my...) code that is constantly growing bigger and uglier as requirements evolve. And, although I know regexes very well, I really want to try to stay clear of “regex hell.” This is a task for a lexer and a parser.
It is also attracting a lot of high-management attention... which, as we all know, is both a very good thing and a not so very good thing.
Parse::RecDescent is already being used, to very good effect. Other parsers would be much more problematic to deploy. Computing resources are plentiful and fast.
Any thoughts on a general, high-level approach to this sort of problem?
Replies are listed 'Best First'.

Re: Using (say) Parse::RecDescent to glean from a file
by moritz (Cardinal) on Oct 10, 2010 at 08:57 UTC

Re: Using (say) Parse::RecDescent to glean from a file
by locked_user sundialsvc4 (Abbot) on Oct 10, 2010 at 14:52 UTC