PerlMonks  

Maintaining horrible C with Parse::RecDescent

by isotope (Deacon)
on Mar 15, 2002 at 01:43 UTC ( [id://151883]=perlquestion )

isotope has asked for the wisdom of the Perl Monks concerning the following question:

Today I was reminded of why I chose EE instead of CS in college.

I have been asked to undertake a particular project at work. We are currently developing an embedded system, based on work we inherited from our sibling company in Japan. Along with all the hardware documentation and prototypes, they provided us with some sample source code. Unfortunately, the original programmers didn't follow the coding standards we take for granted in the US, and our management decided that we didn't have enough time to develop new software from scratch.

As a result, we're using that old, horribly messy code, with all its pitfalls and limitations. We have brought it to the point where it actually works pretty well, but as we add new features, we run into major hurdles. This is where my project sprang up.

The source code features massive arrays of arrays of arrays of arrays... and so on. These arrays contain initial data describing the termination points in our system (for any of you in the telecommunications industry), of which there are many thousands. In our development, we need to add some new elements to these deeply nested arrays, and I, being dumb enough to admit I know how to program in Perl, have been assigned the task of automating that (rather than spending the next few years of my life doing it by hand).

So, in a fit of insanity I started writing a parser. Fortunately, I noticed the insanity only an hour or two into the process, and started looking around. I read about Parse::RecDescent and realized that this is probably what I need to use. But, first, I'm going to need a C grammar.

In short, I've looked around, but I haven't found any ready to drop into P::RD.

Now, to ask a question. Am I on the right track using Parse::RecDescent and trying to modify a readily available grammar, or is there another approach I should try?


--isotope
http://www.skylab.org/~isotope/

Replies are listed 'Best First'.
•Re: Maintaining horrible C with Parse::RecDescent
by merlyn (Sage) on Mar 15, 2002 at 01:47 UTC
      Well, it appears to be left-recursive, assuming I'm looking at the right example (demo_Cgrammar.pl). As for Inline's grammar.pm, it doesn't look like a complete C grammar, but I'll dig into it further in the morning.

      --isotope
      http://www.skylab.org/~isotope/
Try another tack...
by RMGir (Prior) on Mar 15, 2002 at 14:59 UTC
    Do you REALLY need to parse C?

    Could you instead reformat the tables to some easily parseable format, and maybe surround them with comment markers your perl script could locate?

    That way, you only have to worry about parsing/generating the tables themselves, which is a MUCH simpler problem. You can probably even do it with relatively simple regexen, without having to drag in Parse::RecDescent.

    There's nothing wrong with parser generators, and REALLY nothing wrong with P::RD, but I think it's probably overkill for what you need.
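    The marker idea above could look roughly like the sketch below. The marker text (TABLE BEGIN / TABLE END) and the sample C are made up for illustration; the thread never says what the markers or the real tables look like.

```perl
use strict;
use warnings;

# Hypothetical marker comments; choose whatever text suits the code base.
my $begin_marker = qr{/\*\s*TABLE\s+BEGIN\s*\*/};
my $end_marker   = qr{/\*\s*TABLE\s+END\s*\*/};

# Return the text found between each BEGIN/END marker pair.
sub extract_tables {
    my ($source) = @_;
    my @tables;
    # /s lets . cross newlines; .*? stops at the nearest END marker.
    while ($source =~ /$begin_marker(.*?)$end_marker/gs) {
        push @tables, $1;
    }
    return @tables;
}

# Made-up sample input, standing in for a real C file.
my $c_source = <<'END_C';
int unrelated = 3;
/* TABLE BEGIN */
static int small[2][2] = { { 1, 2 }, { 3, 4 } };
/* TABLE END */
void f(void) { }
/* TABLE BEGIN */
static int other[1][1] = { { 9 } };
/* TABLE END */
END_C

my @tables = extract_tables($c_source);
printf "found %d table(s)\n", scalar @tables;
```

    Once the tables are isolated this way, the script only has to understand the initializer syntax between the markers, not the rest of the C file.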

      Could you instead reformat the tables to some easily parseable format

      That's the problem. I have several thousand tables I would have to reformat.

      --isotope
      http://www.skylab.org/~isotope/
        Ouch.

        Any chance you can automate the reformatting somehow?

        Even if you could just put in a "table starts here"/"table stops here" type of marker around the tables, then the rest of your problem gets a lot simpler.

        Given the number of tables you're dealing with, I agree that's probably not trivial. But if you're lucky, there's some indicator you can "grab on to" to start from. How are the tables declared, with [][][]...?

        I think there's got to be a way to get a start on the problem, but without knowing what the code looks like, it's hard to make intelligent guesses.

      (backing up a few levels, because the reply chain was getting pretty deep)

      Are all the tables STP_OLTAPON's? I'm pretty sure you can craft one (very ugly) //x regex to handle one of those tables, and reformat it. But it's not worth the effort if that particular type is only used for this one table...

      That is some scary stuff!

Re: Maintaining horrible C with Parse::RecDescent
by hossman (Prior) on Mar 16, 2002 at 01:51 UTC
    I think RMGir is on the right track .. instead of trying to parse all of the C in the file, find a regexp (or a grammar) that will *JUST* pull out the definitions of all the large arrays.

    His suggestion of /^[A-Z_]+ [a-z_0-9]+ = {/ will probably catch too much (I'm guessing you have plenty of small arrays you aren't worried about) but if there's some "minimum array nest depth" that you are interested in, you can write a regexp to find those...

    $_ = ...; # file contents
    my $min_depth = 4;
    while (/([A-Z_]+ [a-z_0-9]+ = \{([^\}]*\{){$min_depth,}[^;]*;)/g) {
        my $array = $1;
        ....
    }

    (completely untested, of course)
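    The same "minimum nest depth" filter can also be sketched by counting braces instead of using one big regex. This is a different technique from the regex above, not a tested replacement for it, and it assumes that no braces hide inside string literals or comments and that splitting on ';' is a good enough approximation of statements. The sample declarations are invented.

```perl
use strict;
use warnings;

# Return the deepest brace nesting level reached in a chunk of C text.
# Assumption: no braces appear inside string literals or comments.
sub max_brace_depth {
    my ($text) = @_;
    my ($depth, $max) = (0, 0);
    while ($text =~ /([{}])/g) {
        if ($1 eq '{') {
            $depth++;
            $max = $depth if $depth > $max;
        }
        else {
            $depth--;
        }
    }
    return $max;
}

# Split on ';' to approximate statements, then keep the chunks that
# contain an initializer nested at least $min_depth levels deep.
sub deep_initializers {
    my ($source, $min_depth) = @_;
    return grep { /=\s*\{/ && max_brace_depth($_) >= $min_depth }
           split /;/, $source;
}

# Made-up sample input.
my $source = <<'END_C';
int small[2] = { 1, 2 };
int deep[1][1][1][1] = { { { { 7 } } } };
END_C

my @hits = deep_initializers($source, 4);
print scalar(@hits), " deep initializer(s)\n";
```

    A scanner like this visits each brace once, so it sidesteps any backtracking in the regex engine, at the cost of being cruder about where a declaration starts.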
      Interesting, hossman!

      I haven't tested it either, but I have this fear that that could be a scarily slow regex.

      Could be worth a try, though.

      Oh, one possible optimization would be to make the quantifier {$min_depth} instead of {$min_depth,}. That would put the regex into the faster [^;]* part quicker. Maybe? It's all guesses :)
      --
      Mike
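      To check the guess above on a small case, the two quantifier forms can be run against the same input. The sample declaration (STP_TABLE, tbl_a) is invented, and this only shows that both forms capture the same text for this one input; it says nothing about speed on the real files.

```perl
use strict;
use warnings;

my $min_depth = 4;

# The pattern from the post above, with an open-ended and an exact quantifier.
my $open_ended = qr/([A-Z_]+ [a-z_0-9]+ = \{(?:[^\}]*\{){$min_depth,}[^;]*;)/;
my $exact      = qr/([A-Z_]+ [a-z_0-9]+ = \{(?:[^\}]*\{){$min_depth}[^;]*;)/;

# Return the first captured declaration, or undef if nothing matches.
sub first_match {
    my ($text, $re) = @_;
    return $text =~ $re ? $1 : undef;
}

# Made-up declaration: one opening brace plus four more nested ones.
my $sample = 'STP_TABLE tbl_a = {{{{{ 1 }}}}};';

my $from_open  = first_match($sample, $open_ended);
my $from_exact = first_match($sample, $exact);

print "both forms capture the same text\n"
    if defined $from_open && defined $from_exact && $from_open eq $from_exact;
```

      Note that the pattern needs one opening brace and then $min_depth more before it matches, so a table nested exactly four deep would be skipped with $min_depth = 4.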
