in reply to Help Creating a Code Filter

This is non-trivial with regular expressions, as there's too much statefulness to handle. If you really want to pursue this path, you probably have to use regexes to find individual potential tokens and write your own state machine to handle transitions and backtracking. Once you've done that, you've basically written your own grammar engine.
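
To make the "regexes for tokens plus your own state machine" idea concrete, here is a minimal sketch in Python rather than Perl. It assumes the source language uses C-style comments and double-quoted strings, which the original question may or may not match; the names `CODE_TOKEN`, `BLOCK_CLOSE`, and `strip_comments` are made up for illustration. The regexes only recognise individual tokens, while the `state` variable decides which pattern applies at each position.

```python
import re

# Tokens that matter while we are in ordinary code (assumed C-like syntax).
CODE_TOKEN = re.compile(r'''
    (?P<string>"(?:\\.|[^"\\])*")
  | (?P<block_open>/\*)
  | (?P<line_comment>//[^\n]*)
  | (?P<text>[^"/]+|.)
''', re.VERBOSE | re.DOTALL)

# Inside a block comment the only thing we look for is the terminator.
BLOCK_CLOSE = re.compile(r'\*/')


def strip_comments(src):
    """Remove // and /* */ comments, leaving string literals untouched."""
    out = []
    pos = 0
    state = "code"
    while pos < len(src):
        if state == "code":
            m = CODE_TOKEN.match(src, pos)  # always matches: '.' is the fallback
            kind = m.lastgroup
            if kind == "block_open":
                state = "comment"           # transition: code -> comment
            elif kind != "line_comment":
                out.append(m.group())       # keep strings and ordinary text
            pos = m.end()
        else:
            m = BLOCK_CLOSE.search(src, pos)
            if m is None:
                break                       # unterminated comment: drop the rest
            state = "code"                  # transition: comment -> code
            pos = m.end()
    return "".join(out)
```

A single pattern applied blindly would treat the quote inside `/* say "hi */` as the start of a string and swallow the comment terminator. Tracking the state avoids that, and it is exactly the kind of bookkeeping that keeps growing as you handle more of the language.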

I recommend the use of a grammar, whether Parse::RecDescent for Perl 5 or perhaps Parrot's PGE/PCT combination. The latter has an implementation of C99 in progress in languages/c99/ that might be instructive.

Re^2: Help Creating a Code Filter
by educated_foo (Vicar) on Feb 25, 2008 at 03:40 UTC
    I agree that the Right Thing is to create a grammar and use a real parser, and that Parse::RecDescent is relatively easy to use. However, if this conversion is a one-off thing, or if you can't easily come up with a grammar for the source language (e.g. it's an ad-hoc language), the regexp approach can get the job done. Furthermore, it's easier to ignore parts of the language you don't understand using regexps.

    If you do take the regexp approach, I would suggest doing the conversion in multiple passes, e.g. remove the comments, then convert the small/local constructs, then convert the larger ones. You probably also want to order your patterns from most-specific to most-general. You are lucky that you're running the result through a C++ compiler, since if you mistranslate something, odds are good that the compiler will catch it.
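
A rough sketch of the multi-pass idea, written in Python rather than Perl. The source language here is entirely invented for illustration (`#` comments, `:=` assignment, `print` / `print line`, `if ... then` / `end`), as are the names `LOCAL_RULES`, `BLOCK_RULES`, and `convert`. It also assumes `#` never appears inside a string literal. Within a pass, the rules are listed from most specific to most general.

```python
import re


def strip_comments(src):
    # Pass 1: drop '#' comments (assumes '#' never occurs inside a string).
    return re.sub(r'[ \t]*#[^\n]*', '', src)


# Pass 2: small, local constructs, most specific pattern first.
LOCAL_RULES = [
    # 'print line X' must come before 'print X', or 'line' is taken as output.
    (re.compile(r'\bprint[ \t]+line[ \t]+(.+)$', re.M),
     r'std::cout << \1 << std::endl;'),
    (re.compile(r'\bprint[ \t]+(.+)$', re.M),
     r'std::cout << \1;'),
    (re.compile(r'^([ \t]*)(\w+)[ \t]*:=[ \t]*(.+)$', re.M),
     r'\1\2 = \3;'),
]

# Pass 3: larger, block-level constructs.
BLOCK_RULES = [
    (re.compile(r'^([ \t]*)if[ \t]+(.+?)[ \t]+then[ \t]*$', re.M),
     r'\1if (\2) {'),
    (re.compile(r'^([ \t]*)end[ \t]*$', re.M),
     r'\1}'),
]


def convert(src):
    """Run the passes in order: comments, local constructs, block constructs."""
    src = strip_comments(src)
    for pattern, replacement in LOCAL_RULES + BLOCK_RULES:
        src = pattern.sub(replacement, src)
    return src


if __name__ == "__main__":
    sample = (
        "# greet\n"
        "total := 1 + 2   # sum\n"
        "if total > 2 then\n"
        "    print line total\n"
        "end\n"
        'print "done"\n'
    )
    print(convert(sample))
```

Swapping the first two entries of `LOCAL_RULES` would turn `print line total` into `std::cout << line total;`, which is the sort of mistranslation the C++ compiler will then flag for you.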