supergiantrobot has asked for the wisdom of the Perl Monks concerning the following question:

I have a text translation problem to solve. That is, I want to take some text with simple markup and produce other text in another form of markup. The original text looks something like this:
HEAD: LAMP Post
DECK: Burn the Bulb Brighter
AUTHOR: Joe Smith

This is I<some> sample text. It uses a few pieces of markup such as C<courier>, C<I<courier italics>>, B<bold>, and I<italics.>

Some time there is C< long sections of code > or I< Long quotes that appears in blocks >

There can also be figures, such as I<Figure One.>

[ BEGIN FIGURE ONE: This is a caption with I<italics> ]
[ INSERT sample.jpg ]
[ END FIGURE ONE ]

Etc.

BIO: John Smith


For the sake of discussion, imagine I want to produce HTML (though I also want to be able to produce Quark and InDesign markup). I definitely want to maintain a map of original tags to new tags, and have that map be dynamic so I can emit a variety of formats.
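
A minimal sketch of the kind of tag map I mean, keyed first by output format and then by source tag. The names `%TAG_MAP`, `add_format`, and `tags_for`, and the `debug` format, are hypothetical and exist only for illustration:

```perl
use strict;
use warnings;

# Hypothetical registry: output format => { source tag => [open, close] }.
# The HTML strings are ordinary HTML; 'debug' is a made-up format that only
# shows a second entry can sit beside it.
my %TAG_MAP = (
    html => {
        I => [ '<i>',    '</i>'    ],
        B => [ '<b>',    '</b>'    ],
        C => [ '<code>', '</code>' ],
    },
    debug => {
        I => [ '{I:', '}' ],
        B => [ '{B:', '}' ],
        C => [ '{C:', '}' ],
    },
);

# Register (or replace) a format at run time, so the map stays dynamic.
sub add_format {
    my ($name, $tags) = @_;
    $TAG_MAP{$name} = { %$tags };    # shallow copy of the caller's hash
    return;
}

# Look up the (open, close) pair for one source tag in one output format.
sub tags_for {
    my ($format, $tag) = @_;
    my $tags = $TAG_MAP{$format} or die "Unknown format '$format'\n";
    my $pair = $tags->{$tag}     or die "No '$tag' tag in format '$format'\n";
    return @$pair;
}
```

Quark and InDesign entries would slot in beside `html` once their real tag strings are filled in; they are left out here rather than guessed.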

I have something working with Parse::RecDescent. I chose it because of the possibility of embedded tags, such as C<I<courier italics.>> However, I am wondering whether there is some other module I should consider using. Text::Balanced? A modified HTML parser?

Suggestions?

Martin

Re: Best parsing module to use?
by kvale (Monsignor) on Apr 19, 2004 at 15:41 UTC
    Given the language element
    [ INSERT sample.jpg
    I think Text::Balanced will not be a good solution because the square brackets are not always balanced in your example. The markup doesn't really look like HTML either; there are no closing or empty tags anywhere. As you demonstrate, there is the possibility of arbitrary nesting at least a few levels deep, so doing the whole thing with a regex would be a mess. If anything, this format has a resemblance to POD.

    If I were approaching your parsing problem, I would create a grammar first. I would then either use Parse::RecDescent if speed is not an issue, or create a hand-crafted parser if it is.
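
    To make the grammar-first, hand-crafted route concrete, here is a minimal sketch that covers only the inline tags. The grammar in the comments is one reading of the sample, not a spec, and `%HTML`, `parse_text`, and `translate` are hypothetical names. It assumes every tag is closed by a matching `>`, and it does not escape HTML or handle the HEAD:/DECK: lines or the figure blocks.

```perl
use strict;
use warnings;

# Assumed grammar for the inline markup:
#   text   := ( tagged | plain )*
#   tagged := TAG '<' text '>'
#   TAG    := any key of the tag map, e.g. I, B, C
#   plain  := any single character other than '>'

# Hypothetical map from source tag to [open, close] output strings.
my %HTML = (
    I => [ '<i>',    '</i>'    ],
    B => [ '<b>',    '</b>'    ],
    C => [ '<code>', '</code>' ],
);

# Translate from the current pos() of the string behind $src_ref.
# $depth > 0 means we are inside a tag and return at its closing '>'.
sub parse_text {
    my ($src_ref, $map, $depth) = @_;
    my $tag_re = join '|', map { quotemeta } sort keys %$map;
    my $out    = '';
    while (1) {
        if ($$src_ref =~ /\G($tag_re)</gc) {          # tagged: TAG '<'
            my ($open, $close) = @{ $map->{$1} };
            $out .= $open . parse_text($src_ref, $map, $depth + 1) . $close;
            $$src_ref =~ /\G>/gc
                or die 'Unclosed tag at offset ' . pos($$src_ref) . "\n";
        }
        elsif ($$src_ref =~ /\G([^>])/gc) {           # one plain character
            $out .= $1;
        }
        elsif ($depth == 0 && $$src_ref =~ /\G>/gc) { # stray '>' at top level
            $out .= '>';
        }
        else {                                        # closing '>' or end of input
            return $out;
        }
    }
}

# Entry point: works on a copy of the text, starting at offset 0.
sub translate {
    my ($text, $map) = @_;
    pos($text) = 0;
    return parse_text(\$text, $map, 0);
}
```

    With this sketch, `translate('C<I<courier italics>>, B<bold>', \%HTML)` returns `<code><i>courier italics</i></code>, <b>bold</b>`. Passing a different map in place of `\%HTML` changes the output format without touching the parser.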

    -Mark

      I missed some of the right brackets in my sample code. My apologies.

      Everything should be balanced.

      I wonder if switching to POD would help... but nesting might make that difficult.
Re: Best parsing module to use?
by samtregar (Abbot) on Apr 19, 2004 at 18:31 UTC
    I have something working with Parse::RecDescent.

    That's what I would use, unless parser performance is important. Any reason not to just keep using it?

    -sam