http://qs1969.pair.com?node_id=190896

hacker has asked for the wisdom of the Perl Monks concerning the following question:

I have an SGML FAQ for one of my projects, which was originally a static text file. In the interests of keeping the FAQ updated and maintainable, I've converted it to SGML, and it resides in our CVS.

I'd like to fetch the FAQ with LWP through either of the two ViewCVS interfaces (which is how it was done when it was static text), convert it to XHTML, and display it on the web page dynamically.

Each time a user clicks on the 'FAQ' link on the website, the FAQ will be queried from CVS, converted, wrapped in validated XHTML, and thrown at their browser.
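To make the fetch-and-wrap part of that flow concrete, here is a minimal sketch. The names fetch_faq and wrap_xhtml are hypothetical, the ViewCVS URL has to be supplied by the caller, and the XHTML 1.0 Strict doctype is an assumption about the target. It only wraps a fragment in a page skeleton; it does not validate anything.

```perl
use strict;
use warnings;

# Fetch the raw FAQ text over HTTP (hypothetical helper).
# LWP::UserAgent is loaded lazily so wrap_xhtml() can be used without it.
sub fetch_faq {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new(timeout => 10);
    my $res = $ua->get($url);
    die "Could not fetch $url: " . $res->status_line . "\n"
        unless $res->is_success;
    return $res->decoded_content;
}

# Wrap an already-converted XHTML fragment in a minimal page skeleton.
# Assumes $title and $body are already safe to embed as markup.
sub wrap_xhtml {
    my ($title, $body) = @_;
    return <<"END";
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>$title</title></head>
<body>
$body
</body>
</html>
END
}
```

In a CGI context the result would be printed after a Content-Type header. Fetching from CVS on every click also means every page view depends on ViewCVS being reachable, so caching the converted page may be worth considering.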

The FAQ generally looks like this:

<!-- ####################### -->
<!-- Section 2: Installation -->
<!-- ####################### -->
<sect1 id="whatplatforms">
  <title>What platforms does Plucker run on?</title>
  <para>
    The viewer should run on any Palm OS device utilizing version 2.0.4 or higher of Palm OS, while the desktop tools are supported on Linux, Windows, Mac OS X, and OS/2.
  </para>
  <para>
    The desktop tools will probably work on any Unix system with Python installed, but your mileage may vary, so don't get angry if they don't work. If you are able to get it running on a system not listed in REQUIREMENTS then please let us know so that it can be added to the list of supported systems.
  </para>
</sect1>
<!-- #################################### -->
<!-- Section 2: Installation: END -->
<!-- #################################### -->

Some sections contain <itemizedlist> and <listitem> tags, so those must be parsed as well. I'd rather stay away from HTML::Template for this particular venture, though I will be moving in that direction soon. Right now, the code lives in 'sub faq {...}'.
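As a rough illustration of what handling those tags could look like, here is a minimal sketch that rewrites them with a lookup table and a regex. It is not a real SGML parser: it assumes every tag is explicitly closed, as in the sample above, and that the text is already entity-escaped. The function name faq_to_xhtml and the choice of target tags in %map are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical mapping from the FAQ's tags to XHTML equivalents.
my %map = (
    sect1        => 'div',
    title        => 'h2',
    para         => 'p',
    itemizedlist => 'ul',
    listitem     => 'li',
);

sub faq_to_xhtml {
    my ($sgml) = @_;

    # Drop the comment banners between sections.
    $sgml =~ s/<!--.*?-->//gs;

    # Rewrite known opening and closing tags; attributes such as id="..."
    # are kept on opening tags. Unknown tags are removed, their text is kept.
    $sgml =~ s{<(/?)(\w+)([^>]*)>}{
        my ($slash, $name, $attrs) = ($1, lc $2, $3);
        exists $map{$name}
            ? "<$slash$map{$name}" . ($slash ? '' : $attrs) . ">"
            : ''
    }gex;

    return $sgml;
}
```

This breaks down as soon as the source uses SGML features such as omitted end tags or marked sections, which is where converting to XML first and using a real parser would pay off.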

Has anyone done this? Are there secrets to it? I've read that converting the SGML to XML first is one way to go. I've looked at SGML::Parser and HTML::Parser, but I'm not sure they can help without a lot of hand-rolled code to walk the FAQ and parse elements into hashes myself.

One other idea I had was to parse the SGML into a MySQL database and just keep the text of the FAQ in there. The only problem with this is that the file itself wouldn't be in CVS, available for checkout, though I could add a pre-checkout command that queries MySQL and writes the data back out to the SGML file (slowly getting off-topic here).

What's the best approach here?