
SGML FAQ to HTML (or XML? or SQL?)

by hacker (Priest)
on Aug 17, 2002 at 17:53 UTC ( [id://190896]=perlquestion )

hacker has asked for the wisdom of the Perl Monks concerning the following question:

I have an SGML FAQ for one of my projects, which was originally a static text file. In the interests of keeping the FAQ updated and maintainable, I've converted it to SGML, and it resides in our CVS.

I'd like to fetch the FAQ with LWP through either of the two ViewCVS interfaces (which is how it was done when it was static text), convert it to XHTML, and display it on the web page dynamically.

Each time a user clicks on the 'FAQ' link on the website, the FAQ will be queried from CVS, converted, wrapped in validated XHTML, and thrown at their browser.

The FAQ generally looks like this:

<!-- ####################### -->
<!-- Section 2: Installation -->
<!-- ####################### -->
<sect1 id="whatplatforms">
  <title>What platforms does Plucker run on?</title>
  <para>
    The viewer should run on any Palm OS device utilizing version 2.0.4
    or higher of Palm OS, while the desktop tools are supported on Linux,
    Windows, Mac OS X, and OS/2.
  </para>
  <para>
    The desktop tools will probably work on any Unix system with Python
    installed, but your mileage may vary, so don't get angry if they
    don't work. If you are able to get it running on a system not listed
    in REQUIREMENTS then please let us know so that it can be added to
    the list of supported systems.
  </para>
</sect1>
<!-- #################################### -->
<!-- Section 2: Installation: END -->
<!-- #################################### -->

Some sections contain <itemizedlist> and <listitem> tags, so those must be parsed as well. I'd rather stay away from HTML::Template for this particular venture, but I will be moving in that direction soon. Right now, 'sub faq {...}' is where the code lives.

Has anyone done this? Are there secrets to it? I've read that converting the SGML output to XML first is one way to go. I've looked at SGML::Parser and HTML::Parser, but I'm not sure they can help, without a lot of hand rolling around in the FAQ and parsing out elements into hashes myself.

One other idea I had was to parse the SGML into a MySQL database and keep the text of the FAQ there. The only problem is that the file itself wouldn't be in CVS, available for checkout, though I could add a pre-checkout command that runs the query and writes the data from MySQL to the SGML file (slowly getting off-topic here).

What's the best approach here?

Re: SGML FAQ to HTML (or XML? or SQL?)
by merlyn (Sage) on Aug 17, 2002 at 18:47 UTC
      I've been hearing this more and more. For a long time I wondered if it was just me. It seemed like such a great idea.
      ()-()
       \"/
        `                                                     
      
Re: SGML FAQ to HTML (or XML? or SQL?)
by Zaxo (Archbishop) on Aug 17, 2002 at 18:15 UTC

    Since the source is DocBook XML, XSLT would be a good place to start. You can define transformations to any format you like.

    Added:
    Since your FAQ probably only changes at intervals, it would improve response time to keep static copies of the transformed versions, perhaps with make run from a cron job to handle updates.
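
The static-copy idea can be sketched in plain Perl with a make-style timestamp check. This is only a minimal sketch under assumptions: the FAQ is available as a local file (for example a CVS checkout refreshed by cron), and `needs_rebuild`, `refresh_static_copy`, and the file names in the usage note are hypothetical names, not anything from the project.

```perl
use strict;
use warnings;

# Return true when $out is missing or older than $src (make-style staleness check).
sub needs_rebuild {
    my ($src, $out) = @_;
    my $src_mtime = (stat $src)[9];
    defined $src_mtime or die "Cannot stat $src: $!";
    return 1 unless -e $out;
    my $out_mtime = (stat $out)[9];
    return $src_mtime > $out_mtime ? 1 : 0;
}

# Rebuild $out from $src through the $transform callback, but only when stale.
# Returns 1 if the copy was rebuilt, 0 if the existing copy was kept.
sub refresh_static_copy {
    my ($src, $out, $transform) = @_;
    return 0 unless needs_rebuild($src, $out);

    open my $in, '<', $src or die "Cannot read $src: $!";
    my $source_text = do { local $/; <$in> };
    close $in;

    # Write under a temporary name, then rename, so a visitor never
    # receives a half-written page.
    my $tmp = "$out.tmp.$$";
    open my $fh, '>', $tmp or die "Cannot write $tmp: $!";
    print {$fh} $transform->($source_text);
    close $fh or die "Cannot close $tmp: $!";
    rename $tmp, $out or die "Cannot rename $tmp to $out: $!";
    return 1;
}
```

A cron job could then call something like `refresh_static_copy('faq.sgml', 'faq.html', \&convert)`, where `convert` is whatever SGML-to-XHTML routine ends up being used, and the web server would serve `faq.html` directly.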

    The book Perl & XML shows two major approaches to transformations in Perl: XSLT, and callbacks on parsed XML trees, which you suggested on the CB you might prefer.

    After Compline,
    Zaxo

Re: SGML FAQ to HTML (or XML? or SQL?)
by mdillon (Priest) on Aug 17, 2002 at 18:31 UTC
    If you are going to use SGML (or XML as Zaxo suggests), you need to make sure that your document is well-formed. The version currently in CVS has a bunch of problems that can be found either with "nsgmls -s faq.sgml" from the Jade package, or by changing to an XML DTD and using xmllint from LibXML.
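
For a quick check from inside Perl, something along these lines might do. It is a minimal sketch assuming the FAQ has already been switched to XML and that XML::LibXML is installed; `well_formed_error` is a hypothetical helper name. Note that it only tests well-formedness, whereas nsgmls also validates against the DTD.

```perl
use strict;
use warnings;
use XML::LibXML;

# Return an empty string if $xml is well-formed,
# otherwise the parser's error message.
sub well_formed_error {
    my ($xml) = @_;
    my $parser = XML::LibXML->new;
    # parse_string dies on the first well-formedness error.
    eval { $parser->parse_string($xml); 1 }
        or return "$@" || 'unknown parse error';
    return '';
}

# Example: the unclosed <para> is reported with a line number.
my $problem = well_formed_error('<sect1><para>text</sect1>');
print $problem ? "Not well-formed: $problem" : "Well-formed\n";
```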

    Since you aren't actually using any SGML syntax that is not part of XML, I would recommend using XML, since the tool landscape is currently much richer for XML than for SGML. If you use XSLT, I would probably combine it with mod_xslt, which transforms XML to other formats on the fly. It also has caching capabilities. For the generation of HTML output, using the latest version of the DocBook XSLT stylesheets should be fine.

    If you want to do it in Perl, have a look at XML::LibXML and XML::LibXSLT in conjunction with libxml2. You might want to come up with your own caching mechanism, in that case.
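
A rough sketch of that route, assuming XML::LibXML and XML::LibXSLT are installed. The inline stylesheet is a toy stand-in written for illustration and handles only the elements mentioned in the question (sect1, title, para, itemizedlist, listitem); it is not the DocBook XSLT stylesheets. `faq_to_xhtml` and the `faq` class name are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Toy stylesheet (an assumption, not the DocBook XSL distribution).
my $STYLE = <<'XSL';
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes" indent="no"/>
  <xsl:template match="sect1">
    <div class="faq" id="{@id}"><xsl:apply-templates/></div>
  </xsl:template>
  <xsl:template match="sect1/title"><h2><xsl:apply-templates/></h2></xsl:template>
  <xsl:template match="para"><p><xsl:apply-templates/></p></xsl:template>
  <xsl:template match="itemizedlist"><ul><xsl:apply-templates/></ul></xsl:template>
  <xsl:template match="listitem"><li><xsl:apply-templates/></li></xsl:template>
</xsl:stylesheet>
XSL

# Transform a well-formed FAQ document (as a string) into an XHTML fragment.
sub faq_to_xhtml {
    my ($xml) = @_;
    my $source     = XML::LibXML->load_xml(string => $xml);
    my $style_doc  = XML::LibXML->load_xml(string => $STYLE);
    my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
    my $result     = $stylesheet->transform($source);
    return $stylesheet->output_string($result);
}
```

To use the real DocBook stylesheets instead, the stylesheet object could be built with `parse_stylesheet_file` pointed at wherever they are installed locally. Parsing the stylesheet once and reusing the object across requests avoids repeating that cost on every hit.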
