Plankton has asked for the wisdom of the Perl Monks concerning the following question:

Friends,

I am working on a project where I need to convert DocBook/XML (XSL) documents to multiple formats. I discovered that, on Linux at least, I can use docbook2* (e.g. docbook2html and docbook2rtf), but these commands only take DocBook/SGML (DTD) files as input. I converted one of the DocBook/XML files to DocBook/SGML by hand, and here is the pertinent sdiff ...
<?xml version="1.0"?>                                         | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//E | <!ENTITY nwalsh "Norman Walsh">
"/usr/share/sgml/docbook/xml-dtd-4.2-1.0-17/docbookx.dtd">    | ]>
<article>                                                     | <book>
<articleinfo>                                                 | <bookinfo>
<title>sudo Use and Administration on RH9</ti                   <title>sudo Use and Administration on RH9</ti
</articleinfo>                                                | </bookinfo>
                                                              > <article>
<sect1>                                                         <sect1>
<title>Introduction</title>                                     <title>Introduction</title>
<para>                                                          <para>
This document describes how to use and administor sudo on RH9   This document describes how to use and administor sudo on RH9
</para><para>                                                   </para><para>
blah blah ...                                                   blah blah ...
</sect1>                                                        </sect1>
</article>                                                      </article>
                                                              > </book>
... I know that looks like hell in code tags, but I hope it looks better if you click the dl/code link :)

I have a ton of these docs to convert, so hand editing all of them is out of the question. My question is: is there some cool XML/XSLT/DTD/something-or-other Perl module I could use to make these changes for me, or will I have to "roll my own" translator? You know, something awful and buggy like ...
#!/usr/bin/perl -w
use strict;

my $DTD_VER = "4.2";

# Drop the XML declaration; the SGML tools don't want it
my $firstLine = <>;
print $firstLine if $firstLine !~ /^\<\?xml/;

while (<>) {
    s/article/book/g;
    s/DocBook XML V\d\.\d\.\d/DocBook V$DTD_VER/g;
    # ...
    print $_;
}
print "</book>\n";
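For comparison, here is a slightly less fragile roll-your-own sketch. It slurps the whole file and repeats the hand edits from the sdiff above, but renames only tags, so the word "article" in running text is left alone. It assumes each file holds one DocBook XML `<article>` with an `<articleinfo>` block and a DOCTYPE with no internal subset; `xml_to_sgml` is a hypothetical name, not a CPAN API, and this is still regex surgery rather than real XML parsing.

```perl
use strict;
use warnings;

# Hypothetical helper, not a CPAN API: repeat the hand edits from the sdiff
# on a whole DocBook XML <article>, returning DocBook SGML wrapped in <book>.
# Assumes one <article> with an <articleinfo> block per file and a DOCTYPE
# with no internal subset. Regex surgery, not a real XML parser.
sub xml_to_sgml {
    my ($xml) = @_;

    my $sgml_doctype =
        qq{<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [\n}
      . qq{<!ENTITY nwalsh "Norman Walsh">\n]>\n};

    # Drop the XML declaration; the SGML tools don't expect it.
    $xml =~ s/\A\s*<\?xml[^>]*\?>\s*//;

    # Swap the XML DOCTYPE (public and system id) for the SGML one.
    $xml =~ s{<!DOCTYPE\s+article\b[^>]*>\s*}{$sgml_doctype};

    # Rename tags only, so the word "article" in running text survives.
    $xml =~ s{<(/?)articleinfo(?=[\s>])}{<$1bookinfo}g;
    $xml =~ s{<article(?=[\s>])}{<book};    # the root element becomes <book>

    # Re-open <article> after the book info and close </book> at the end.
    $xml =~ s{</bookinfo>\s*}{</bookinfo>\n<article>\n};
    $xml =~ s{</article>\s*\z}{</article>\n</book>\n};

    return $xml;
}

# Usage: print xml_to_sgml(do { local $/; <> });
```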

You would think, with all the XML hype, that somebody has done this before. Or am I just barking up the wrong tree here?

Plankton: 1% Evil, 99% Hot Gas.

Re: XML to SGML or xsl vs DTD confused
by iburrell (Chaplain) on Aug 12, 2004 at 18:15 UTC
    Try finding a more recent docbook2html script. There are versions out there which use the XSLT stylesheets to produce XHTML/HTML from DocBook XML.

    Look at http://www.docbook.org/ or http://docbook.sourceforge.net/. You might want to ask on their mailing lists since they have the experience.
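    As a sketch of the XSLT route: the CPAN modules XML::LibXML and XML::LibXSLT (Perl bindings to libxml2 and libxslt) can apply a stylesheet in-process, so the DocBook XML never has to become SGML at all. This assumes both modules are installed and that `$xsl_file` points at one of the DocBook XSL stylesheets, for example `html/docbook.xsl` from the docbook-xsl distribution (where it gets installed varies by system). `apply_stylesheet` is a hypothetical helper name.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# Apply an XSLT stylesheet to an XML file in-process and return the
# serialized result. Hypothetical helper name; requires the CPAN modules
# XML::LibXML and XML::LibXSLT (Perl bindings to libxml2 and libxslt).
# For DocBook, $xsl_file would be one of the DocBook XSL stylesheets, e.g.
# html/docbook.xsl from the docbook-xsl distribution (install path varies).
sub apply_stylesheet {
    my ($xml_file, $xsl_file) = @_;

    my $source     = XML::LibXML->load_xml(location => $xml_file);
    my $style_doc  = XML::LibXML->load_xml(location => $xsl_file);
    my $stylesheet = XML::LibXSLT->new->parse_stylesheet($style_doc);
    my $result     = $stylesheet->transform($source);

    # output_as_bytes honours the stylesheet's <xsl:output> settings
    return $stylesheet->output_as_bytes($result);
}

# Usage (paths are placeholders):
#   print apply_stylesheet('sudo.xml', '/path/to/docbook-xsl/html/docbook.xsl');
```

    The same call works with any stylesheet, so output formats other than HTML come down to choosing (or writing) a different `.xsl` file.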

      Well, I installed the latest docbook-utils, but they still only accept SGML files as input, and I only have XML. So here's the awful and buggy script that I'll have to use ...
bash-2.05b$ cat xml2sgml.pl
#!/usr/bin/perl -w
use strict;

# The XML DOCTYPE line that gets swapped for the SGML one
my $doctypePat = quotemeta(
    '<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"'
);

# Skip the XML declaration, if present
my $fl = <>;
print $fl if $fl !~ /^\<\?xml/;

while (<>) {
    if (/$doctypePat/) {
        print <<EOT;
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
<!ENTITY nwalsh "Norman Walsh">
]>
EOT
        <>;    # discard the next line (the system identifier)
        next;
    }
    s/article/book/g;
    # Re-open <article> inside the new <book>, right after </bookinfo>
    if (/\<\/bookinfo>/) {
        print $_;
        print "<article>\n";
        next;
    }
    # Close </article> just before the final </book>
    if (/\<\/book>/) {
        print "</article>\n";
        print $_;
        next;
    }
    print $_;
}
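      Since there are a ton of these docs, a small driver can run whichever converter you settle on over every file. A sketch, assuming the converter is turned into a sub that takes and returns the whole document text (the script above reads line by line from `<>`, so it would need that change); `convert_files` and the `xml_to_sgml` name in the usage comment are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical batch driver: run $convert (a code ref that takes the whole
# document text and returns the converted text) over each *.xml file and
# write the result alongside it as *.sgml. Non-.xml names are skipped.
# Returns the list of files written.
sub convert_files {
    my ($convert, @files) = @_;
    my @written;
    for my $in (@files) {
        (my $out = $in) =~ s/\.xml\z/.sgml/ or next;

        open my $ifh, '<', $in or die "Can't read $in: $!\n";
        my $text = do { local $/; <$ifh> };    # slurp the whole file
        close $ifh;

        open my $ofh, '>', $out or die "Can't write $out: $!\n";
        print {$ofh} $convert->($text);
        close $ofh or die "Can't close $out: $!\n";
        push @written, $out;
    }
    return @written;
}

# Usage, assuming a whole-document converter sub named xml_to_sgml:
#   convert_files(\&xml_to_sgml, glob '*.xml');
```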

        Monsieur le Plankton,

        I still need to look at the exact DocBook XML format and its DTD, which I haven't had time to do since I first read your OP, but...

        I was thinking you could apply an XSLT transform here...

        If you can add a stylesheet line (maybe with a Perl one-liner) to your many source files, you could produce the output with an XSL stylesheet, which lets you define pretty much... everything. I've used this to format nmap output and was amazed at how powerful it is.

        Click on "subnet-tcp-scan" here:
        http://florian.hastek.net/scans/
        then view its source; you'll see the stylesheet line at the top (which you can add to an XML file).
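        The "stylesheet line" is an `xml-stylesheet` processing instruction. A minimal sketch of adding it with Perl, assuming you pass in the stylesheet's URL (`docbook.xsl` below is only a placeholder); `add_stylesheet_pi` is a hypothetical name, and the commented one-liner does roughly the same thing in place across many files.

```perl
use strict;
use warnings;

# Hypothetical helper: insert an xml-stylesheet processing instruction
# right after the XML declaration, or at the very top if there is none.
# Leaves documents that already have one untouched.
sub add_stylesheet_pi {
    my ($xml, $href) = @_;
    return $xml if $xml =~ /<\?xml-stylesheet\b/;

    my $pi = qq{<?xml-stylesheet type="text/xsl" href="$href"?>\n};
    $xml =~ s/\A(\s*<\?xml\s[^>]*\?>[^\S\n]*\n?)/$1$pi/
        or $xml = $pi . $xml;
    return $xml;
}

# Roughly the same as an in-place one-liner over many files (keeps *.bak
# copies; only adds the line to files that have an XML declaration):
#   perl -0777 -i.bak -pe 's{(<\?xml\s[^>]*\?>\n?)}{$1<?xml-stylesheet type="text/xsl" href="docbook.xsl"?>\n}' *.xml
```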

        Then, from the same scans/ URL, click the stylesheet link to see the example. Of course you'd have to rewrite it completely for your purposes, but once you follow how it works, you can see how much you can do by mixing CSS, HTML, and XSLT (basically, build whatever output page you need).

        The W3C and other sites have lots of references for XSL syntax and functions.

        I hope this suggestion is on the mark and that I didn't miss your original intent. Again, I still have to check the DocBook XML format to see whether you can use transforms on it as I've suggested, but I'm pretty sure you can...

        HTH

        Update: I forgot to finish the thought. Once you have it in HTML, it might be easier to convert to other formats.

        Update 2: no, this won't do... I just reread your post. My suggestion renders HTML only when the document (which references the stylesheet) is viewed in an XML-capable browser, so you'd then have to grab the source of the rendered HTML page somehow. Ah... how about fetching it with LWP and writing it out to a new file? But you never did mention the final file formats you want...