PerlMonks  

Re: Create MS Word doc in Linux

by jonadab (Parson)
on Mar 09, 2003 at 01:47 UTC


in reply to Create MS Word doc in Linux

You say "in Linux", but I assume you mean that you want to create Word documents using Perl, which you happen to be running on Linux.

Word document format is, errr, complex. Others have said, "make an RTF and convert", but RTF of course lacks most of the features you probably want. However, OpenOffice format (sxw) is very full-featured. Better, it's very straightforward and not hard to generate using Perl. Word can't read sxw files, but OO can convert them to Word format. This saves you from having to deal directly with Word format as such. (You get to deal with XML, which is MUCH easier.)

In general, here's the process I use for automatically generating documents from Perl:

  1. Use OpenOffice to create a basic document that contains all the elements you're going to want, but with only token sample information. Get all the formatting just the way you want it: margin settings, fonts, how much space above each type of paragraph, whether to keep it with the next, all that stuff. Save.
  2. Unzip it into a working directory that your Perl script will have access to.
  3. Copy the contents of content.xml and paste them into a string (a HEREDOC, possibly) in your Perl script. Break the string into three parts: everything up to and including the opening body tag, the actual body, and the closing tags at the end.
  4. Replace the actual body of the document with Perl code that generates the body dynamically. Each type of paragraph/table/whatever will have a style associated with it, which refers to the style information in the other files, but all you're changing is the content, presumably. (This is why you only have to change content.xml. If you wanted to dynamically select font sizes and stuff (rather than using the same ones each time you generate a given type of document) then you would have to rewrite one or more of the other files too.)
  5. After the Perl script rewrites content.xml, all it has to do is zip up the working directory to create an .sxw file. I've been using backquotes to call info-zip, but only because I haven't bothered yet to find the zip module that I'm sure exists on CPAN.
  6. If you want Word format, then open the document in OO and Save As. It is probably possible to script this part too, but I haven't done so.
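
Steps 3 through 5 could be sketched roughly as follows. This is only a sketch under several assumptions: the $head and $tail strings are placeholders standing in for whatever you paste from your own content.xml, the style names "Heading" and "Text body" are hypothetical and must match paragraph styles that actually exist in your template, the sub names are made up for illustration, and the zip step assumes the info-zip `zip` command is on your PATH.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Placeholders: paste the real text from your own unzipped content.xml here.
my $head = <<'XML';
<!-- everything from your content.xml up to and including the opening body tag -->
XML

my $tail = <<'XML';
<!-- the closing tags from the end of your content.xml -->
XML

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must come first
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# One paragraph element. $style must name a style defined in the template.
sub paragraph {
    my ($style, $text) = @_;
    return sprintf qq{<text:p text:style-name="%s">%s</text:p>\n},
        xml_escape($style), xml_escape($text);
}

# Glue head, generated body, and tail back together.
# Each item is an array ref of [ style name, text ].
sub build_content {
    my ($head, $tail, @items) = @_;
    return $head . join('', map { paragraph(@$_) } @items) . $tail;
}

# Overwrite content.xml in the working directory.
sub write_content {
    my ($dir, $xml) = @_;
    my $path = File::Spec->catfile($dir, 'content.xml');
    open my $fh, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!\n";
    print {$fh} $xml;
    close $fh or die "Cannot close $path: $!\n";
}

# Zip the working directory into an .sxw file by calling info-zip.
# Keep $out outside of $dir so the archive does not try to include itself.
sub zip_sxw {
    my ($dir, $out) = @_;
    my $abs_out = File::Spec->rel2abs($out);
    unlink $abs_out;    # start fresh instead of updating an old archive
    my $old = getcwd();
    chdir $dir or die "Cannot chdir to $dir: $!\n";
    my $status = system('zip', '-q', '-r', $abs_out, '.');
    chdir $old or die "Cannot chdir back to $old: $!\n";
    die "zip exited with status $status\n" if $status != 0;
}

# Hypothetical sample data; in practice this comes from wherever your report does.
my @report = (
    [ 'Heading',   'Quarterly report' ],
    [ 'Text body', 'Sales & support were up; costs < budget.' ],
);
my $xml = build_content($head, $tail, @report);

if (@ARGV == 2) {
    my ($workdir, $outfile) = @ARGV;
    write_content($workdir, $xml);
    zip_sxw($workdir, $outfile);
}
else {
    print $xml;    # no arguments: just show what would be written
}
```

Run with no arguments, it prints the generated XML so you can eyeball it. Run with a working directory and an output filename, it rewrites content.xml and zips the directory. The list form of system() is used in place of backquotes only so that filenames with spaces do not need shell quoting.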

This could be extended, of course, to automatically generate more than just the text content. It would be trivial to insert images (just copy the image file into your working directory, refer to it in the XML by filename, and zip it right in), and with a little bit of experimentation I'm sure it would not be hard to embed spreadsheets and all sorts of fun stuff. With OO, it's all XML, so automatically generating it from Perl is a breeze. It's not really any harder than writing a CGI script to generate (valid) HTML.

The only bummer with this approach is that using OO to do the conversion to Word format is a fairly heavyweight thing in terms of system resources. OO has a substantial memory footprint. You wouldn't want to do this on an old Pentium/90 that you've installed Linux on to use as a web server, for example. (You could generate the OO document on there, but you wouldn't want to run OO on there to do the conversion.) 128MB of RAM is recommended, IIRC, for running OO. Also, you don't mention the frequency or speed with which you need to spit out documents. If this is the kind of thing where you're handling web requests and returning a doc to a remote client, then the overhead of OO's load time will be too great. OTOH if you're generating a report that you want to print to give to your boss, you're going to have to load the document in a word processor anyway (to print it), so nothing lost.


for(unpack("C*",'GGGG?GGGG?O__\?WccW?{GCw?Wcc{?Wcc~?Wcc{?~cc' .'W?')){$j=$_-63;++$a;for$p(0..7){$h[$p][$a]=$j%2;$j/=2}}for$ p(0..7){for$a(1..45){$_=($h[$p-1][$a])?'#':' ';print}print$/}
