PerlMonks  

generating a microsoft word doc

by WoodyWeaver (Monk)
on Jan 22, 2018 at 21:38 UTC ( [id://1207707]=perlquestion )

WoodyWeaver has asked for the wisdom of the Perl Monks concerning the following question:

The task is to create a (long) structured document -- it has a bunch of content that is relatively static, and one section that consists of about a thousand structured question/answer blocks, each of which starts with a table containing a bunch of checkboxes. Contractually, the end result has to be a Microsoft Word document.

The usual approach is just to start writing using a Microsoft Word template. However, it seems to me that MS Word is not conducive to clear thought, particularly when a thousand points are required. When I look at others' work using a similar template (these are "system security documents") I find that people are often non-responsive, and I'm guessing it's just that the document has so many moving parts. It also strikes me that it's difficult to maintain over time (these are supposed to be 'living documents').

My approach was to store all the complicated structured part in a database back end, and then render it into something close to the desired format; then have a copy of the Microsoft Word template with the static stuff filled in, and then just insert the rendered text. I can come reasonably close by using HTML and checkboxes, and figured that the insert would carry the thing home.

(Ok, so I really like the idea of a database for lots of other reasons: to prepare for CDM and work across multiple SSPs, to be able to have decent change control and external analysis, to be able to run statistics on the language, etc. A flat textual document just isn't a good approach, imho.)

However, it seems that is not working: perhaps because of size/memory issues, perhaps because of the OpenOffice / LibreOffice / MS Word conversions, the checkboxes look awful. So, I'm casting about for an alternative -- rendering directly into MS Word.

Another complication is that I work with a Linux box, not a Windows box, so I'd prefer a native Perl approach to Win32::OLE. I've used that many times in the past, and could make that work I suppose; it just seems inelegant.

Is there something like Excel::Writer::XLSX for Microsoft Word? Or is there a better way?

Replies are listed 'Best First'.
Re: generating a microsoft word doc
by afoken (Chancellor) on Jan 22, 2018 at 22:30 UTC

    "Modern" Word (since Word 2007) uses renamed ZIP archives (*.docx, *.docm) containing a lot of XML, whereas "ancient" Word (*.doc) used a binary mess. Microsoft has documented the file format (it is standardized as Office Open XML, ECMA-376), so you should be able to generate that bunch of XML in a renamed ZIP file.

    It might be a little bit easier to use an existing *.docx/*.docm file as a template, extract the content XML (using something like Archive::Zip), and patch your content into it using XML::LibXML or similar. After that, create a new ZIP file and rename it to *.docx/*.docm.

    Word can also read and edit HTML.

    If you are on Windows and have Word installed, you may want to try Win32::OLE to open, edit and save a Word file from within Perl.

    I would prefer the first or second way, because OLE gets messy and unstable quite fast.
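
The first approach above (writing the XML parts straight into a ZIP container) can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it uses the core module IO::Compress::Zip rather than Archive::Zip so that nothing needs installing, the helper names `xml_escape`, `document_xml` and `build_docx` are made up for illustration, and the result is a bare document with plain paragraphs only (no styles, tables or numbering). It has not been checked against every Word version.

```perl
use strict;
use warnings;
use Encode qw(encode);
use IO::Compress::Zip qw($ZipError);

# Package-level part: tells the consumer which content type each part has.
my $CONTENT_TYPES = <<'XML';
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
<Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
XML

# Root relationships: points at the main document part.
my $ROOT_RELS = <<'XML';
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>
XML

# Escape the characters that are special in XML text nodes.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Turn a list of strings into word/document.xml, one paragraph per string.
# (Hypothetical helper; real content would also need tables, styles, etc.)
sub document_xml {
    my (@paragraphs) = @_;
    my $body = '';
    for my $text (@paragraphs) {
        $body .= '<w:p><w:r><w:t xml:space="preserve">'
               . xml_escape($text)
               . '</w:t></w:r></w:p>';
    }
    return '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
         . '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
         . '<w:body>' . $body . '</w:body></w:document>';
}

# Assemble the three parts into one ZIP archive and return it as bytes.
sub build_docx {
    my (@paragraphs) = @_;
    my @parts = (
        [ '[Content_Types].xml', $CONTENT_TYPES ],
        [ '_rels/.rels',         $ROOT_RELS ],
        [ 'word/document.xml',   document_xml(@paragraphs) ],
    );

    my $bytes = '';
    my $zip;
    for my $part (@parts) {
        my ($name, $xml) = @$part;
        if ($zip) {
            # Every member after the first is started with newStream().
            $zip->newStream(Name => $name)
                or die "Cannot add $name: $ZipError\n";
        }
        else {
            $zip = IO::Compress::Zip->new(\$bytes, Name => $name)
                or die "Cannot create ZIP: $ZipError\n";
        }
        # The parts are declared as UTF-8, so encode before writing.
        $zip->print(encode('UTF-8', $xml));
    }
    $zip->close;
    return $bytes;
}
```

To produce a file, write the string returned by `build_docx(...)` to a handle opened in raw mode (for example `open my $fh, '>:raw', 'out.docx'`). The template variant described above would instead read `word/document.xml` out of an existing .docx, patch it, and write all members back.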

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: generating a microsoft word doc
by poj (Abbot) on Jan 22, 2018 at 22:19 UTC

      That's a good idea. The output has a ton of tables and checkboxes, so my first thought was HTML (used CGI qw/:standard/;). Aside from finding the right glyphs for checked and unchecked checkboxes, that is an excellent idea -- and since I was thinking of text, it probably should have been my first choice. Thanks!
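
For the checkbox glyphs mentioned here, one option is the Unicode ballot-box characters U+2610 (empty box) and U+2611 (box with check), written as numeric character references so the HTML stays pure ASCII. The following is only a sketch under assumptions: the field names (`id`, `status`, `question`, `answer`), the status labels and the helper names are hypothetical, since the thread does not show the real schema, and plain string building is used instead of CGI.pm because that module is no longer bundled with recent Perls.

```perl
use strict;
use warnings;

# Ballot-box glyphs as numeric character references.
my %GLYPH = (
    checked   => '&#x2611;',   # U+2611 BALLOT BOX WITH CHECK
    unchecked => '&#x2610;',   # U+2610 BALLOT BOX
);

# Escape the characters that are special in HTML text.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Render one question/answer block: a one-row table of status "checkboxes"
# followed by the question and its answer.
# $block is a hash reference; @statuses lists every possible status label.
sub render_block {
    my ($block, @statuses) = @_;
    my $cells = join '', map {
        my $glyph = $_ eq $block->{status} ? $GLYPH{checked} : $GLYPH{unchecked};
        "<td>$glyph " . html_escape($_) . "</td>";
    } @statuses;
    return join "\n",
        '<table border="1"><tr>' . $cells . '</tr></table>',
        '<p><b>' . html_escape("$block->{id}: $block->{question}") . '</b></p>',
        '<p>' . html_escape($block->{answer}) . '</p>';
}
```

Because the boxes are ordinary text characters rather than form controls, they should survive an HTML-to-Word import as glyphs; how they look still depends on the font Word picks.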

Re: generating a microsoft word doc
by marto (Cardinal) on Jan 23, 2018 at 12:07 UTC
Re: generating a microsoft word doc
by karlgoethebier (Abbot) on Jan 23, 2018 at 11:10 UTC
    "...task is to create a (long) structured document...a Microsoft Word document..."

    This is a pain in the ass.

    An alternative might be to use LaTeX and then something like "Workflow for converting LaTeX into Open Office / MS Word Format", which might be a pain in the ass as well.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'

Approved by GrandFather