Do you have Word? If so, it may be possible to use Word's alleged ability to read HTML files to get what you want. I say "alleged" because Word will only read HTML files of a certain format. I don't speak HTML, so I haven't managed to get any code working to do this, but automating Word is not that difficult.

I opened a Word instance and saved a blank document as HTML. That generated most of the HTML embedded in the code below, which is nearly working, i.e. it doesn't work. The problem seems (remember, I don't speak HTML) to be that the temp file ends up with two sets of html, head and body tags: one from the existing HTML document and one from the Word header and footer wrapped around it. Word therefore rejects the temp file when it tries to open it. If anyone knows enough about HTML to get a file into a form Word will accept, this might be a way forward for you, if you have Word!

Regards,

John

```perl
use strict;
use warnings;
use File::Spec;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';    # provides wdFormatDocument

# Header that Word 10 wrote when a blank document was saved as HTML.
# Single-quoted heredoc, so @page and the double quotes need no escaping.
my $htmltop = <<'END_TOP';
<html xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 10">
<meta name=Originator content="Microsoft Word 10">
<link rel=File-List href="Blank_files/filelist.xml">
<!--[if gte mso 9]><xml>
 <o:DocumentProperties>
  <o:Author>Davies</o:Author> <o:LastAuthor>Davies</o:LastAuthor>
  <o:Revision>1</o:Revision> <o:TotalTime>1</o:TotalTime>
  <o:Created>2011-02-01T14:47:00Z</o:Created>
  <o:LastSaved>2011-02-01T14:48:00Z</o:LastSaved>
  <o:Pages>1</o:Pages> <o:Lines>1</o:Lines> <o:Paragraphs>1</o:Paragraphs>
  <o:Version>10.2625</o:Version>
 </o:DocumentProperties>
</xml><![endif]--><!--[if gte mso 9]><xml>
 <w:WordDocument>
  <w:Compatibility>
   <w:BreakWrappedTables/> <w:SnapToGridInCell/>
   <w:WrapTextWithPunct/> <w:UseAsianBreakRules/>
  </w:Compatibility>
  <w:BrowserLevel>MicrosoftInternetExplorer4</w:BrowserLevel>
 </w:WordDocument>
</xml><![endif]-->
<style>
<!--
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
  {mso-style-parent:""; margin:0cm; margin-bottom:.0001pt;
  mso-pagination:widow-orphan; font-size:12.0pt; font-family:Arial;
  mso-fareast-font-family:"Times New Roman"; mso-bidi-font-family:"Times New Roman";}
@page Section1
  {size:595.3pt 841.9pt; margin:72.0pt 90.0pt 72.0pt 90.0pt;
  mso-header-margin:35.4pt; mso-footer-margin:35.4pt; mso-paper-source:0;}
div.Section1 {page:Section1;}
-->
</style>
<!--[if gte mso 10]>
<style>
 /* Style Definitions */
 table.MsoNormalTable
  {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0;
  mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-parent:"";
  mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin:0cm;
  mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan;
  font-size:10.0pt; font-family:"Times New Roman";}
</style>
<![endif]-->
</head>
<body lang=EN-GB style='tab-interval:36.0pt'>
END_TOP
my $htmltail = "</body>\n</html>\n";

my $infile = shift or die "Usage: $0 file.html\n";
# Word may not share the script's current directory, so give it absolute paths.
$infile = File::Spec->rel2abs($infile);
(my $tempfile = $infile) =~ s/(\.html?)$/_tmp$1/i;   # foo.html -> foo_tmp.html
(my $outfile  = $infile) =~ s/\.html?$/.doc/i;       # foo.html -> foo.doc
die "Input file must end in .htm or .html\n" if $outfile eq $infile;

open(my $fhi, '<', $infile)   or die "Can't open input file $infile: $!";
open(my $fht, '>', $tempfile) or die "Can't open temp file $tempfile: $!";
print {$fht} $htmltop;
while (my $line = <$fhi>) {
    # Copied verbatim, including the input's own <html>, <head> and <body>.
    print {$fht} $line;
}
print {$fht} $htmltail;
close $fhi;
close $fht or die "Can't write temp file $tempfile: $!";

my $word = Win32::OLE->new('Word.Application')
    or die "Can't start Word: ", Win32::OLE->LastError;
my $doc = $word->Documents->Open($tempfile)
    or die "Word can't open $tempfile: ", Win32::OLE->LastError;
$doc->SaveAs({FileName => $outfile, FileFormat => wdFormatDocument});
$doc->Close();
$word->Quit();
```
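One possible way round the doubled tags, sketched here on the assumption that each input file is a complete HTML page with a single body element, is to copy only what lies between the input's own `<body ...>` and `</body>` tags into the temp file. The `extract_body` helper below is hypothetical (it is not part of any module) and uses core Perl regexes only; regexes are a fragile way to take HTML apart, so a real parser such as HTML::TreeBuilder from CPAN would be more robust. Whether Word then accepts the temp file is untested.

```perl
use strict;
use warnings;

# Hypothetical helper: return only the markup between the input's own
# <body ...> and </body> tags, so it can be wrapped in Word's header and
# footer without producing a second <html>, <head> and <body>.
sub extract_body {
    my ($html) = @_;

    # /s lets . span newlines, /i accepts <BODY> as well as <body>;
    # the greedy (.*) runs up to the last </body> in the string.
    return $1 if $html =~ m{<body\b[^>]*>(.*)</body\s*>}si;

    # No body element (e.g. a fragment): drop any stray <html> tags
    # and a <head>...</head> section, and return what is left.
    $html =~ s{</?html\b[^>]*>}{}gi;
    $html =~ s{<head\b.*?</head\s*>}{}si;
    return $html;
}

# Small demonstration on an in-memory page.
my $page = "<html><head><title>t</title></head>\n"
         . "<body bgcolor=white>\n<p>Hello</p>\n</body></html>\n";
print extract_body($page);    # prints "\n<p>Hello</p>\n"
```

In the script above, the line-by-line copy loop would then be replaced by slurping the input (`my $content = do { local $/; <$fhi> };`) and printing `$htmltop`, `extract_body($content)` and `$htmltail` to `$fht`.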

In reply to Re: Approaches to produce word docs by davies
in thread Approaches to produce word docs by LanX
