To convert a Word doc to text, all you need to do is use Word itself (these subs are straight out of a Proforma document management system I wrote a while back):

    sub save_doc_as_text {
        my ( $infile, $outfile ) = @_;
        require Win32::OLE;
        my $word = Win32::OLE->new( 'Word.Application', sub { $_[0]->Quit; } );
        error( "Can't create new instance of Word. Reason:$Win32::OLE::LastError" ) unless $word;
        $word->{visible} = 0;
        my $doc = $word->{Documents}->Open($infile);
        error( "Can't open $infile, Reason:$Win32::OLE::LastError" ) unless $doc;
        # wdFormatDocument wdFormatText wdFormatHTML
        $doc->SaveAs( { FileName => $outfile, FileFormat => $wdFormatText } );
        $doc->Close;
        undef $doc;
        undef $word;
    }

Note that $wdFormatText is the standard constant, which you can get from Win32::OLE::Const, but as this plays havoc with warnings I tend to hard-code it. It has a value of 2.
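
A minimal sketch of both options side by side, assuming you want the constants in a plain hash rather than imported into your namespace. The helper name word_constants() is hypothetical (it is not from the original system); the fallback values are the documented Word values for these constants. On a machine without Win32::OLE or Word the eval simply fails and the hard-coded values are used.

```perl
use strict;
use warnings;

# Hard-coded fallbacks for the constants used in this post.
my %WD_FALLBACK = (
    wdFormatDocument => 0,
    wdFormatText     => 2,
    wdFormatHTML     => 8,
    wdReplaceAll     => 2,
);

# Hypothetical helper: try the Word type library first, then fall back.
sub word_constants {
    my $loaded = eval {
        require Win32::OLE::Const;
        # Load() returns a hash ref of constant names to values,
        # or undef if no matching type library is registered.
        Win32::OLE::Const->Load('Microsoft Word');
    };
    my %const = ( %WD_FALLBACK, %{ $loaded || {} } );
    return \%const;
}

my $wd           = word_constants();
my $wdFormatText = $wd->{wdFormatText};
print "wdFormatText = $wdFormatText\n";
```
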

The conversion back to a Word doc is simple enough for plain text: just reverse the procedure, using Word to open the text file and save it as a Word doc.
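
The reverse direction might look something like the sketch below. The sub name save_text_as_doc is hypothetical, and unlike save_doc_as_text it assumes you pass in an already created Word object (as word_find_and_replace does). wdFormatDocument is hard-coded to its value of 0. ConfirmConversions is a real parameter of Documents.Open; turning it off here is my assumption about how to avoid a conversion prompt when opening a plain text file.

```perl
use strict;
use warnings;

my $wdFormatDocument = 0;    # hard-coded, same approach as $wdFormatText

# Hypothetical counterpart to save_doc_as_text: open a text file in Word
# and save it in Word's native format. $word is a Word.Application object.
sub save_text_as_doc {
    my ( $word, $infile, $outfile ) = @_;
    my $doc = $word->{Documents}->Open(
        { FileName => $infile, ConfirmConversions => 0 } );
    die "Can't open $infile\n" unless $doc;
    $doc->SaveAs( { FileName => $outfile, FileFormat => $wdFormatDocument } );
    $doc->Close;
    return $outfile;
}
```

As with the original sub, pass full paths: Word resolves relative paths against its own working directory, not your script's.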

If the object of the exercise is, say, to do a search and replace on the text, you can do it natively in Word like this ($word is a Word object):

    sub word_find_and_replace {
        my ( $word, $rel_file_path, $tokens_ref ) = @_;
        # first make a temporary file to do the search and replace on
        my ( $fh, $temp_name ) = get_tempfile( "$DOC_DIR/system", 'doc' );
        close $fh;
        my $content_ref = read_file( "$DOC_DIR/$rel_file_path" );
        create_file( "$DOC_DIR/system/$temp_name", $content_ref, 'overwrite ok' );
        $word->{visible} = 0;
        my $doc = $word->{Documents}->Open("$DOC_DIR/system/$temp_name");
        my $search_obj = $doc->Content->Find;
        my $replace_obj = $search_obj->Replacement;
        for my $token ( keys %$tokens_ref ) {
            my $find = '<?' . $token . '?>';
            my $replace = $tokens_ref->{$token};
            # now I know this looks weird but M$ Word (at least 2000) wants \r
            # as the para marker, not \r\n or even \n. If you send \n you get little
            # binary squares..... oh well that's M$ for you.
            $replace =~ s/\r\n|\n/\r/g; # this makes it work properly. GOK
            $search_obj->{Text} = $find;
            $replace_obj->{Text} = $replace;
            $search_obj->Execute({Replace => $wdReplaceAll});
        }
        $doc->Save;
        $doc->Close;
        # now get the data out of the modified temp file
        $content_ref = read_file( "$DOC_DIR/system/$temp_name" );
        # remove our unwanted temp files and objects
        unlink "$DOC_DIR/system/$temp_name";
        undef $search_obj;
        undef $replace_obj;
        undef $doc;
        return $content_ref;
    }

Note that if you are doing long search and replaces, there is a 255-character buffer overflow that will crash your system, or cause weirdness if you are lucky. If you need to insert more than 255 chars, you need a cipher-block-chaining style approach: insert 200 chars plus a token, then replace the token with the next 200 chars, and so on until you have inserted all the text you want to put in.
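
The chaining can be worked out in plain Perl before anything is sent to Word. The sketch below is one way to do it, not the code from the original system: chained_replacements() and the chain token '<?__more__?>' are hypothetical names, and it assumes the chain token never occurs in the document or in the replacement text. It returns a list of [find, replace] pairs where every replace string is at most the chunk size plus the token length.

```perl
use strict;
use warnings;

# Hypothetical helper: split one long replacement into a chain of short ones.
# Each step replaces the previous step's trailing token with the next chunk.
sub chained_replacements {
    my ( $find, $text, $chunk_size, $chain_token ) = @_;
    $chunk_size  ||= 200;
    $chain_token ||= '<?__more__?>';    # assumed absent from doc and text

    # Cut the text into pieces of at most $chunk_size characters.
    my @chunks;
    for ( my $pos = 0 ; $pos < length $text ; $pos += $chunk_size ) {
        push @chunks, substr( $text, $pos, $chunk_size );
    }
    @chunks = ('') unless @chunks;      # empty text still needs one step

    my @steps;
    my $target = $find;
    for my $i ( 0 .. $#chunks ) {
        my $is_last = ( $i == $#chunks );
        # Every chunk except the last carries the token for the next step.
        my $replace = $chunks[$i] . ( $is_last ? '' : $chain_token );
        push @steps, [ $target, $replace ];
        $target = $chain_token;
    }
    return @steps;
}
```

Inside the token loop of word_find_and_replace you would then run one Execute per pair, setting $search_obj->{Text} and $replace_obj->{Text} from each step in order. Do the \r substitution on the whole text before chunking so a \r\n pair is never split across two chunks.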

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print


In reply to Re: Win32::OLE for MS-Word by tachyon
in thread Win32::OLE for MS-Word by perl_seeker
