To convert a Word doc to text, all you need to do is use Word itself (these subs are straight out of a Proforma document management system I wrote a while back):
Note that $wdFormatText is the standard constant, which you can get from Win32::OLE::Const, but as this plays havoc with warnings I tend to hard-code it. It has a value of 2.

    sub save_doc_as_text {
        my ( $infile, $outfile ) = @_;
        my $wdFormatText = 2;    # hard-coded, see the note above
        require Win32::OLE;
        # the destructor sub makes sure Word quits when the object goes away
        my $word = Win32::OLE->new( 'Word.Application', sub { $_[0]->Quit; } );
        # error() is a helper from the surrounding application (not shown)
        error( "Can't create new instance of Word, Reason:$Win32::OLE::LastError" ) unless $word;
        $word->{visible} = 0;
        my $doc = $word->{Documents}->Open($infile);
        error( "Can't open $infile, Reason:$Win32::OLE::LastError" ) unless $doc;
        # wdFormatDocument wdFormatText wdFormatHTML
        $doc->SaveAs( { FileName => $outfile, FileFormat => $wdFormatText } );
        $doc->Close;
        undef $doc;
        undef $word;
    }
The conversion back to a Word doc is simple enough for plain text: just reverse the procedure, using Word to open the text file and save it as a Word doc.
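A minimal sketch of that reverse step, assuming the same Win32::OLE setup as save_doc_as_text. The sub name save_text_as_doc is my own and is not from the original system. The value 0 is Word's wdFormatDocument constant, hard-coded for the same reason as above. This only runs on Windows with Word installed. Word may also prompt about file conversion or encoding for some text files, which this sketch does not handle.

```perl
use strict;
use warnings;

# Word's wdFormatDocument constant (native Word document format), hard-coded.
my $wdFormatDocument = 0;

# Hypothetical helper: open a plain-text file in Word and save it as a Word doc.
sub save_text_as_doc {
    my ( $infile, $outfile ) = @_;

    # Loaded lazily, so the file still compiles where Win32::OLE is absent.
    require Win32::OLE;

    my $word = Win32::OLE->new( 'Word.Application', sub { $_[0]->Quit } )
        or die "Can't create Word instance: " . Win32::OLE->LastError();
    $word->{Visible} = 0;

    my $doc = $word->{Documents}->Open($infile)
        or die "Can't open $infile: " . Win32::OLE->LastError();

    $doc->SaveAs( { FileName => $outfile, FileFormat => $wdFormatDocument } );
    $doc->Close;
    return 1;
}
```

It is probably safest to pass full paths for both files, since Word's working directory is usually not your script's.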
If the object of the exercise is, say, to do a search and replace on the text, you can do it natively in Word like this ($word is a Word application object):
    sub word_find_and_replace {
        my ( $word, $rel_file_path, $tokens_ref ) = @_;
        # get_tempfile(), read_file(), create_file() and $DOC_DIR come from
        # the surrounding application (not shown)
        # first make a temporary file to do the search and replace on
        my ( $fh, $temp_name ) = get_tempfile( "$DOC_DIR/system", 'doc' );
        close $fh;
        my $content_ref = read_file( "$DOC_DIR/$rel_file_path" );
        create_file( "$DOC_DIR/system/$temp_name", $content_ref, 'overwrite ok' );
        $word->{visible} = 0;
        my $doc = $word->{Documents}->Open("$DOC_DIR/system/$temp_name");
        my $search_obj  = $doc->Content->Find;
        my $replace_obj = $search_obj->Replacement;
        for my $token ( keys %$tokens_ref ) {
            my $find    = '<?' . $token . '?>';
            my $replace = $tokens_ref->{$token};
            # now i know this looks weird but M$ word (at least 2000) wants \r
            # as the para marker not \r\n or even \n if you send \n you get little
            # binary squares..... oh well that's M$ for you.
            $replace =~ s/\r\n|\n/\r/g;    # this makes it work properly. GOK
            $search_obj->{Text}  = $find;
            $replace_obj->{Text} = $replace;
            # $wdReplaceAll is Word's wdReplaceAll constant (value 2)
            $search_obj->Execute( { Replace => $wdReplaceAll } );
        }
        $doc->Save;
        $doc->Close;
        # now get the data out of the modified temp file
        $content_ref = read_file( "$DOC_DIR/system/$temp_name" );
        # remove our unwanted temp files and objects
        unlink "$DOC_DIR/system/$temp_name";
        undef $search_obj;
        undef $replace_obj;
        undef $doc;
        return $content_ref;
    }
Note that if you are doing long search and replaces, there is a 255-char buffer overflow that will crash your system, or cause weirdness if you are lucky. If you need to insert more than 255 chars you need a chaining approach (in the spirit of cipher block chaining): insert 200 chars plus a token, then replace the token with the next 200 chars plus a token, and so on until you have inserted all the text you want to put in.
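The chaining idea can be sketched in pure Perl as a helper that turns one long replacement into a list of short find/replace pairs. This is only an illustration, not code from the original system. The name chain_replacements, the default chunk size of 200 and the continuation marker '<?__more__?>' are all my own assumptions. The marker must not occur in the document or in the inserted text.

```perl
use strict;
use warnings;

# Split one long replacement into a chain of [find, replace] pairs.
# The first pair replaces the original token; each later pair replaces
# the continuation marker left behind by the previous pair.
sub chain_replacements {
    my ( $find, $text, $chunk_len, $marker ) = @_;
    $chunk_len ||= 200;                               # assumed safe chunk size
    $marker = '<?__more__?>' unless defined $marker;  # hypothetical marker

    # Cut the text into pieces of at most $chunk_len characters.
    my @chunks;
    for ( my $pos = 0 ; $pos < length $text ; $pos += $chunk_len ) {
        push @chunks, substr( $text, $pos, $chunk_len );
    }
    @chunks = ('') unless @chunks;    # empty text: just remove the token

    my @pairs;
    for my $i ( 0 .. $#chunks ) {
        my $target = $i == 0       ? $find   : $marker;
        my $tail   = $i < $#chunks ? $marker : '';
        push @pairs, [ $target, $chunks[$i] . $tail ];
    }
    return @pairs;
}

# Inside the token loop of word_find_and_replace, the pairs would be
# applied in order, for example:
#
#   for my $pair ( chain_replacements( $find, $replace ) ) {
#       $search_obj->{Text}  = $pair->[0];
#       $replace_obj->{Text} = $pair->[1];
#       $search_obj->Execute( { Replace => $wdReplaceAll } );
#   }
```

Do the \n to \r conversion before chunking, so that a \r\n pair cannot be split across two chunks.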
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
In reply to Re: Win32::OLE for MS-Word
by tachyon
in thread Win32::OLE for MS-Word
by perl_seeker