http://qs1969.pair.com?node_id=1216931

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I need to use Win32::OLE to extract the text from an MS Word document. The best solution I could find that preserves a bit of formatting is the SaveAs method (I would prefer to read the text directly into a variable, but I can live with it). The problem is that I cannot find how to set the parameters so that the file is saved as Unicode (the choice Word prompts for after you click Save As...). I have read everything I could find, but I found no substitute for, or addition to, "wdFormatTextLineBreaks" that achieves this. Microsoft's documentation lists "wdFormatUnicodeText" with the value 7, but I can't work out how to specify it in my script: simply replacing "wdFormatTextLineBreaks" with "wdFormatUnicodeText" has no effect. Maybe some of you know the answer.

#!/usr/bin/perl
use strict;
use warnings;

use File::Spec::Functions qw( catfile );
use Cwd qw(cwd);
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';

$Win32::OLE::Warn = 3;

my $dir = cwd;
my $word = get_word();
$word->{Visible} = 0;

my $doc = $word->{Documents}->Open(catfile $dir, 'test.docx');
$doc->SaveAs(
    catfile($dir, 'test.txt'),
    wdFormatTextLineBreaks
);
$doc->Close(0);

sub get_word {
    my $word;
    eval {
        $word = Win32::OLE->GetActiveObject('Word.Application');
    };
    die "$@\n" if $@;
    unless(defined $word) {
        $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
            or die "Oops, cannot start Word: ", Win32::OLE->LastError, "\n";
    }
    return $word;
}
__END__