Win32::OLE SaveAs Unicode

by Anonymous Monk
on Jun 19, 2018 at 11:39 UTC (#1216931)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I need to use Win32::OLE to extract the text from an MS Word document. The best solution I could find that keeps a bit of formatting is the SaveAs function (I would prefer to read directly into a variable, but I can live with it). The problem is that I can NOT find how to set the parameters to save the file as Unicode (something Word asks about after you click Save As...). I've read all I could, but could not find any substitute or complement for "wdFormatTextLineBreaks" that achieves this. Microsoft's specification page mentions "wdFormatUnicodeText" with the value "7", but I can't find how to specify it in my script (simply replacing "wdFormatTextLineBreaks" with "wdFormatUnicodeText" has no effect). Maybe some of you know the answer.

#!/usr/bin/perl
use strict;
use warnings;

use File::Spec::Functions qw( catfile );
use Cwd qw(cwd);
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Word';
$Win32::OLE::Warn = 3;

my $dir = cwd;
my $word = get_word();
$word->{Visible} = 0;

my $doc = $word->{Documents}->Open(catfile $dir, 'test.docx');
$doc->SaveAs( catfile($dir, 'test.txt'), wdFormatTextLineBreaks );
$doc->Close(0);

sub get_word {
    my $word;
    eval { $word = Win32::OLE->GetActiveObject('Word.Application'); };
    die "$@\n" if $@;
    unless (defined $word) {
        $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit })
            or die "Oops, cannot start Word: ", Win32::OLE->LastError, "\n";
    }
    return $word;
}
__END__

Re: Win32::OLE SaveAs Unicode
by Anonymous Monk on Jun 19, 2018 at 15:10 UTC

    You can do it this way:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use File::Spec::Functions qw( catfile );
    use Cwd qw(cwd);
    use Win32::OLE;
    use Win32::OLE::Const 'Microsoft.Word';    # wd constants
    $Win32::OLE::Warn = 3;

    my $dir = cwd;
    my $word = get_word();
    $word->{Visible} = 0;

    my $filename_in = catfile $dir, 'test.docx';
    my $doc = $word->{Documents}->Open($filename_in);

    my $filename_out = catfile $dir, 'mytest.txt';
    $doc->SaveAs( {
        Filename   => $filename_out,
        FileFormat => wdFormatUnicodeText | wdFormatTextLineBreaks,
    } );
    $doc->Close(0);

    sub get_word {
        my $word;
        eval { $word = Win32::OLE->GetActiveObject('Word.Application'); };
        die "$@\n" if $@;
        unless ( defined $word ) {
            $word = Win32::OLE->new( 'Word.Application', sub { $_[0]->Quit } )
                or die "Oops, cannot start Word: ", Win32::OLE->LastError, "\n";
        }
        return $word;
    }

      Sorry, I was too quick. Changing the call to

          $doc->SaveAs( {
              Filename   => $filename_out,
              FileFormat => wdFormatText,
              Encoding   => '1200',
          } );

      works better.
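Since the original goal was to get the text into a variable, here is a minimal sketch of reading the saved file back into Perl. It assumes that Encoding 1200 yields little-endian UTF-16, possibly with a leading byte order mark, and that lines end in CR LF; check this against your own output. The helper name slurp_utf16le is hypothetical and not part of any module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: read a text file that is assumed to be UTF-16LE
# (what Encoding => 1200 is expected to produce) and return its contents
# as a Perl character string.
sub slurp_utf16le {
    my ($path) = @_;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    local $/;                       # slurp mode: read the whole file at once
    my $bytes = <$fh>;
    close $fh;

    my $text = decode( 'UTF-16LE', $bytes );
    $text =~ s/\A\x{FEFF}//;        # drop a byte order mark, if there is one
    $text =~ s/\r\n?/\n/g;          # normalise CR LF (or bare CR) to LF
    return $text;
}

# Usage, after $doc->SaveAs(...) and $doc->Close(0):
#   my $text = slurp_utf16le($filename_out);
```

Reading the bytes raw and decoding explicitly avoids relying on PerlIO layer behaviour around the byte order mark and Windows line endings.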

      A note: I always record the steps with the Word Macro Recorder while doing the processing manually, then I 'translate' the resulting VBA code into Perl, which is normally straightforward.
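As a rough illustration of that VBA-to-Perl translation, the sketch below shows a VBA statement of the kind the recorder might emit (written by hand here, not captured output) next to a Perl equivalent. VBA named arguments become keys of a hash reference passed to the method. The helper name save_as_unicode_text and the two constant names are hypothetical; the numeric values are those of wdFormatText (2) and msoEncodingUnicodeLittleEndian (1200).

```perl
use strict;
use warnings;

# VBA of the kind the macro recorder might produce (illustrative only):
#
#   ActiveDocument.SaveAs FileName:="C:\tmp\mytest.txt", _
#       FileFormat:=wdFormatText, Encoding:=1200
#
# Numeric values are spelled out so that this sketch does not depend on
# Win32::OLE::Const having loaded the Word type library.
use constant WD_FORMAT_TEXT   => 2;       # wdFormatText
use constant ENCODING_UTF16LE => 1200;    # msoEncodingUnicodeLittleEndian

# Hypothetical helper: $doc is assumed to be a Word Document object
# obtained through Win32::OLE, as in the scripts in this thread.
sub save_as_unicode_text {
    my ( $doc, $filename_out ) = @_;

    # Each VBA "Name:=value" pair becomes a "Name => value" hash entry.
    return $doc->SaveAs( {
        Filename   => $filename_out,
        FileFormat => WD_FORMAT_TEXT,
        Encoding   => ENCODING_UTF16LE,
    } );
}
```

In the full script this would be called as save_as_unicode_text($doc, $filename_out) in place of the direct SaveAs call.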

        Strangely enough, if I save to a temporary file with

        my $filename_out = new File::Temp( UNLINK => 0, SUFFIX => '.txt' );

        I get the following error: Win32:OLE UsedRange error 0x80020005: "Type mismatch"

        It works (with the correction), thank you! I'll have a look at the idea of VBA!
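On the "Type mismatch" error with File::Temp reported above, one possible explanation (an assumption, not a confirmed diagnosis) is that File::Temp's constructor returns an object rather than a plain string, and Win32::OLE cannot convert that object into a method argument. A minimal sketch of a workaround is to take the path as a string and close the handle before handing the name to Word:

```perl
use strict;
use warnings;
use File::Temp ();

# Create the temporary file and keep it on disk after the object goes away.
my $tmp = File::Temp->new( UNLINK => 0, SUFFIX => '.txt' );

# Take the path as a plain string; "$tmp" would stringify to the same value.
my $filename_out = $tmp->filename;

# Close our own handle so that another process (here, Word) is free to
# overwrite the file.
close $tmp or die "Cannot close $filename_out: $!\n";

# $filename_out can now be used wherever a string is expected, e.g.
#   $doc->SaveAs( { Filename => $filename_out,
#                   FileFormat => wdFormatText, Encoding => '1200' } );
```

Because UNLINK is 0, the file is not removed automatically and should be deleted once the text has been read.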

Re: Win32::OLE SaveAs Unicode
by Anonymous Monk on Jun 20, 2018 at 14:27 UTC
    This sounds like a question for Microsoft Corporation.
