karavay has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I'm having some difficulty preserving Unicode characters after SaveAs() with the code below.

Goal: open the file -> read its Unicode text (e.g. Russian) -> save it as text or HTML with the Unicode preserved.

I've tried FileFormat => wdFormatUnicodeText, but with no luck.
Platform: Win32 (XP)
Word 2007

    if ( $file =~ /\.doc$/i ) {
        my $filename = $dir . "/docs/" . $file;
        $filename =~ s'/'\\' ;    # invert slashes otherwise SaveAs cannot process the path correctly!!!
        my $savename = $dir . "/txt/" . $file . ".htm";
        $savename =~ s/.doc//;
        print "Starting word\n";
        my $Word = Win32::OLE->new( 'Word.Application', 'Quit' );
        $Word->{Visible} = 0;
        my ($doc) = $Word->Documents->Open($filename)
          or die( "Unable to open document ", Win32::OLE->LastError() );
        $doc->SaveAs(
            {
                FileName   => $savename,
                FileFormat => wdFormatDOSTextLineBreaks
            }
        );
        #FileFormat => wdFormatUnicodeText });    #unicode support
        print "Closing document and Word\n";
        $Word->ActiveDocument->Close();
        $Word->Quit;
        $b++;
    }


Any suggestions? Thanks,

Re: win32::ole - SaveAs( )
by vkon (Curate) on Oct 01, 2007 at 16:54 UTC
    Actually, your $savename isn't being treated as Unicode.

    Try Encode::decode("cp1251",$savename);
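A minimal sketch of what Encode::decode does, assuming the data really is in the cp1251 (Windows Cyrillic) code page. The sample bytes are made up for illustration; they spell the Russian word "Привет" in cp1251, and the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Illustrative input: the Russian word "Privet" as Windows-1251 bytes.
my $cp1251_bytes = "\xCF\xF0\xE8\xE2\xE5\xF2";

# decode() turns the cp1251 bytes into a Perl character string:
# six characters, each a Unicode code point (U+041F for the first one).
my $text = decode( 'cp1251', $cp1251_bytes );

# encode() goes the other way, back to cp1251 bytes.
my $round_trip = encode( 'cp1251', $text );

binmode STDOUT, ':encoding(UTF-8)';
printf "%d characters, round trip %s\n", length $text,
  ( $round_trip eq $cp1251_bytes ? 'ok' : 'failed' );
```

The point of the distinction: byte strings and character strings look alike in Perl, and only after decode() does length() count letters rather than bytes.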
    If Encode::decode doesn't help, you can force the value to be treated as Unicode by wrapping it in an OLE Variant; see perldoc Win32::OLE::Variant.
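A hedged sketch of the Variant approach, assuming Win32::OLE::Variant's documented new(TYPE, DATA) constructor and its VT_BSTR constant. The helper name ole_string is hypothetical. It only shows how the Variant is built; the string still has to contain the right characters (i.e. be decoded correctly) before it is wrapped. On non-Windows systems it returns the string unchanged so the script still runs.

```perl
use strict;
use warnings;

# Hypothetical helper. On Windows, wrap a Perl string in a VT_BSTR
# Variant so Win32::OLE passes it to COM as an explicit BSTR.
# Elsewhere (no Win32::OLE available) return the string unchanged.
sub ole_string {
    my ($str) = @_;
    return $str unless $^O eq 'MSWin32';
    require Win32::OLE::Variant;
    return Win32::OLE::Variant->new( Win32::OLE::Variant::VT_BSTR(), $str );
}

# Intended use with the question's variables (Windows only):
#   $doc->SaveAs( { FileName => ole_string($savename), ... } );
```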

    In addition, FileFormat => wdFormatUnicodeText is about the file format (RTF etc.), not the file name.

      Sorry vkon, maybe I misunderstood you, but this is about the contents of the file, not its name.

      The contents of the .doc file should be read and saved without losing the Unicode characters.

      Thanks for your reply,
        ... then you should use either wdFormatEncodedText or wdFormatUnicodeText, but you must add use strict; first, which will reveal the error in how you are using them.
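What use strict catches here can be checked without Word at all. This portable sketch (variable names are illustrative) compiles the same hash literal with strict subs off and on: without strict, the undeclared bareword silently becomes a string; with strict, compilation fails and $@ names the bareword.

```perl
use strict;
use warnings;

# Case 1: strict subs switched off, as in a script without "use strict".
# Perl quietly treats the unknown bareword as the string
# "wdFormatUnicodeText", so Word would receive text, not a number.
my %loose_args = eval q{
    no strict 'subs';
    ( FileFormat => wdFormatUnicodeText );
};

# Case 2: strict subs on. The same code no longer compiles, and $@
# reports the offending bareword.
my $compiled = eval q{
    use strict;
    my %strict_args = ( FileFormat => wdFormatUnicodeText );
    1;
};
my $strict_error = $compiled ? '' : $@;

print "Without strict: FileFormat => '$loose_args{FileFormat}'\n";
print "With strict: $strict_error";
```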

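        You are actually using Word's constants incorrectly: wdFormatUnicodeText has some small integer value (I don't have it handy), but without strict Perl passes the bareword to Word as the string "wdFormatUnicodeText".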

        Use Win32::OLE::Const->Load(....); to reveal OLE constant values.
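Putting the advice together, a sketch that assumes Windows, Word 2007 and Win32::OLE; build_paths and convert_doc are illustrative names, and the docs/ and txt/ folders follow the layout in the question. It runs under strict and takes wdFormatUnicodeText from the hash returned by Win32::OLE::Const->Load on the running Word object, so Word receives the real constant rather than a string. wdFormatUnicodeText should produce Unicode (UTF-16) text, hence the .txt extension; wdFormatEncodedText is the other option mentioned above.

```perl
#!/usr/bin/perl
# Sketch: assumes Windows, Word 2007 and Win32::OLE.
use strict;
use warnings;

# Build Word's input path and a .txt output path, converting every "/"
# to "\" (the question's s'/'\\' has no /g, so it changes only the first).
sub build_paths {
    my ( $dir, $file ) = @_;
    ( my $base = $file ) =~ s/\.doc\z//i;    # drop only a trailing ".doc"
    my $in  = "$dir/docs/$file";
    my $out = "$dir/txt/$base.txt";
    tr{/}{\\} for $in, $out;
    return ( $in, $out );
}

# Open the document and save it as Unicode text, using Word's own
# constant value loaded from its type library at run time.
sub convert_doc {
    my ( $in, $out ) = @_;
    require Win32::OLE;
    require Win32::OLE::Const;

    # 'Quit' is the destructor: Word quits when $word goes out of scope.
    my $word = Win32::OLE->new( 'Word.Application', 'Quit' )
      or die "Cannot start Word: ", Win32::OLE->LastError(), "\n";
    $word->{Visible} = 0;

    my $wd = Win32::OLE::Const->Load($word);    # hashref of wd* constants

    my $doc = $word->Documents->Open($in)
      or die "Unable to open $in: ", Win32::OLE->LastError(), "\n";
    $doc->SaveAs( { FileName => $out, FileFormat => $wd->{wdFormatUnicodeText} } );
    $doc->Close();
    return;
}

# Hypothetical usage: perl convert.pl C:/work report.doc
if ( $^O eq 'MSWin32' && @ARGV == 2 && $ARGV[1] =~ /\.doc\z/i ) {
    convert_doc( build_paths(@ARGV) );
}
```

Loading the modules with require inside convert_doc keeps the path logic usable (and testable) on machines without Win32::OLE.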