Denis Mikhailov has asked for the wisdom of the Perl Monks concerning the following question:

Hello.

I am reading a .docx file written in Russian. The code $text =~ s/^Испытательная(.*?)/$1/ug; is supposed to process each line by removing the leading word Испытательная. With English text everything is detected and trimmed correctly, but I cannot get it to work with Russian text, and in the console I see garbled characters instead of Cyrillic. I've experimented with various encodings (utf8, utf16, cp1251, cp1252), but none of them gives the desired result. Apparently, I am doing something wrong.

Important: when the text from the Word document is written to a text file, it displays correctly.

I would appreciate your help. Thank you.

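#!/usr/bin/perl
use strict;
use warnings;
use utf8;              # this script is saved as UTF-8 and contains Cyrillic literals
use Win32::OLE;
use Win32::OLE::Enum;

# Usage: script.pl OUTPUT.txt INPUT.docx
my $document = Win32::OLE->GetObject($ARGV[1])
    or die "Can't open document: " . Win32::OLE->LastError() . "\n";
open(my $fh, '>', $ARGV[0]) or die "Can't open file: $!\n";

my $paragraphs = $document->Paragraphs();
my $enumerate  = Win32::OLE::Enum->new($paragraphs);
while (defined(my $paragraph = $enumerate->Next())) {
    my $text = $paragraph->{Range}->{Text};
    # The lazy (.*?) captures nothing here, so this just strips the leading word.
    $text =~ s/^Испытательная(.*?)/$1/ug;
    $text =~ s/[\n\r]//g;
    print $fh "$text\n";
}
close($fh);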
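A likely explanation, sketched under an assumption: with its default code page (CP_ACP), Win32::OLE hands text back as bytes in the system's ANSI code page, which on a Russian-locale Windows is CP1251, while use utf8 turns the literal Испытательная into a string of Unicode characters. A byte string and a character string never compare equal, so the pattern cannot match. The portable sketch below (no Win32 needed) simulates the OLE result with Encode::encode on a made-up paragraph and shows that decoding first makes the substitution work.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                       # Cyrillic literals below are character strings
use Encode qw(encode decode);

binmode(STDOUT, ':encoding(UTF-8)');

# Stand-in for a paragraph from Win32::OLE under the default code page on a
# Russian-locale system (assumption): raw CP1251 bytes, not characters.
my $bytes = encode('cp1251', "Испытательная программа");

# The byte string does not match the character-string pattern.
my $raw_match = $bytes =~ /^Испытательная/ ? 1 : 0;
print "match on raw bytes: $raw_match\n";

# Decode the bytes into characters first; then the substitution works.
my $text = decode('cp1251', $bytes);
$text =~ s/^Испытательная\s*//;
print "after decode and s///: [$text]\n";
```

In the real script the same idea applies either by decoding each paragraph before the substitution or by asking Win32::OLE for UTF-8 directly, as in the first reply.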

Re: Win32::OLE & encoding
by Corion (Patriarch) on Mar 25, 2026 at 13:56 UTC

    You could try to convince Win32::OLE to convert to/from UTF-8, but I'm a bit unclear on the exact usage.

    use Win32::OLE 'CP_UTF8';
    Win32::OLE->Option( CP => CP_UTF8 );
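Once Win32::OLE returns character strings (which is what the OP's follow-up suggests happened after setting CP => CP_UTF8), the output side needs an explicit encoding layer too; printing wide characters to a plain handle triggers a "Wide character in print" warning. A minimal portable sketch, assuming a made-up paragraph and using a temporary file in place of $ARGV[0]:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempfile);

# Stand-in for $paragraph->{Range}->{Text} after
# Win32::OLE->Option(CP => CP_UTF8): a character string (assumption).
my $text = "Испытательная программа\r";

$text =~ s/^Испытательная\s*//;   # strip the leading word and following spaces
$text =~ s/[\r\n]//g;             # drop Word's paragraph mark

# A temporary file stands in for the script's output file ($ARGV[0]).
my ($fh, $path) = tempfile();
binmode($fh, ':raw:encoding(UTF-8)');  # no CRLF translation; encode as UTF-8
print $fh "$text\n";
close($fh) or die "Can't close $path: $!";

# Read the file back as raw octets to see what actually landed on disk.
open(my $in, '<:raw', $path) or die "Can't read $path: $!";
my $raw = do { local $/; <$in> };
close($in);
printf "%vX\n", $raw;
```

The :raw:encoding(UTF-8) stack keeps the byte count the same on Windows and elsewhere; in the OP's script the equivalent would be open(my $fh, '>:encoding(UTF-8)', $ARGV[0]).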

    If setting the code page doesn't help, try dumping the exact bytes you get back; from those, you (or we) may be able to divine the actual encoding. Maybe it is UTF-16BE or something weird.

    use Devel::Peek;
    Dump $text;

      Your advice helped! Thank you!
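A small sketch of what Corion's Dump suggestion reveals, using made-up strings rather than real OLE output: Devel::Peek prints Perl's internal representation of a scalar to STDERR, and the UTF8 entry in FLAGS together with the PV line shows whether you hold decoded characters or encoded bytes. The flag is an internal detail, so treat it as a diagnostic only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);
use Devel::Peek qw(Dump);

my $chars = "Исп";                     # decoded characters (illustrative)
my $bytes = encode('cp1251', $chars);  # the same text as CP1251 octets

# Both dumps go to STDERR; compare the FLAGS and PV lines.
Dump($chars);
Dump($bytes);

# The same distinction, queried directly:
printf "chars flagged: %d, bytes flagged: %d\n",
    utf8::is_utf8($chars) ? 1 : 0, utf8::is_utf8($bytes) ? 1 : 0;
```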
Re: Win32::OLE & encoding
by ikegami (Patriarch) on Mar 25, 2026 at 14:14 UTC

    What's the output of sprintf "%vX", $text for the Russian text that's expected to match your s///?
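A sketch of how to read the %vX output ikegami asks for, with the literal from the question encoded three ways (the encodings are chosen for illustration): code points above FF mean Perl already holds decoded characters; one byte per letter, mostly in the C0 to FF range, points to CP1251 bytes that were never decoded; pairs starting with D0 or D1 point to undecoded UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

my $chars  = "Испытательная";           # what the pattern expects
my $cp1251 = encode('cp1251', $chars);  # undecoded ANSI-code-page bytes
my $utf8   = encode('UTF-8',  $chars);  # undecoded UTF-8 bytes

# %vX prints the ordinal of every character in hex, joined by dots.
printf "characters: %vX\n", $chars;
printf "CP1251:     %vX\n", $cp1251;
printf "UTF-8:      %vX\n", $utf8;
```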