Denis Mikhailov has asked for the wisdom of the Perl Monks concerning the following question:
Hello.
I am reading a .docx file in Russian. The code $text =~ s/^Испытательная(.*?)/\1/ug; is supposed to process a line by removing part of the text. If the text is written in English, everything is detected and trimmed correctly, but I cannot get it to work with Russian text. In the console, instead of Russian text, I see garbled characters. I've been dancing around encodings — utf8, utf16, cp1251, cp1252 — but none of them give the desired result. Apparently, I am doing something wrong.
Important! When the text is written to a text file from the Word document, it displays correctly.
I would appreciate your help. Thank you.
#!/usr/bin/perl
use Win32::OLE;
use Win32::OLE::Enum;
use utf8;
use strict;

my $text;
my $paragraph;
my $document = Win32::OLE -> GetObject($ARGV[1]);
open (FH,">$ARGV[0]") || die "Can't open file: $!\n";
my $paragraphs = $document->Paragraphs();
my $enumerate = new Win32::OLE::Enum($paragraphs);
while(defined($paragraph = $enumerate->Next())) {
    $text = $paragraph->{Range}->{Text};
    $text =~ s/^Испытательная(.*?)/\1/ug;
    $text =~ s/[\n\r]//g;
    print FH "$text\n";
}
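One possible reading of the symptom, offered as an assumption rather than something this thread confirms: with `use utf8` the Cyrillic literal in the pattern is a string of decoded characters, while the text coming back from the document may still be undecoded bytes, so the two never compare equal. The sketch below reproduces that mismatch without Word or Win32::OLE. The `strip_prefix` helper, the sample string, and the simplified pattern are hypothetical, and the bytes are simulated with `Encode::encode`. Which code page Win32::OLE actually returns depends on its `CP` option, so the right decoding in the real script may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                        # string literals below are decoded characters
use Encode qw(decode encode);

# Hypothetical helper: strip the leading word from a string.
# It only works as intended when it is given decoded characters.
sub strip_prefix {
    my ($chars) = @_;
    (my $out = $chars) =~ s/^Испытательная\s*//;
    return $out;
}

# Simulate an API that hands back raw UTF-8 bytes instead of characters.
my $bytes = encode('UTF-8', 'Испытательная лаборатория');

# The character literal in the pattern does not match undecoded bytes.
my $bytes_matched = strip_prefix($bytes) ne $bytes;

# After decoding, the same pattern matches and the prefix is removed.
my $trimmed = strip_prefix(decode('UTF-8', $bytes));

# Encode characters on the way out so the terminal or file gets UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print $bytes_matched ? "bytes: matched\n" : "bytes: no match\n";
print "decoded: $trimmed\n";
```

The same idea applies to the output side: a filehandle opened without an encoding layer writes whatever bytes it is given, which could explain why the text file looks right even though the match fails.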
Replies are listed 'Best First'.

Re: Win32::OLE & encoding
by Corion (Patriarch) on Mar 25, 2026 at 13:56 UTC
by Denis Mikhailov (Novice) on Mar 25, 2026 at 17:47 UTC

Re: Win32::OLE & encoding
by ikegami (Patriarch) on Mar 25, 2026 at 14:14 UTC