Hello.
I am reading a .docx file in Russian. The code $text =~ s/^Испытательная(.*?)/\1/ug; is supposed to strip the leading word from each line. With English text the pattern matches and the line is trimmed correctly, but I cannot get it to work with Russian text. In the console, the Russian text shows up as garbled characters. I have tried several encodings (utf8, utf16, cp1251, cp1252), but none of them gives the desired result. Apparently, I am doing something wrong.
Important! When the text is written to a text file from the Word document, it displays correctly.
I would appreciate your help. Thank you.
#!/usr/bin/perl
use Win32::OLE;
use Win32::OLE::Enum;
use utf8;
use strict;

my $text;
my $paragraph;
my $document = Win32::OLE -> GetObject($ARGV[1]);
open (FH,">$ARGV[0]") || die "Can't open file: $!\n";
my $paragraphs = $document->Paragraphs();
my $enumerate = new Win32::OLE::Enum($paragraphs);
while(defined($paragraph = $enumerate->Next())) {
    $text = $paragraph->{Range}->{Text};
    $text =~ s/^Испытательная(.*?)/\1/ug;
    $text =~ s/[\n\r]//g;
    print FH "$text\n";
}
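One possible explanation, offered only as a sketch: under `use utf8` the pattern is a string of characters, while the text coming back from OLE may be a string of encoded bytes, and the two never compare equal. The example below does not use Win32::OLE at all. It fakes the paragraph text with `Encode::encode`, assuming UTF-8 bytes (the real call may return cp1251 bytes instead, depending on the module's documented `CP` option, set through `Win32::OLE->Option`, and its `CP_UTF8` constant). The sample text and the variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                          # literals in this file are character strings
use Encode qw(encode decode);

# Stand-in for $paragraph->{Range}->{Text}: assumed here to be raw UTF-8 bytes.
my $bytes = encode('UTF-8', "Испытательная лаборатория\r");

# Matching the character pattern against undecoded bytes fails.
my $bytes_match = ($bytes =~ /^Испытательная/) ? 1 : 0;

# After decoding, the same pattern sees characters and matches.
my $chars = decode('UTF-8', $bytes);
my $chars_match = ($chars =~ /^Испытательная/) ? 1 : 0;

# Same trimming steps as in the script above, applied to the decoded text.
(my $trimmed = $chars) =~ s/^Испытательная\s*//;
$trimmed =~ s/[\n\r]//g;

# Encode on output so the wide characters are written as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print "bytes match: $bytes_match, chars match: $chars_match\n";
print "$trimmed\n";
```

If the bytes turn out to be cp1251, the same idea would apply with `decode('cp1251', ...)`. Whether the console then shows readable Cyrillic also depends on the console code page, which this sketch does not address.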
In reply to Win32::OLE & encoding by Denis Mikhailov