in reply to Re: dealing with cyrillic characters
in thread dealing with cyrillic characters
Wow, thanks, it really was that simple a fix. I got everything I wanted by setting the binmode to `:utf8` in File::Slurp's `read_file`. I looked in gedit to see what encoding the underlying text files use and couldn't determine it, but the fact that I can read the Cyrillic makes me think they are indeed UTF-8. Relevant code:
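```perl
use 5.010;
use strict;
use warnings;
use File::Basename;
use Cwd;
use HTML::FromText;   # text2html()
use File::Slurp;      # read_file()
use Path::Class;      # file()

sub get_rus_text {
    my $rvars = shift;
    my %vars  = %$rvars;
    my %content;

    ### revision for better russian use 7/18
    # set binmode for File::Slurp
    # run cyrillic through HTML::FromText

    # The wrapper snippets are the same for every caption, so read them once.
    # Assumption: they are plain ASCII or UTF-8; decoding them the same way
    # keeps every piece of $text a character string.
    my $oitop    = read_file( $vars{"oitop"},    binmode => ':utf8' );
    my $oibottom = read_file( $vars{"oibottom"}, binmode => ':utf8' );

    opendir my $eh, $vars{"rus_captions"}
        or die "Can't open $vars{rus_captions}: $!\n";
    while ( defined( my $name = readdir $eh ) ) {
        next if $name =~ m/~$/;    # skip editor backup files
        # readdir returns bare names, so test the full path, not the name
        my $file = file( $vars{"rus_captions"}, $name );
        next if -d $file;
        next unless $name =~ m/\.txt$/;

        # decode the caption as UTF-8 so Perl sees characters, not bytes
        my $string = read_file( $file, binmode => ':utf8' );
        #say "string is $string";
        my $temp = text2html(
            $string,
            urls  => 1,
            email => 1,
            paras => 1,
        );
        # surround by divs
        my $text = $oitop . $temp . $oibottom;
        #say "text is $text";
        $content{$name} = $text;
    }
    closedir $eh;

    # important to sort
    # (printing decoded Cyrillic needs an encoding layer on the output handle,
    # e.g. binmode STDOUT, ':encoding(UTF-8)', to avoid "Wide character" warnings)
    my @return;
    foreach my $key ( sort keys %content ) {
        print $content{$key} . "\n";
        push @return, $content{$key};
    }
    return \@return;
}
```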
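Since gedit didn't reveal the encoding, one way to check from Perl is to attempt a strict UTF-8 decode of the raw bytes. This is a minimal sketch, not part of the original script: `looks_like_utf8` and `char_and_byte_count` are hypothetical helper names, and only the core Encode module is assumed. A pass means the bytes are well-formed UTF-8 (plain ASCII passes too), which is good evidence but not proof; Cyrillic saved in a legacy encoding such as cp1251 or KOI8-R will usually fail. The second helper illustrates why the `:utf8` layer matters: a word like "спасибо" is 7 characters but 14 bytes once encoded as UTF-8.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if the file's bytes are well-formed UTF-8,
# 0 otherwise. Pure ASCII also returns 1, since ASCII is a subset of UTF-8.
sub looks_like_utf8 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open $path: $!\n";
    my $bytes = do { local $/; <$fh> };    # slurp the raw bytes
    close $fh;
    $bytes = '' unless defined $bytes;

    # FB_CROAK makes decode() die on the first malformed sequence,
    # LEAVE_SRC keeps $bytes unmodified; eval turns a die into a false result.
    my $ok = eval {
        Encode::decode( 'UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC() );
        1;
    };
    return $ok ? 1 : 0;
}

# Hypothetical helper: for a decoded (character) string, returns the number
# of characters and the number of bytes it occupies when encoded as UTF-8.
sub char_and_byte_count {
    my ($text) = @_;
    return ( length $text, length Encode::encode( 'UTF-8', $text ) );
}
```

For example, `looks_like_utf8('captions/001.txt')` (a made-up path) would return 1 for a UTF-8 caption file and 0 for one saved in cp1251.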
Improved page. I had budgeted all day to figure this out, so now I'm going to go form some concrete. Большое спасибо снова (many thanks again).
Re^3: dealing with cyrillic characters
by haukex (Archbishop) on Jun 23, 2018 at 08:50 UTC