Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:
I'm having some issues rendering the Russian captions on my personal website, where I use a Perl templating system to populate the content and load everything to the web. Each entry nominally consists of an image, followed by English captions and then Russian captions. What I had was working all right, at least by hobby-coder standards. The Russian captions aren't fitting properly within their HTML boundary, because they are not getting treated the way the English ones are, like this: testimonial text. Furthermore, the Russian ones don't render as paragraphs.
The most relevant sections of code are here, within readmore tags. Contrast how this works for English versus Russian: the English captions go through the text2html function of HTML::FromText, which preserves URLs, e-mail addresses, and paragraphs, while the Russian caption-reading function makes no such call. Please don't read on if code makes you grumpy. I did most of this coding while I was studying references in Perl as an intermediate, and I wouldn't say I've progressed much in the meantime. Any suggestions to improve the code are gladly accepted.
    sub get_eng_text {
        use strict;
        use 5.010;
        use File::Basename;
        use Cwd;
        use HTML::FromText;
        use File::Slurp;
        use Path::Class;
        my $rvars = shift;
        my %vars  = %$rvars;
        my %content;
        my $refc = \%content;
        opendir my $eh, $vars{"eng_captions"} or die "dead $!\n";
        while ( defined( $_ = readdir($eh) ) ) {
            next if m/~$/;
            next if -d;
            if (m/txt$/) {
                my $file   = file( $vars{"eng_captions"}, $_ );
                my $string = read_file($file);
                my $temp   = text2html(
                    $string,
                    urls  => 1,
                    email => 1,
                    paras => 1,
                );
                # surround by divs
                my $oitop    = read_file( $vars{"oitop"} );
                my $oibottom = read_file( $vars{"oibottom"} );
                my $text     = $oitop . $temp . $oibottom;
                say "default is $_";
                $content{$_} = $text;
            }
        }
        closedir $eh;
        #important to sort
        my @return;
        foreach my $key ( sort keys %content ) {
            push @return, $content{$key};
        }
        #say "return is @return";
        return \@return;
    }

    sub get_rus_text {
        use 5.010;
        use File::Basename;
        use Cwd;
        use HTML::FromText;
        use File::Slurp;
        use Path::Class;
        my $rvars = shift;
        my %vars  = %$rvars;
        my %content;
        my $refc = \%content;
        opendir my $eh, $vars{"rus_captions"} or die "dead $!\n";
        while ( defined( $_ = readdir($eh) ) ) {
            next if m/~$/;
            next if -d;
            if (m/txt$/) {
                my $file   = file( $vars{"rus_captions"}, $_ );
                my $string = read_file($file);
                # surround by divs
                my $oitop    = read_file( $vars{"oitop"} );
                my $oibottom = read_file( $vars{"oibottom"} );
                my $text     = $oitop . $string . $oibottom;
                $content{$_} = $text;
            }
        }
        closedir $eh;
        #important to sort
        my @return;
        foreach my $key ( sort keys %content ) {
            print $content{$key} . "\n";
            push @return, $content{$key};
        }
        return \@return;
    }

    sub write_body {
        use strict;
        use warnings;
        use 5.010;
        use Text::Template;
        use Encode;
        my $rvars    = shift;
        my $reftoAoA = shift;
        my %vars     = %$rvars;
        my @AoA      = @$reftoAoA;
        my $body     = $vars{"body"};
        my $template = Text::Template->new(
            ENCODING => 'utf8',
            SOURCE   => $body
        ) or die "Couldn't construct template: $!";
        my $return;
        for my $i ( 0 .. $#AoA ) {
            $vars{"file"}    = $AoA[$i][0];
            $vars{"english"} = $AoA[$i][1];
            my $ustring = $AoA[$i][2];
            $ustring = decode_utf8($ustring);
            $vars{"russian"} = $ustring;
            my $result = $template->fill_in( HASH => \%vars );
            $return = $return . $result;
        }
        return \$return;
    }
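The main difference between the two caption readers above is the text2html(..., paras => 1) call. As a rough illustration of what paragraph handling involves, here is a hypothetical, much simplified stand-in: paras_to_html is an illustrative name, it is not part of HTML::FromText, and it does not reproduce that module's exact output. It splits the text on blank lines, escapes the HTML metacharacters &, < and >, and wraps each block in <p> tags.

```perl
use strict;
use warnings;

# Hypothetical helper: a rough stand-in for text2html($text, paras => 1).
# This is NOT HTML::FromText's implementation, only an illustration.
sub paras_to_html {
    my ($text) = @_;
    # Blocks separated by one or more blank lines become paragraphs;
    # grep drops pieces that are empty or whitespace-only.
    my @paras = grep { /\S/ } split /\n[ \t]*\n/, $text;
    my @html;
    for my $p (@paras) {
        $p =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $p =~ s/&/&amp;/g;       # escape HTML metacharacters, & first
        $p =~ s/</&lt;/g;
        $p =~ s/>/&gt;/g;
        push @html, "<p>$p</p>";
    }
    return join "\n", @html;
}

my $caption = "First paragraph.\n\nSecond <b>one</b>.\n";
print paras_to_html($caption), "\n";
```

Run on the sample caption, this prints two <p> elements, one per blank-line-separated block, with the <b> tags escaped as text.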
So, future-friar me says, "Run the Russian text through text2html and see what you get." With a little more Russian text added to the headline to show how it fails to render, and with the print_script function enabled, this HTML page shows how the Russian goes wonky for me. It's always a matter of seeing characters like these: мой оп‹‚, сила и надежда. The same characters show up when I try to manipulate these texts in a hex editor such as Okteta. I can't get any meaningful conversion to happen, and I'm left with a sea of these deformed D-creatures. Here is the code for this latest attempt:
    sub get_rus_text {
        use 5.010;
        use File::Basename;
        use Cwd;
        use HTML::FromText;
        use File::Slurp;
        use Path::Class;
        my $rvars = shift;
        my %vars  = %$rvars;
        my %content;
        my $refc = \%content;
        opendir my $eh, $vars{"rus_captions"} or die "dead $!\n";
        while ( defined( $_ = readdir($eh) ) ) {
            next if m/~$/;
            next if -d;
            if (m/txt$/) {
                my $file   = file( $vars{"rus_captions"}, $_ );
                my $string = read_file($file);
                ### revision for better russian use 7/18
                my $temp = text2html(
                    $string,
                    urls  => 1,
                    email => 1,
                    paras => 1,
                );
                # surround by divs
                my $oitop    = read_file( $vars{"oitop"} );
                my $oibottom = read_file( $vars{"oibottom"} );
                my $text     = $oitop . $temp . $oibottom;
                $content{$_} = $text;
            }
        }
        closedir $eh;
        #important to sort
        my @return;
        foreach my $key ( sort keys %content ) {
            print $content{$key} . "\n";
            push @return, $content{$key};
        }
        return \@return;
    }
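In this attempt, read_file is called without any decoding option, so text2html receives the raw bytes of the file rather than Cyrillic characters. A common approach, sketched here under the assumption that the caption files are saved as UTF-8, is to decode once on input, work with Perl character strings throughout, and encode once on output. The sketch uses only core Perl; read_utf8_text is a hypothetical helper name, and the demo writes the three letters "мой" to a temporary file as raw UTF-8 bytes before reading them back.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a whole file, decoding UTF-8 bytes into
# Perl characters as they are read.
sub read_utf8_text {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or die "Can't open $path: $!";
    local $/;                    # slurp mode: read the whole file at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Demo: write the UTF-8 bytes of "мой" (U+043C U+043E U+0439) to a temp file.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
binmode $tmp_fh;                 # raw bytes, no encoding layer
print {$tmp_fh} "\xD0\xBC\xD0\xBE\xD0\xB9";
close $tmp_fh;

my $caption = read_utf8_text($tmp_path);
printf "characters: %d\n", length $caption;    # 3 characters, not 6 bytes

# Encode exactly once, on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print "$caption\n";
```

If the captions were decoded this way, the later decode_utf8 call in write_body would presumably need to go, so that the same text is not decoded twice.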
My question is: how do I get the formatting for the Russian text without the characters turning into D-creatures? And what must be happening every time I see an encoding that makes no sense, as in the headline or in Okteta, when I can readily read the Cyrillic source text?
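The D-creatures are consistent with a well-known failure: UTF-8 bytes being displayed as though each byte were a separate Latin-1 (or Windows-1252) character. Every letter of the Russian alphabet encodes in UTF-8 as two bytes beginning with 0xD0 or 0xD1, and in Latin-1 those lead bytes are Ð and Ñ. A hex editor whose text column assumes a single-byte encoding would plausibly show the same thing. The sketch below reproduces the effect with the core Encode module; it illustrates the mechanism and is not a diagnosis of this particular site.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $word  = "\x{043C}\x{043E}\x{0439}";        # "мой" as Perl characters
my $bytes = encode( 'UTF-8', $word );           # 6 bytes: D0 BC D0 BE D0 B9

# Mistake: treat each UTF-8 byte as one Latin-1 character.
my $mojibake = decode( 'ISO-8859-1', $bytes );  # "Ð¼Ð¾Ð¹"

# Correct: decode the bytes as UTF-8.
my $good = decode( 'UTF-8', $bytes );

binmode STDOUT, ':encoding(UTF-8)';
print "wrong: $mojibake\n";
print "right: $good\n";
printf "bytes: %s\n", join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
```

The same symptom appears whenever bytes that were never decoded are later encoded to UTF-8 a second time, for example by an output layer or a template engine that expects character strings.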
Thanks for your comment.
2018-06-22 Athanasius moved readmore tags outside of code tags
Re: dealing with cyrillic characters
by IB2017 (Pilgrim) on Jun 22, 2018 at 02:04 UTC
by Aldebaran (Curate) on Jun 22, 2018 at 17:43 UTC
by haukex (Archbishop) on Jun 23, 2018 at 08:50 UTC

Re: dealing with cyrillic characters (perlunitut)
by Anonymous Monk on Jun 22, 2018 at 02:39 UTC
by Aldebaran (Curate) on Jun 22, 2018 at 22:34 UTC