Thanks for taking this question farther. The list of non-fixes for File::Slurp made me willing to try Path::Tiny. Where it ended up is having the routines that get english and russian captions completely analogous to each other:

sub get_eng_text { use 5.010; use HTML::FromText; use File::Slurp; use Path::Tiny; use utf8; ### revision for better utf8 encodings 7/18 # using Path::Tiny instead of deprecated File::Slurp # now analagous to get_rus_text my $rvars = shift; my %vars = %$rvars; my %content; my $refc = \%content; opendir my $eh, $vars{"eng_captions"} or warn "no eng captions $!\n"; while (defined ($_ = readdir($eh))){ next if m/~$/; next if -d; if (m/txt$/){ my $file = path($vars{"eng_captions"},$_); my $guts = $file->slurp_utf8; my $temp = text2html( $guts, urls => 1, email => 1, paras => 1, ); # surround by divs my $oitop = read_file($vars{"oitop"}); my $oibottom = read_file($vars{"oibottom"}); my $text = $oitop.$temp.$oibottom; #say "text is $text"; $content{$_} = $text; } } closedir $eh; #important to sort my @return; foreach my $key (sort keys %content) { print $content{$key} . "\n"; push @return, $content{$key}; } return \@return; } sub get_rus_text { use 5.010; use HTML::FromText; use File::Slurp; use Path::Tiny; use utf8; ### revision for better russian use 7/18 # run cyrillic through HTML::FromText # using Path::Tiny instead of deprecated File::Slurp # use utf8 allows use of cyrillic from within this coding unit my $rvars = shift; my %vars = %$rvars; my %content; my $refc = \%content; opendir my $eh, $vars{"rus_captions"} or warn "no rus captions $!\n"; while (defined ($_ = readdir($eh))){ next if m/~$/; next if -d; if (m/txt$/){ my $file = path($vars{"rus_captions"},$_); my $guts = $file->slurp_utf8; my $temp = text2html( $guts, urls => 1, email => 1, paras => 1, ); # surround by divs my $oitop = read_file($vars{"oitop"}); my $oibottom = read_file($vars{"oibottom"}); my $text = $oitop.$temp.$oibottom; #say "text is $text"; $content{$_} = $text; } } closedir $eh; #important to sort my @return; foreach my $key (sort keys %content) { print $content{$key} . "\n"; push @return, $content{$key}; } return \@return; }

I tried to combine these two into one function and call it slightly differently, but I didn't succeed on the first try. Sometimes I just have to go with what I've got and call the template good enough for now.


In reply to Re^2: dealing with cyrillic characters (perlunitut) by Aldebaran
in thread dealing with cyrillic characters by Aldebaran

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.