Hello all,

I'm presenting a reworked version of my HTML template, with the scope limited to Russian crosswords. A lot of things just don't seem to work the same way once Cyrillic characters are involved, and I hope the lessons I'm after will generalize to others' Unicode projects. Let me start with a listing of the main script in readmore tags, and then I'll pull out the parts that need grease.

$ cat 7.cw1.pl
#!/usr/bin/perl -w
use 5.011;
use lib "template_stuff";
use html7;
use trans2;    ##yandex option available
use Path::Tiny;
use utils1;
use utf8;
use Encode;
use open OUT => ':encoding(UTF-8)', ':std';
use Net::SFTP::Foreign;
use Data::Dumper;

# initializations that must precede main data structure

my $ts          = "template_stuff";
my $images      = "aimages";
my $captions    = "captions";
my $ruscaptions = "ruscaptions";

## turning things to Path::Tiny
# decode paths

my $abs   = path(__FILE__)->absolute;
my $path1 = Path::Tiny->cwd;
my $title = $path1->basename;
$abs   = decode( 'UTF-8', $abs );
$path1 = decode( 'UTF-8', $path1 );
$title = decode( 'UTF-8', $title );
say "title is $title";
say "path1 is $path1";
say "abs is $abs";
my $path2 = path( $path1, $ts );

# page params
my %vars = (
  title        => $title,
  headline     => undef,
  place        => 'Vancouver',
  base_url     => 'http://www.merrillpjensen.com',
  css_file     => "${title}1.css",
  header       => path( $path2, "hc_input2.txt" ),
  footer       => path( $path2, "footer_center3.txt" ),
  body         => path( $path2, "rebus5.tmpl" ),
  print_script => "1",
  code_tmpl    => path( $path2, "code2.tmpl" ),
  oitop        => path( $path2, "oitop.txt" ),
  oibottom     => path( $path2, "oibottom.txt" ),
  to_images    => path( $path2, $images ),
  eng_captions => path( $path2, $captions ),
  rus_captions => path( $path2, $ruscaptions ),
  translations => path( $path2, 'translations' ),
  bottom       => path( $path2, "bottom1.txt" ),
  book         => 'Crosswords: ',
  chapter      => 'Кроссворды',
  make_puzzle  => 1,
  print_module => 0,
  script_file  => $abs,
  module_tmpl  => path( $path2, "code3.tmpl" ),
  server_dir   => 'perlmonks',
  image_dir    => 'pmimage',
  ts           => 'template_system',
  css_path     => $path2,
  ini_path     => path('/home/bob/Documents/html_template_data/3.ценности.ini'),
  cw           => path($path2,'crosswords'),
);

my $rvars = \%vars;

my $word = 'cw';

foreach my $child ( $vars{cw}->children ) {
  next unless $child->is_dir;

  say "dyetya is $child";
  my $base_dir = $child->basename;
  say "base dir is $base_dir";
  $vars{$base_dir} = path(  $child );
  say "dir is $vars{$base_dir}";
  

}


my $sftp = get_тайный($rvars);
say "result is $sftp";
  my $dir2 = $vars{"server_dir"};
  say "dir2 is $dir2";
  my $ls = $sftp->ls( "/$dir2", wanted => qr/$word/ )
    or warn "unable to retrieve " . $sftp->error;
  #print "$_->{filename}\n" for (@$ls);

  my @remote_files = map { $_->{filename} } @$ls;
  #say "files are @remote_files";
  my $rref     = \@remote_files;
#say Dumper $rref;

say "ultimate disposition of main hash-------";
say Dumper $rvars;
__END__ 

$

One reason I'm hiding this under a readmore tag is that perltidy doesn't want to format it. Right now, this is what I use for all my perltidy commands:

$ cat .bash_aliases 
alias pt='perltidy -i=2 -b '

$ pt 7.cw1.pl
## Please see file 7.cw1.pl.ERR
$ cat 7.cw1.pl.ERR
Perltidy version is 20180220
85:    unexpected character decimal 209 (�) in script
85:    unexpected character decimal 130 (�) in script
...
85:    unexpected character decimal 185 (�) in script
85:    Giving up after error
$

Q1) Can I alter my perltidy command so that these chars are not problematic?

My second issue comes up when I try to loop over directories with Cyrillic names. The relevant source is:

foreach my $child ( $vars{cw}->children ) {
  next unless $child->is_dir;

  say "dyetya is $child";
  my $base_dir = $child->basename;
  say "base dir is $base_dir";
  $vars{$base_dir} = path(  $child );
  say "dir is $vars{$base_dir}";
  

}

The path is built for the ASCII-named directory and loaded into the main data structure as a blessed Path::Tiny object:

dyetya is /home/bob/2.scripts/pages/7.cw/template_stuff/crosswords/eugene
base dir is eugene

but there's nothing from the Cyrillic paths. Bash shows them, here in pre tags:

$ pwd
/home/bob/2.scripts/pages/7.cw/template_stuff/crosswords
$ ls
caption_filled.gif  eugene  захват  изображение  подписи
$ ls -l
total 24
-rw-r--r-- 1 bob bob 7882 Jan  7 22:58 caption_filled.gif
drwxr-xr-x 3 bob bob 4096 Jan 31 16:41 eugene
drwxr-xr-x 2 bob bob 4096 Jan  7 22:28 захват
drwxr-xr-x 2 bob bob 4096 Jan  7 22:45 изображение
drwxr-xr-x 2 bob bob 4096 Feb  1 13:04 подписи
$

Q2) How do I convince Path::Tiny to give me all these paths?
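
For comparison, here is a minimal sketch of the behaviour Q2 is asking for. It does not diagnose why the loop above skips the Cyrillic directories; it only shows one way to list them. It assumes a UTF-8 filesystem, treats file names as bytes on the way in and out, and decodes the basename before using it as a hash key. The temporary directory and the `%dirs` hash are hypothetical stand-ins for the real `crosswords` directory and `%vars`.

```perl
use strict;
use warnings;
use utf8;                          # this source contains Cyrillic literals
use Encode qw(decode encode);
use Path::Tiny;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-in for .../template_stuff/crosswords
my $cw = Path::Tiny->tempdir;
for my $name ( 'eugene', 'захват', 'подписи' ) {
  # hand the OS an explicit UTF-8 byte string for each directory name
  $cw->child( encode( 'UTF-8', $name ) )->mkpath;
}
$cw->child('caption_filled.gif')->touch;    # a plain file, to be skipped

my %dirs;
for my $child ( sort $cw->children ) {
  next unless $child->is_dir;

  # basename comes back as bytes; decode it to characters for keys and output
  my $key = decode( 'UTF-8', $child->basename );
  $dirs{$key} = $child;                     # the value stays a Path::Tiny object
  print "base dir is $key\n";
}
```

The design choice is to keep the Path::Tiny objects themselves as undecoded byte paths, so they can still be handed back to the filesystem, and to decode only the copies used for display and hash keys.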

My third question regards the file I'm trying to build a path to:

$ pwd
/home/bob/2.scripts/pages/7.cw/template_stuff/crosswords/подписи
$ cat a.txt
л лесопарк
о и о о е 
комар р й 
л б трасса
ё  к   т б
пасс   ухо
  менеджер
фауна раки
  тоска  г
шкант м  е
    атаман
    в т т 
    н ухаб
    игроки
    к грач

$

I think this syntax from Path::Tiny could work:

@lines = $file->lines_utf8;
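
A minimal sketch of that call, assuming the file is UTF-8 encoded. The temporary file is a hypothetical stand-in for a.txt and holds only its first three rows. The `chomp` option strips the line endings but leaves the blanks inside and at the end of each row alone.

```perl
use strict;
use warnings;
use utf8;                          # this source contains Cyrillic literals
use Path::Tiny;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-in for a.txt
my $file = Path::Tiny->tempfile;
$file->spew_utf8("л лесопарк\nо и о о е \nкомар р й \n");

# lines_utf8 decodes on read; chomp => 1 removes only the newline
my @lines = $file->lines_utf8( { chomp => 1 } );

# brackets make trailing blanks visible
printf "%2d chars: [%s]\n", length($_), $_ for @lines;
```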

What I seek to do here is simply read it in, make sure that whitespace is removed, and have it echoed out in its precise locations. What I have now addresses the encoding issue, but not the spacing:

sub print_aoa_utf8 {
  use warnings;
  use 5.011;
  use utf8;    # a la François
  use open OUT => ':encoding(utf8)';
  use open ':std';

  my $a   = shift;
  my @AoA = @$a;

  for my $i ( 0 .. $#AoA ) {
    my $aref = $AoA[$i];
    for my $j ( 0 .. $#{$aref} ) {
      print "elt $i $j is $AoA[$i][$j]\n";
    }
  }
  return $a;
}
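
One possible way to keep the spacing, sketched under the assumption that every character of a row, blank or letter, is one cell of the grid. The rows are the first three from a.txt; `grid_rows` and the dot used to mark blank cells are hypothetical choices made only so that the positions stay visible.

```perl
use strict;
use warnings;
use utf8;                          # this source contains Cyrillic literals

binmode STDOUT, ':encoding(UTF-8)';

# First three rows of a.txt, already decoded and chomped
my @lines = ( 'л лесопарк', 'о и о о е ', 'комар р й ' );

# One array element per character, so column positions are preserved
my @AoA = map { [ split //, $_ ] } @lines;

# Hypothetical helper: rebuild each row, showing blank cells as dots
sub grid_rows {
  my ($aoa) = @_;
  return map {
    join '', map { $_ eq ' ' ? '.' : $_ } @$_
  } @$aoa;
}

print "$_\n" for grid_rows( \@AoA );
```

Because `split //` works on characters rather than bytes once the text is decoded, each Cyrillic letter occupies exactly one cell.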

One final question. I wrote my newest version of my "get_tiny" sftp object creator as a homonym:

my $sftp = get_тайный($rvars);

because I thought it would be a little harder to crack if it were done in a non-ASCII way. (Of course, I may have dished myself to those who do.) Then I watched a NOVA episode that said that with the advent of quantum computing, public key encryption would be passé. I failed to understand how this is going to happen. Should we all just give up?

Thanks for your comment and cheers,

In reply to help with cyrillic characters in odd places by Aldebaran
