Hello all,
I present a transformed version of my HTML template, where the scope is limited to Russian crosswords. A lot of things just don't seem to want to work the same when you use Cyrillic characters, and I hope that the lessons I seek will generalize to others' Unicode projects. Let me start with a listing of the main script in readmore tags, and then I'll pull out the parts that need grease.
$ cat 7.cw1.pl
#!/usr/bin/perl -w
use 5.011;
use lib "template_stuff";
use html7;
use trans2;    ##yandex option available
use Path::Tiny;
use utils1;
use utf8;
use Encode;
use open OUT => ':encoding(UTF-8)', ':std';
use Net::SFTP::Foreign;
use Data::Dumper;

# initializations that must precede main data structure
my $ts          = "template_stuff";
my $images      = "aimages";
my $captions    = "captions";
my $ruscaptions = "ruscaptions";

## turning things to Path::Tiny
# decode paths
my $abs   = path(__FILE__)->absolute;
my $path1 = Path::Tiny->cwd;
my $title = $path1->basename;
$abs   = decode( 'UTF-8', $abs );
$path1 = decode( 'UTF-8', $path1 );
$title = decode( 'UTF-8', $title );
say "title is $title";
say "path1 is $path1";
say "abs is $abs";
my $path2 = path( $path1, $ts );

# page params
my %vars = (
  title        => $title,
  headline     => undef,
  place        => 'Vancouver',
  base_url     => 'http://www.merrillpjensen.com',
  css_file     => "${title}1.css",
  header       => path( $path2, "hc_input2.txt" ),
  footer       => path( $path2, "footer_center3.txt" ),
  body         => path( $path2, "rebus5.tmpl" ),
  print_script => "1",
  code_tmpl    => path( $path2, "code2.tmpl" ),
  oitop        => path( $path2, "oitop.txt" ),
  oibottom     => path( $path2, "oibottom.txt" ),
  to_images    => path( $path2, $images ),
  eng_captions => path( $path2, $captions ),
  rus_captions => path( $path2, $ruscaptions ),
  translations => path( $path2, 'translations' ),
  bottom       => path( $path2, "bottom1.txt" ),
  book         => 'Crosswords: ',
  chapter      => 'Кроссворды',
  make_puzzle  => 1,
  print_module => 0,
  script_file  => $abs,
  module_tmpl  => path( $path2, "code3.tmpl" ),
  server_dir   => 'perlmonks',
  image_dir    => 'pmimage',
  ts           => 'template_system',
  css_path     => $path2,
  ini_path     => path('/home/bob/Documents/html_template_data/3.m +4;енности.ini'),
  cw           => path($path2,'crosswords'),
);
my $rvars = \%vars;
my $word  = 'cw';
foreach my $child ( $vars{cw}->children ) {
  next unless $child->is_dir;
  say "dyetya is $child";
  my $base_dir = $child->basename;
  say "base dir is $base_dir";
  $vars{$base_dir} = path( $child );
  say "dir is $vars{$base_dir}";
}
my $sftp = get_тайный($rvars);
say "result is $sftp";
my $dir2 = $vars{"server_dir"};
say "dir2 is $dir2";
my $ls = $sftp->ls( "/$dir2", wanted => qr/$word/ )
  or warn "unable to retrieve " . $sftp->error;

#print "$_->{filename}\n" for (@$ls);
my @remote_files = map { $_->{filename} } @$ls;

#say "files are @remote_files";
my $rref = \@remote_files;

#say Dumper $rref;
say "ultimate disposition of main hash-------";
say Dumper $rvars;
__END__
$
One reason I'm hiding this under a readmore tag is that perltidy doesn't want to format it. Right now, this is what I use for all my perltidy commands:
$ cat .bash_aliases
alias pt='perltidy -i=2 -b '
$ pt 7.cw1.pl
## Please see file 7.cw1.pl.ERR
$ cat 7.cw1.pl.ERR
Perltidy version is 20180220
85: unexpected character decimal 209 (�) in script
85: unexpected character decimal 130 (�) in script
...
85: unexpected character decimal 185 (�) in script
85: Giving up after error
$
Q1) Can I alter my perltidy command so that these chars are not problematic?
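As a way of narrowing Q1 down, here is a minimal sketch that checks whether the decimal values in the error log are simply the raw UTF-8 bytes of the Cyrillic part of `get_тайный`. The helper name `utf8_byte_values` is made up for illustration; only core `Encode` is used. If the numbers line up, perltidy is probably reading the file as bytes rather than characters. The perltidy documentation lists a `-utf8` switch (short for `--character-encoding=utf8`) that may be the relevant one, though I have not confirmed it against version 20180220.

```perl
use strict;
use warnings;
use utf8;                      # this source file contains Cyrillic literals
use Encode qw(encode);

# Hypothetical helper: decimal value of every UTF-8 byte in a string.
sub utf8_byte_values {
    my ($text) = @_;
    return map { ord } split //, encode( 'UTF-8', $text );
}

# The non-ASCII part of the identifier on the line perltidy complains about.
my @bytes = utf8_byte_values('тайный');

print "first two bytes: $bytes[0] $bytes[1]\n";
print "last byte: $bytes[-1]\n";
```

The first two values and the last one can then be compared with the 209, 130 and 185 reported in the .ERR file.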
My second issue is when I try to loop over cyrillic directories. The relevant source is:
foreach my $child ( $vars{cw}->children ) {
  next unless $child->is_dir;
  say "dyetya is $child";
  my $base_dir = $child->basename;
  say "base dir is $base_dir";
  $vars{$base_dir} = path( $child );
  say "dir is $vars{$base_dir}";
}
The path is built for the ASCII directory and loaded into the main data structure as a blessed object:
dyetya is /home/bob/2.scripts/pages/7.cw/template_stuff/crosswords/eugene
base dir is eugene
But there's nothing from the Cyrillic paths. Bash shows them here in pre tags:
$ pwd
/home/bob/2.scripts/pages/7.cw/template_stuff/crosswords
$ ls
caption_filled.gif  eugene  захват  изображение  подписи
$ ls -l
total 24
-rw-r--r-- 1 bob bob 7882 Jan  7 22:58 caption_filled.gif
drwxr-xr-x 3 bob bob 4096 Jan 31 16:41 eugene
drwxr-xr-x 2 bob bob 4096 Jan  7 22:28 захват
drwxr-xr-x 2 bob bob 4096 Jan  7 22:45 изображение
drwxr-xr-x 2 bob bob 4096 Feb  1 13:04 подписи
$
Q2) How do I convince Path::Tiny to give me all these paths?
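One possible cause, offered as an assumption rather than a diagnosis: `$path1` goes through `decode` before the paths are built, and mixing a decoded string with the raw byte names that the filesystem hands back can produce a name that no longer matches what is on disk, so the directory test fails for non-ASCII names only. The sketch below illustrates the "bytes for the filesystem, characters for display" idea using only core modules (`opendir`, `File::Spec`, `File::Temp`) rather than Path::Tiny itself. The helper `cyrillic_subdirs` and the scratch directory are hypothetical.

```perl
use strict;
use warnings;
use utf8;                          # Cyrillic literals in this file
use Encode qw(decode encode);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: list the subdirectories of $dir (a byte string).
# Returns a hash ref mapping decoded names to their raw byte paths.
sub cyrillic_subdirs {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "cannot open $dir: $!";
    my %found;
    for my $raw ( readdir $dh ) {
        next if $raw eq '.' or $raw eq '..';
        my $full = File::Spec->catdir( $dir, $raw );    # still raw bytes
        next unless -d $full;                           # test the byte path
        $found{ decode( 'UTF-8', $raw ) } = $full;      # decode for display only
    }
    closedir $dh;
    return \%found;
}

# Demo in a throwaway directory, so nothing real is touched.
my $tmp = tempdir( CLEANUP => 1 );
for my $name (qw(eugene захват подписи)) {
    my $bytes = encode( 'UTF-8', $name );
    mkdir( File::Spec->catdir( $tmp, $bytes ) ) or die "mkdir failed: $!";
}

my $dirs = cyrillic_subdirs($tmp);
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for sort keys %$dirs;
```

The design choice is to keep the byte path as the hash value for any later filesystem call, and to use the decoded name only as the key and for printing.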
My third question regards what I'm trying to build a path to:
$ pwd
/home/bob/2.scripts/pages/7.cw/template_stuff/crosswords/подписи
$ cat a.txt
л лесопарк
о и о о е
комар р й
л б трасса
ё к т б
пасс ухо
менеджер
фауна раки
тоска г
шкант м е
атаман
в т т
н ухаб
игроки
к грач
$
I think this syntax from Path::Tiny could work:
@lines = $file->lines_utf8;
What I seek to do here is simply read it in, make sure that whitespace is removed, and have it be echoed out in its precise locations. What I have now addresses the encoding issue, but not the spacing:
sub print_aoa_utf8 {
  use warnings;
  use 5.011;
  use utf8;    # a la François
  use open OUT => ':encoding(utf8)';
  use open ':std';
  my $a   = shift;
  my @AoA = @$a;
  for my $i ( 0 .. $#AoA ) {
    my $aref = $AoA[$i];
    for my $j ( 0 .. $#{$aref} ) {
      print "elt $i $j is $AoA[$i][$j]\n";
    }
  }
  return $a;
}
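For the spacing, here is a minimal sketch under one assumption: each grid cell is exactly one character, so the blanks between letters are what fix each letter's column, and only the line ending should be stripped. The helper `lines_to_grid` is hypothetical and the two sample rows are invented for illustration, not taken from a.txt. The input is expected to be already-decoded lines, such as those `lines_utf8` returns.

```perl
use strict;
use warnings;
use utf8;    # Cyrillic literals in this file

# Hypothetical helper: turn decoded lines into an array of arrays,
# one character per cell, keeping interior spaces as cells.
sub lines_to_grid {
    my @lines = @_;    # copies, so the caller's data is left alone
    my @grid;
    for my $line (@lines) {
        $line =~ s/\R\z//;    # drop the line ending only
        push @grid, [ split //, $line ];
    }
    return \@grid;
}

# Invented sample rows; real input would come from the file.
my @sample = ( "л  лес\n", "комар\n" );
my $grid   = lines_to_grid(@sample);

# Echo the grid back, showing blanks as dots so columns are visible.
binmode STDOUT, ':encoding(UTF-8)';
for my $row (@$grid) {
    print join( '', map { $_ eq ' ' ? '.' : $_ } @$row ), "\n";
}
```

The resulting array of arrays has the same shape that print_aoa_utf8 above expects, so it could be passed to it directly.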
One final question. I wrote my newest version of my "get_tiny" sftp object creator as a homonym:
my $sftp = get_тайный($rvars);
I did this because I thought it would be a little harder to crack if it were done in a non-ASCII way. (Of course, I may have dished myself to those who do.) Then I watched a NOVA episode that said that with the advent of quantum computing, public key encryption would be passé. I failed to understand how this is going to happen. Should we all just give up?
Thanks for your comment and cheers,
In reply to help with cyrillic characters in odd places by Aldebaran