Aldebaran has asked for the wisdom of the Perl Monks concerning the following question:
Hello all,
I'm presenting a transformed version of my HTML template, with the scope limited to Russian crosswords. A lot of things just don't seem to work the same way once Cyrillic characters are involved, and I hope the lessons I'm seeking will generalize to others' Unicode projects. Let me start with a listing of the main script in readmore tags, and then I'll pull out the parts that need grease.
$ cat 7.cw1.pl
#!/usr/bin/perl -w
use 5.011;
use lib "template_stuff";
use html7;
use trans2;    ##yandex option available
use Path::Tiny;
use utils1;
use utf8;
use Encode;
use open OUT => ':encoding(UTF-8)', ':std';
use Net::SFTP::Foreign;
use Data::Dumper;

# initializations that must precede main data structure
my $ts          = "template_stuff";
my $images      = "aimages";
my $captions    = "captions";
my $ruscaptions = "ruscaptions";

## turning things to Path::Tiny
# decode paths
my $abs   = path(__FILE__)->absolute;
my $path1 = Path::Tiny->cwd;
my $title = $path1->basename;
$abs   = decode( 'UTF-8', $abs );
$path1 = decode( 'UTF-8', $path1 );
$title = decode( 'UTF-8', $title );
say "title is $title";
say "path1 is $path1";
say "abs is $abs";
my $path2 = path( $path1, $ts );

# page params
my %vars = (
  title        => $title,
  headline     => undef,
  place        => 'Vancouver',
  base_url     => 'http://www.merrillpjensen.com',
  css_file     => "${title}1.css",
  header       => path( $path2, "hc_input2.txt" ),
  footer       => path( $path2, "footer_center3.txt" ),
  body         => path( $path2, "rebus5.tmpl" ),
  print_script => "1",
  code_tmpl    => path( $path2, "code2.tmpl" ),
  oitop        => path( $path2, "oitop.txt" ),
  oibottom     => path( $path2, "oibottom.txt" ),
  to_images    => path( $path2, $images ),
  eng_captions => path( $path2, $captions ),
  rus_captions => path( $path2, $ruscaptions ),
  translations => path( $path2, 'translations' ),
  bottom       => path( $path2, "bottom1.txt" ),
  book         => 'Crosswords: ',
  chapter      => 'Кроссворды',
  make_puzzle  => 1,
  print_module => 0,
  script_file  => $abs,
  module_tmpl  => path( $path2, "code3.tmpl" ),
  server_dir   => 'perlmonks',
  image_dir    => 'pmimage',
  ts           => 'template_system',
  css_path     => $path2,
  ini_path     => path('/home/bob/Documents/html_template_data/3.m +4;енности.ini'),
  cw           => path( $path2, 'crosswords' ),
);
my $rvars = \%vars;
my $word  = 'cw';

foreach my $child ( $vars{cw}->children ) {
  next unless $child->is_dir;
  say "dyetya is $child";
  my $base_dir = $child->basename;
  say "base dir is $base_dir";
  $vars{$base_dir} = path($child);
  say "dir is $vars{$base_dir}";
}

my $sftp = get_тайный($rvars);
say "result is $sftp";
my $dir2 = $vars{"server_dir"};
say "dir2 is $dir2";
my $ls = $sftp->ls( "/$dir2", wanted => qr/$word/ )
  or warn "unable to retrieve " . $sftp->error;

#print "$_->{filename}\n" for (@$ls);
my @remote_files = map { $_->{filename} } @$ls;

#say "files are @remote_files";
my $rref = \@remote_files;

#say Dumper $rref;
say "ultimate disposition of main hash-------";
say Dumper $rvars;
__END__
$
One reason I'm hiding this under a readmore tag is that perltidy doesn't want to format it. Right now, this is what I use for all my perltidy commands:
$ cat .bash_aliases
alias pt='perltidy -i=2 -b '
$ pt 7.cw1.pl
## Please see file 7.cw1.pl.ERR
$ cat 7.cw1.pl.ERR
Perltidy version is 20180220
85: unexpected character decimal 209 (�) in script
85: unexpected character decimal 130 (�) in script
...
85: unexpected character decimal 185 (�) in script
85: Giving up after error
$
Q1) Can I alter my perltidy command so that these chars are not problematic?
My second issue arises when I try to loop over Cyrillic directories. The relevant source is:
foreach my $child ( $vars{cw}->children ) {
  next unless $child->is_dir;
  say "dyetya is $child";
  my $base_dir = $child->basename;
  say "base dir is $base_dir";
  $vars{$base_dir} = path($child);
  say "dir is $vars{$base_dir}";
}
The path is built for the ASCII-named directory and loaded into the main data structure as a blessed entity:
dyetya is /home/bob/2.scripts/pages/7.cw/template_stuff/crosswords/eugene
base dir is eugene
and there's nothing from the Cyrillic paths. Bash shows them here with pre tags:
$ pwd
/home/bob/2.scripts/pages/7.cw/template_stuff/crosswords
$ ls
caption_filled.gif eugene захват изображение подписи
$ ls -l
total 24
-rw-r--r-- 1 bob bob 7882 Jan 7 22:58 caption_filled.gif
drwxr-xr-x 3 bob bob 4096 Jan 31 16:41 eugene
drwxr-xr-x 2 bob bob 4096 Jan 7 22:28 захват
drwxr-xr-x 2 bob bob 4096 Jan 7 22:45 изображение
drwxr-xr-x 2 bob bob 4096 Feb 1 13:04 подписи
$
Q2) How do I convince Path::Tiny to give me all these paths?
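One thing that may be worth checking here, sketched with core modules only rather than Path::Tiny: names that come back from the filesystem are byte strings, so with `use utf8` and a UTF-8 output layer in effect they need an explicit decode before being printed or used as hash keys. The helper name `decoded_subdirs` and the temporary directory are invented for illustration, and whether this is the real cause of the missing directories is an assumption, not a diagnosis.

```perl
use strict;
use warnings;
use utf8;                          # this source contains Cyrillic literals
use Encode qw(decode encode);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: return the names of the subdirectories of $dir
# as decoded character strings, sorted by code point.
sub decoded_subdirs {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "cannot open $dir: $!";
    my @names;
    while ( defined( my $entry = readdir $dh ) ) {
        next if $entry eq '.' or $entry eq '..';
        my $full = File::Spec->catdir( $dir, $entry );    # still bytes
        next unless -d $full;
        push @names, decode( 'UTF-8', $entry );            # bytes -> characters
    }
    closedir $dh;
    my @sorted = sort @names;
    return @sorted;
}

# Demo in a throwaway directory (assumption: the filesystem stores
# names as UTF-8 bytes, as is usual on Linux).
my $tmp = tempdir( CLEANUP => 1 );
for my $name ( 'eugene', 'захват', 'подписи' ) {
    my $path = File::Spec->catdir( $tmp, encode( 'UTF-8', $name ) );
    mkdir $path or die "mkdir failed: $!";
}

my @subdirs = decoded_subdirs($tmp);
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @subdirs;
```

If this turns out to be the issue, the same `decode( 'UTF-8', ... )` could presumably be applied to `$child->basename` inside the loop shown earlier, keeping the undecoded `$child` for the actual filesystem calls.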
My third question regards what I'm trying to build a path to:
$ pwd
/home/bob/2.scripts/pages/7.cw/template_stuff/crosswords/подписи
$ cat a.txt
л лесопарк
о и о о е
комар р й
л б трасса
ё к т б
пасс ухо
менеджер
фауна раки
тоска г
шкант м е
атаман
в т т
н ухаб
игроки
к грач
$
I think this syntax from Path::Tiny could work:
@lines = $file->lines_utf8;
What I seek to do here is simply to read it in, make sure that whitespace is removed, and have it echoed out in its precise locations. What I have now addresses the encoding issue, but not the spacing:
sub print_aoa_utf8 {
  use warnings;
  use 5.011;
  use utf8;    # a la François
  use open OUT => ':encoding(utf8)';
  use open ':std';
  my $a   = shift;
  my @AoA = @$a;
  for my $i ( 0 .. $#AoA ) {
    my $aref = $AoA[$i];
    for my $j ( 0 .. $#{$aref} ) {
      print "elt $i $j is $AoA[$i][$j]\n";
    }
  }
  return $a;
}
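For the spacing part, one possible direction is to split each decoded line into single characters, so that a blank occupies its own cell and every letter keeps its column. This is only a sketch: `lines_to_grid` and `grid_to_text` are hypothetical names, the two sample rows are copied as they appear in the listing of a.txt above, and it assumes "whitespace is removed" means trailing whitespace only. With Path::Tiny the input would presumably come from `$file->lines_utf8( { chomp => 1 } )`.

```perl
use strict;
use warnings;
use utf8;    # the sample rows below are Cyrillic literals

# Hypothetical helper: turn decoded lines into an array of arrays,
# one element per character, so interior blanks keep their column.
sub lines_to_grid {
    my @lines = @_;    # copies, so the caller's data is untouched
    my @grid;
    for my $line (@lines) {
        $line =~ s/\s+\z//;    # strip the line ending and trailing blanks only
        push @grid, [ split //, $line ];
    }
    return \@grid;
}

# Hypothetical helper: render the grid back to text, one row per line.
sub grid_to_text {
    my ($grid) = @_;
    return join '', map { join( '', @$_ ) . "\n" } @$grid;
}

my @sample = ( "комар р й\n", "л б трасса\n" );
my $grid   = lines_to_grid(@sample);

binmode STDOUT, ':encoding(UTF-8)';
print grid_to_text($grid);
```

Because each cell holds exactly one character, `$grid->[$i][$j]` addresses row `$i`, column `$j` directly, which is the shape `print_aoa_utf8` already expects.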
One final question. I wrote my newest version of my "get_tiny" sftp object creator as a homonym:
my $sftp = get_тайный($rvars);
because I thought it would be a little harder to crack if it were done in a non-ASCII way. (Of course, I may have dished myself to those who do.) Then I watched a NOVA episode that said that with the advent of quantum computing, public-key encryption would be passé. I failed to understand how this is going to happen. Should we all just give up?
Thanks for your comment and cheers,