in reply to Re: create clone script for utf8 encoding
in thread create clone script for utf8 encoding
Am I correct that what my OS reports below is only its best guess at how to interpret each file so that the contents make sense?
$ file -i *.pl
18.clone.pl: text/x-perl; charset=us-ascii
1.a.pl: text/x-perl; charset=utf-8
1.haukex.pl: text/x-perl; charset=us-ascii
1.k.pl: text/x-perl; charset=us-ascii
2.haukex.pl: text/x-perl; charset=us-ascii
3.haukex.pl: text/x-perl; charset=utf-8
3.ping3a.pl: text/x-perl; charset=us-ascii
4.haukex.pl: text/x-perl; charset=utf-8
4.ping3a.pl: text/x-perl; charset=us-ascii
5.ping3a.pl: text/x-perl; charset=us-ascii
$
What seems very much to be the case is that the OS reports a file as utf-8 only when it contains non-ASCII characters encoded as UTF-8. (A pure-ASCII file is also valid UTF-8, so the two labels do not contradict each other.) I did nothing to the #.haukex scripts to change them from us-ascii to utf-8 other than begin to include Cyrillic characters, like so with pre tags:
$ ./1.a.pl 3.haukex.pl
argv is 3.haukex.pl
before decode is 3.haukex.pl
after decode is 3.haukex.pl
current is /home/bob/2.scripts/pages/1.cw/template_stuff/translations/rus.cw
-------------
in_file: 3.haukex.pl
new base is 4.haukex.pl
save path is /home/bob/2.scripts/pages/1.cw/template_stuff/translations/rus.cw/4.haukex.pl
return is /home/bob/2.scripts/pages/1.cw/template_stuff/translations/rus.cw/4.haukex.pl
2.haukex.pl
3.haukex.pl
4.haukex.pl

#!/usr/bin/perl -w
use 5.011;
use Carp;
use Data::Alias 'alias';
use Data::Dumper;
use utf8; # a la François
use open OUT => ':encoding(utf8)';
use open ':std';
sub rangeparse {
    local $_ = shift;
    my @o; # row1,col1, row2,col2 (-1 = last row/col)
    if    (@o=/\AR([0-9]+|n)C([0-9]+|n):R([0-9]+|n)C([0-9]+|n)\z/) {}
    elsif (/\AR([0-9]+|n):R([0-9]+|n)\z/) { @o=($1,1,$2,-1) }
    elsif (/\AC([0-9]+|n):C([0-9]+|n)\z/) { @o=(1,$1,-1,$2) }
    elsif (/\AR([0-9]+|n)C([0-9]+|n)\z/)  { @o=($1,$2,$1,$2) }
    elsif (/\AR([0-9]+|n)\z/)             { @o=($1,1,$1,-1) }
    elsif (/\AC([0-9]+|n)\z/)             { @o=(1,$1,-1,$1) }
    else { croak "failed to parse '$_'" }
    $_ eq 'n' and $_=-1 for @o; # 'n' means "last row/col"
    return \@o;
}
use Test::More tests=>2;
is_deeply rangeparse("RnC2:RnC5"), [-1, 2, -1, 5];
is_deeply rangeparse("R3C2:RnCn"), [ 3, 2, -1,-1];
my $data = ['й', ' ', ' ', 'л', ' ', ' ', 'с', ' ', ' ', 1..9];
say Dumper $data;
sub getsubset {
    my ($data,$range) = @_;
    my $cols = @{$$data[0]};
    @$_==$cols or croak "data not rectangular" for @$data;
    $range = rangeparse($range) unless ref $range eq 'ARRAY';
    @$range==4 or croak "bad size of range";
    my @max = (0+@$data,$cols)x2; # (rows, cols, rows, cols)
    for my $i (0..3) {
        $$range[$i]=$max[$i] if $$range[$i]<0; # -1 => last row/col
        croak "index $i out of range"
            if $$range[$i]<1 || $$range[$i]>$max[$i];
    }
    croak "bad rows $$range[0]-$$range[2]" if $$range[0]>$$range[2];
    croak "bad cols $$range[1]-$$range[3]" if $$range[1]>$$range[3];
    my @cis = $$range[1]-1 .. $$range[3]-1; # zero-based column indices
    return [ map { sub{\@_}->(@{$$data[$_]}[@cis]) }
        $$range[0]-1 .. $$range[2]-1 ]
}
This is a trimmed-down version of haukex's result in Selecting Ranges of 2-Dimensional Data. I'm populating it with Cyrillic values and hope to run some tests, but I still want to get this clone tool squared away first. Still working through the other parts of your post....
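
On the charset question above: the pattern in the file -i output is consistent with a simple byte-level check, which can be sketched in Perl. This is only a rough approximation under my own assumptions, not how file is actually implemented. The helper name guess_charset and the 'unknown' label are made up for illustration; the only library call used is decode from the core Encode module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of file(1)): classify a string of raw bytes.
#   no byte above 0x7F           => 'us-ascii'
#   high bytes, valid UTF-8      => 'utf-8'
#   high bytes, malformed UTF-8  => 'unknown' (my own label)
sub guess_charset {
    my ($bytes) = @_;
    return 'us-ascii' unless $bytes =~ /[^\x00-\x7F]/;
    # A strict decode croaks on malformed input, so eval reports validity.
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 'utf-8' : 'unknown';
}

# Read each file named on the command line as raw bytes and report a guess.
for my $path (@ARGV) {
    open my $fh, '<:raw', $path or do { warn "$path: $!\n"; next };
    my $bytes = do { local $/; <$fh> };
    close $fh;
    printf "%s: charset=%s\n", $path, guess_charset($bytes);
}
```

Saved under some name of your choosing and run against *.pl, this should label a script as utf-8 only once it contains a non-ASCII character such as 'й', which matches what happened to the #.haukex scripts. The files are opened with the :raw layer on purpose, so the check sees the bytes on disk rather than decoded characters.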
Re^3: create clone script for utf8 encoding
by haukex (Archbishop) on Dec 19, 2018 at 10:15 UTC
by Aldebaran (Curate) on Dec 19, 2018 at 21:23 UTC