Am I correct that what my OS is telling me is its best guess as to how to interpret this file and have it make any sense?
    $ file -i *.pl
    18.clone.pl:  text/x-perl; charset=us-ascii
    1.a.pl:       text/x-perl; charset=utf-8
    1.haukex.pl:  text/x-perl; charset=us-ascii
    1.k.pl:       text/x-perl; charset=us-ascii
    2.haukex.pl:  text/x-perl; charset=us-ascii
    3.haukex.pl:  text/x-perl; charset=utf-8
    3.ping3a.pl:  text/x-perl; charset=us-ascii
    4.haukex.pl:  text/x-perl; charset=utf-8
    4.ping3a.pl:  text/x-perl; charset=us-ascii
    5.ping3a.pl:  text/x-perl; charset=us-ascii
    $
What seems to be very much the case is that the OS (that is, the file utility) reports a script as utf-8 only if it contains non-ASCII characters encoded as UTF-8; otherwise it reports us-ascii. I did nothing to the #.haukex scripts to change them from us-ascii to utf-8 other than begin to include Cyrillic characters, like so, with pre tags:
    $ ./1.a.pl 3.haukex.pl
    argv is 3.haukex.pl
    before decode is 3.haukex.pl
    after decode is 3.haukex.pl
    current is /home/bob/2.scripts/pages/1.cw/template_stuff/translations/rus.cw
    -------------
    in_file: 3.haukex.pl
    new base is 4.haukex.pl
    save path is /home/bob/2.scripts/pages/1.cw/template_stuff/translations/rus.cw/4.haukex.pl
    return is /home/bob/2.scripts/pages/1.cw/template_stuff/translations/rus.cw/4.haukex.pl
    2.haukex.pl 3.haukex.pl 4.haukex.pl

    #!/usr/bin/perl -w
    use 5.011;
    use Carp;
    use Data::Alias 'alias';
    use Data::Dumper;
    use utf8;    # a la François
    use open OUT => ':encoding(utf8)';
    use open ':std';

    sub rangeparse {
        local $_ = shift;
        my @o;    # [ row1,col1, row2,col2 ] (-1 = last row/col)
        if (@o=/\AR([0-9]+|n)C([0-9]+|n):R([0-9]+|n)C([0-9]+|n)\z/) {}
        elsif (/\AR([0-9]+|n):R([0-9]+|n)\z/) { @o=($1,1,$2,-1) }
        elsif (/\AC([0-9]+|n):C([0-9]+|n)\z/) { @o=(1,$1,-1,$2) }
        elsif (/\AR([0-9]+|n)C([0-9]+|n)\z/) { @o=($1,$2,$1,$2) }
        elsif (/\AR([0-9]+|n)\z/) { @o=($1,1,$1,-1) }
        elsif (/\AC([0-9]+|n)\z/) { @o=(1,$1,-1,$1) }
        else { croak "failed to parse '$_'" }
        $_ eq 'n' and $_=-1 for @o;
        return \@o;
    }

    use Test::More tests=>2;
    is_deeply rangeparse("RnC2:RnC5"), [ -1, 2, -1, 5 ];
    is_deeply rangeparse("R3C2:RnCn"), [ 3, 2, -1,-1 ];

    my $data = [['й', ' ', ' ', 'л', ' ', ' ', 'с', ' ', ' '], [1..9]];
    say Dumper $data;

    sub getsubset {
        my ($data,$range) = @_;
        my $cols = @{$$data[0]};
        @$_==$cols or croak "data not rectangular" for @$data;
        $range = rangeparse($range) unless ref $range eq 'ARRAY';
        @$range==4 or croak "bad size of range";
        my @max = (0+@$data,$cols)x2;
        for my $i (0..3) {
            $$range[$i]=$max[$i] if $$range[$i]<0;
            croak "index $i out of range"
                if $$range[$i]<1 || $$range[$i]>$max[$i];
        }
        croak "bad rows $$range[0]-$$range[2]" if $$range[0]>$$range[2];
        croak "bad cols $$range[1]-$$range[3]" if $$range[1]>$$range[3];
        my @cis = $$range[1]-1 .. $$range[3]-1;
        return [ map { sub{\@_}->(@{$$data[$_]}[@cis]) }
            $$range[0]-1 .. $$range[2]-1 ]
    }
This is a trimmed-down version of haukex's result in Selecting Ranges of 2-Dimensional Data. I'm populating it with Cyrillic values and hope to run some tests, but I still want to get this clone tool squared away. Still working through other parts of your post....
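To make the us-ascii versus utf-8 observation concrete, here is a rough sketch of the kind of check involved. It is not how file(1) is actually implemented; guess_charset is a hypothetical helper, and the 'not-utf-8' label is my own invention. The idea: a file made only of bytes 0x00-0x7F is valid as both ASCII and UTF-8, so there is nothing to tell them apart until the first non-ASCII character shows up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper, NOT file(1)'s real algorithm: classify raw bytes.
sub guess_charset {
    my ($bytes) = @_;

    # Only 7-bit bytes: indistinguishable from plain ASCII.
    return 'us-ascii' unless $bytes =~ /[^\x00-\x7F]/;

    # High bytes present: see whether they form well-formed UTF-8.
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps
    # the input string unmodified.
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 'utf-8' : 'not-utf-8';    # made-up label for everything else
}

# Report on any files named on the command line, reading them as raw bytes.
for my $path (@ARGV) {
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $bytes = do { local $/; <$fh> };
    $bytes = '' unless defined $bytes;
    close $fh;
    printf "%s: charset=%s\n", $path, guess_charset($bytes);
}
```

Run as, say, perl guess.pl *.pl (the script name is arbitrary). Under these assumptions, a script flips from us-ascii to utf-8 the moment a character such as 'й' is saved into it as UTF-8, with no other change to the file.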
Re^3: create clone script for utf8 encoding
by haukex (Archbishop) on Dec 19, 2018 at 10:15 UTC
by Aldebaran (Curate) on Dec 19, 2018 at 21:23 UTC