Rodster001 has asked for the wisdom of the Perl Monks concerning the following question:

```perl
#!/usr/bin/perl
use strict;
use Encode;
use Text::Unaccent::PurePerl qw(unac_string);

use utf8;
my $string = "Queensrÿche";
no utf8;

chars($string);
(Encode::is_utf8($string)) ? print " - this is utf8\n" : print " - this is NOT utf8\n";
print "unaccented: " . Text::Unaccent::PurePerl::unac_string($string) . "\n";
print $string;
exit;

# Print each character's ordinal, the character itself, and an escaped form
sub chars {
    my $k = shift;
    my @chars = split("", $k);
    foreach (@chars) {
        my $dec    = ord($_);
        my $chr    = chr(ord($_));
        my $escape = qquote($_);
        print "\t$dec\t$chr\t$escape\n";
    }
}

# Escape quote/sigil characters; if the string takes more bytes than
# characters, also show non-ASCII characters as \x{..}
sub qquote {
    local($_) = shift;
    s/([\\\"\@\$])/\\$1/g;
    my $bytes;
    { use bytes; $bytes = length }
    s/([[:^ascii:]])/'\x{'.sprintf("%x",ord($1)).'}'/ge if $bytes > length;
    return $_;
}
```

This is what I see in my terminal (I am using SecureCRT, with Terminal > Appearance > Character encoding set to UTF-8):

```
	81	Q	Q
	117	u	u
	101	e	e
	101	e	e
	110	n	n
	115	s	s
	114	r	r
	255		\x{ff}
	99	c	c
	104	h	h
	101	e	e
 - this is utf8
unaccented: Queensryche
Queensr
```

Here are my questions about this:

```
	81	Q	Q
	117	u	u
	101	e	e
	101	e	e
	110	n	n
	115	s	s
	114	r	r
	195		\x{c3}
	191		\x{bf}
	99	c	c
	104	h	h
	101	e	e
 - this is utf8
unaccented: QueensrA
Queensrÿche
```

This is where the deep confusion is for me.
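A minimal sketch of what the first run shows, assuming the script file is saved as UTF-8 (this example is not from the thread and uses only the core Encode module): with "use utf8", the literal "ÿ" is one character with ord 255; encoding it to UTF-8 yields the two bytes 195 191; and printing the character string to STDOUT without an encoding layer sends the single byte 255, which a UTF-8 terminal cannot display. Adding an ":encoding(UTF-8)" layer to STDOUT is one common way to make print do that encoding for you.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                      # decode string literals in this file from UTF-8
use Encode qw(encode);

my $string = "Queensrÿche";    # 11 characters; "ÿ" is one character, U+00FF

# As a character string, index 7 holds a single code point with ord 255.
printf "length: %d, ord at index 7: %d\n",
    length($string), ord(substr($string, 7, 1));

# Encoding to UTF-8 turns that one character into the two bytes 195 191.
my $bytes = encode('UTF-8', $string);
printf "encoded length: %d, bytes at index 7-8: %s\n",
    length($bytes), join(' ', map { ord($_) } split //, substr($bytes, 7, 2));

# With an output layer, print performs the same encoding, so the terminal
# receives 195 191 instead of a lone 255 byte.
binmode STDOUT, ':encoding(UTF-8)';
print "$string\n";
```

The printf lines run before binmode and contain only ASCII, so they are unaffected by the missing layer; only the final print needs it.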
Update #1:
--------------------------------------
Taking out the "use utf8" and "no utf8":

```perl
#use utf8;
my $string = "Queensrÿche";
#no utf8;
```

And then running it again:

```
	81	Q	Q
	117	u	u
	101	e	e
	101	e	e
	110	n	n
	115	s	s
	114	r	r
	195
	191
	99	c	c
	104	h	h
	101	e	e
 - this is NOT utf8
unaccented: QueensrA
Queensrÿche
```

This confuses me even more. I understand the utf8 flag is not set now, so Encode doesn't see the string as utf8. But I can see that the two UTF-8 bytes for "ÿ" (195 191) are there, instead of the single 255 I get with "use utf8". The string prints correctly (and displays properly in my terminal), but it does not unaccent correctly. Much confusion.
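A second sketch, under the same assumptions, for the run without "use utf8": the literal then holds the two characters 195 and 191 (written below with \x escapes so the example does not depend on how the file is saved), and any character-based unaccent step sees them as "Ã" and "¿". Decoding with Encode::decode first restores the single character 255. The strip_accents helper is hypothetical and is not how Text::Unaccent::PurePerl works internally; it swaps in a different technique (core Unicode::Normalize NFD decomposition, then removing combining marks) purely to show the effect of decoding before unaccenting.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFD);

# Same content as "Queensrÿche" in a UTF-8 source file WITHOUT "use utf8":
# the "ÿ" arrives as two separate characters, ord 195 and ord 191.
my $raw = "Queensr\xC3\xBFche";

# Decoding turns the byte pair back into one character, U+00FF (ord 255).
my $decoded = decode('UTF-8', $raw);

# Hypothetical stand-in for an unaccent step (NOT Text::Unaccent::PurePerl):
# decompose each character with NFD, then drop nonspacing combining marks.
sub strip_accents {
    my $s = NFD(shift);
    $s =~ s/\p{Mn}//g;
    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_accents($raw), "\n";      # QueensrA¿che ("Ã" loses its tilde)
print strip_accents($decoded), "\n";  # Queensryche
```

The design choice is to decode once, at the point where bytes enter the program, and encode once on output; everything in between then works on characters.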
Re: The Queensrÿche Situation
by aitap (Curate) on Oct 19, 2014 at 19:00 UTC
by Rodster001 (Pilgrim) on Oct 19, 2014 at 19:24 UTC
by aitap (Curate) on Oct 19, 2014 at 20:01 UTC
by Rodster001 (Pilgrim) on Oct 19, 2014 at 21:00 UTC
by karlgoethebier (Abbot) on Oct 19, 2014 at 21:02 UTC
by Rodster001 (Pilgrim) on Oct 19, 2014 at 21:38 UTC
by karlgoethebier (Abbot) on Oct 21, 2014 at 07:56 UTC
by Rodster001 (Pilgrim) on Oct 19, 2014 at 20:00 UTC
Re: The Queensrÿche Situation
by Jim (Curate) on Oct 19, 2014 at 20:39 UTC
Re: The Queensrÿche Situation
by ikegami (Patriarch) on Oct 19, 2014 at 22:49 UTC
by Rodster001 (Pilgrim) on Oct 19, 2014 at 23:16 UTC
by ikegami (Patriarch) on Oct 20, 2014 at 02:18 UTC
Re: The Queensrÿche Situation
by LanX (Saint) on Oct 19, 2014 at 18:06 UTC
by Tux (Canon) on Oct 19, 2014 at 19:16 UTC
by LanX (Saint) on Oct 19, 2014 at 19:53 UTC
by Rodster001 (Pilgrim) on Oct 19, 2014 at 18:13 UTC
by LanX (Saint) on Oct 19, 2014 at 18:21 UTC
by Rodster001 (Pilgrim) on Oct 19, 2014 at 18:42 UTC
by Jim (Curate) on Oct 19, 2014 at 20:02 UTC
Re: The Queensrÿche Situation
by Jim (Curate) on Oct 19, 2014 at 22:17 UTC