Re^3: Bug in Template?
by remiah (Hermit) on Mar 22, 2012 at 03:54 UTC
This doesn't seem to be a problem with Template. I would also like advice on this. The é in “Séan” may be 00E9 in the Unicode table http://www.utf8-chartable.de/unicode-utf8-table.pl. I thought that decoding it to Perl's internal utf8 and passing it to Template, encoding it as utf8, would work. But it does not work. Even without Template, there is strange behavior. It is strange that only No3 works in this case. I usually print characters with No 4. Japanese characters like hiragana seem to have no problem (for example, '3041' .. '3096'). I saw a similar problem at Why Doesn't Text::CSV_XS Print Valid UTF-8 Text When Used With the open Pragma?. At that time I didn't understand it well and thought a newer version would have no problem... Is this the same trouble? I tried with 5.012002 and 5.014002. They print exactly the same output except for the version number.
by Anonymous Monk on Mar 22, 2012 at 08:27 UTC
I'm confused by your code. What is it supposed to demonstrate? perlunitut: Unicode in Perl warns against using is_utf8, so I wouldn't use it.
Consider: when viewed as Windows-1252 it is À. And this: when viewed as Windows-1252 it is À, but viewed as UTF-8 it is À. And this: when viewed as Windows-1252 it is �, but viewed as UTF-8 it is �. If you search for ef bf bd you'll see lots of questions about this erroneous conversion.
So if you want to treat chr 192 ( perl -le " print hex q/C0/ " ) as Unicode you have to encode it, because characters 0 to 255 are also valid Latin-1; they are not UTF-8.
Or, if you want chr 192 to return Unicode, use the encoding pragma (the utf8 pragma doesn't affect chr).
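A minimal sketch of the point about chr 192, using only the core Encode module. The helper name hex_bytes and the variable names are made up for illustration. It shows the same character encoded as Latin-1 and as UTF-8, and how decoding a lone c0 byte as UTF-8 with the default lenient settings leads to the ef bf bd sequence mentioned above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

my $char = chr 192;                          # one character, code point U+00C0

my $latin1 = encode('ISO-8859-1', $char);    # a single byte
my $utf8   = encode('UTF-8', $char);         # two bytes

# A lone c0 byte is not valid UTF-8. With the default (lenient) check,
# decode substitutes U+FFFD, the replacement character.
my $raw       = "\xC0";
my $replaced  = decode('UTF-8', $raw);
my $reencoded = encode('UTF-8', $replaced);

print hex_bytes($latin1),    "\n";   # c0
print hex_bytes($utf8),      "\n";   # c3 80
print hex_bytes($reencoded), "\n";   # ef bf bd
```

Once the byte has been replaced with U+FFFD the original value is gone, which is why this conversion cannot be undone afterwards.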
by Anonymous Monk on Mar 22, 2012 at 08:33 UTC
As http://www.utf8-chartable.de/unicode-utf8-table.pl?start=192 shows, the Unicode code point U+00C0 encoded as UTF-8 is c3 80.
by remiah (Hermit) on Mar 22, 2012 at 10:58 UTC
Thanks for the reply. I will read perlunitut, and I found sites that explain Unicode in Perl precisely when I googled "ef bf bd". I am printing them now... When a character comes from outside of Perl, we have to decode the bytes to Perl's internal utf8, as perlunitut says, especially when we want to know the length in characters. For example, CGI's param() returns bytes, and when I want to know the length of a word, I decode it. My question in short: here come two characters, '00E9' and '3041'. They must be two characters in utf8. How do you substr the second character and print it? I agree my example is clumsy. Is this clear? I guess this is the OP's problem.
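One possible way to approach the question above, as a minimal sketch: decode the incoming bytes once, work on characters, and encode again when printing. The byte string is an assumed input standing in for data that arrives from outside Perl, and the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumed input: the UTF-8 bytes for U+00E9 followed by U+3041,
# as they might arrive from a file or from CGI's param().
my $bytes = "\xC3\xA9\xE3\x81\x81";

my $chars = decode('UTF-8', $bytes);    # bytes -> characters

my $byte_len = length $bytes;           # counts bytes
my $char_len = length $chars;           # counts characters

my $second = substr $chars, 1, 1;       # the second character, U+3041

# Encode explicitly on the way out, so STDOUT receives UTF-8 bytes.
binmode STDOUT, ':raw';
print encode('UTF-8', $second), "\n";
printf "%d bytes, %d characters, second is U+%04X\n",
    $byte_len, $char_len, ord $second;
```

Calling substr on the undecoded $bytes would instead return a single byte from the middle of é, which is not a valid character on its own.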