in reply to HTML::Entities and Unicode quotes
This will, I hope, explain what's going on:
    use warnings;
    use strict;
    use Encode;
    use HTML::Entities;

    # UTF-8 encoded bytes for the curly quotes, not yet decoded.
    my $str = "\xe2\x80\x9cquotes\xe2\x80\x9d";
    print "Is $str UTF-8? ",
        Encode::is_utf8($str) ? "Yes!\n" : "No...\n";

    # Decode the bytes into characters; croak on malformed input.
    $str = decode("UTF-8", $str, Encode::FB_CROAK);

    # Encode characters back to UTF-8 on output from here on.
    binmode STDOUT, ":encoding(UTF-8)";
    print "It's still $str... UTF-8 now? ",
        Encode::is_utf8($str) ? "Yes!\n" : "No...\n";

    # The same text written directly as wide characters.
    my $wide_chars = "\x{201C}quotes\x{201D}";
    print "How about this version: $wide_chars? ",
        Encode::is_utf8($wide_chars) ? "Yes!\n" : "No...\n";

    print "Entities: ", encode_entities($str), $/;

    __END__
    Is “quotes” UTF-8? No...
    It's still “quotes”... UTF-8 now? Yes!
    How about this version: “quotes”? Yes!
    Entities: &ldquo;quotes&rdquo;
Update: changed $non_combining to $wide_chars as the name was misleading.
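For contrast, here is a minimal sketch of what happens when the decode step is skipped. It is an illustration, not part of the original script: the variable names ($bytes, $chars, $from_bytes, $from_chars) are mine, and I pass Encode::LEAVE_SRC on the assumption that you want the byte string left intact after decoding so both versions can be compared.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# The same UTF-8 encoded bytes as $str in the script above.
my $bytes = "\xe2\x80\x9cquotes\xe2\x80\x9d";

# Without decoding, every byte is treated as a character of its own,
# so each curly quote turns into three separate entities.
my $from_bytes = encode_entities($bytes);

# With decoding, each curly quote is one character. LEAVE_SRC keeps
# decode() from modifying $bytes in place when a CHECK value is given.
my $chars      = decode("UTF-8", $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
my $from_chars = encode_entities($chars);

binmode STDOUT, ":encoding(UTF-8)";
printf "bytes: %d characters -> %s\n", length($bytes), $from_bytes;
printf "chars: %d characters -> %s\n", length($chars), $from_chars;
```

The length difference is the quickest tell: if a string with two curly quotes around a six-letter word reports more than eight characters, it has probably not been decoded yet.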