This will, I hope, explain what’s going on:
    use warnings;
    use strict;
    use Encode;
    use HTML::Entities;

    my $str = "\xe2\x80\x9cquotes\xe2\x80\x9d";
    print "Is $str UTF-8? ",
        Encode::is_utf8($str) ? "Yes!\n" : "No...\n";

    $str = decode("UTF-8", $str, Encode::FB_CROAK);
    binmode STDOUT, ":encoding(UTF-8)";
    print "It's still $str... UTF-8 now? ",
        Encode::is_utf8($str) ? "Yes!\n" : "No...\n";

    my $wide_chars = "\x{201C}quotes\x{201D}";
    print "How about this version: $wide_chars? ",
        Encode::is_utf8($wide_chars) ? "Yes!\n" : "No...\n";

    print "Entities: ", encode_entities($str), $/;

    __END__
    Is “quotes” UTF-8? No...
    It's still “quotes”... UTF-8 now? Yes!
    How about this version: “quotes”? Yes!
    Entities: &ldquo;quotes&rdquo;
Update: changed $non_combining to $wide_chars as the name was misleading.
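
To tie that back to the original question, here is a minimal sketch of what I think the practical difference looks like: passing raw UTF-8 bytes to encode_entities versus passing a decoded character string. The variable names ($bytes, $chars and so on) are mine, purely for illustration, and I'm assuming the input arrives as undecoded UTF-8 octets, which may not match your situation exactly.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# UTF-8 encoded octets for curly quotes, e.g. as read from a file
# or socket with no decoding layer (an assumption for this sketch).
my $bytes = "\xe2\x80\x9cquotes\xe2\x80\x9d";

# Without decoding, every octet is treated as its own character,
# so each quote becomes three separate entities.
my $from_bytes = encode_entities($bytes);

# Decode first. LEAVE_SRC keeps $bytes intact, since a CHECK value
# such as FB_CROAK otherwise lets decode() modify its source.
my $chars = decode("UTF-8", $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);

# Now each curly quote is a single character with a named entity.
my $from_chars = encode_entities($chars);

print "From raw bytes:     $from_bytes\n";
print "From decoded chars: $from_chars\n";
```

The second line should come out as &ldquo;quotes&rdquo;, while the first is a run of entities for the individual octets, which is most likely the kind of garbage that prompted the thread.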
In reply to Re: HTML::Entities and Unicode quotes by Your Mother, in thread HTML::Entities and Unicode quotes by tod222