in reply to Unicode to HTML code &#....;
First, let's clear up some confusion. Unicode doesn't specify how characters are stored, so you can't possibly be talking about Unicode when you're talking about a string of bytes. It looks like you meant UTF-8 when you said Unicode. UTF-8 is a means of representing (encoding) Unicode characters in bytes.
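To make the distinction concrete, here is a minimal sketch using the core Encode module. The sample input (the two UTF-8 bytes for "é") and the variable names are only assumptions for illustration.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Bytes as they might arrive from a file or socket: "é" encoded as UTF-8.
my $bytes = "\xC3\xA9";

# Decoding turns the UTF-8 bytes into a string of Unicode characters.
my $chars = decode('UTF-8', $bytes);

# Two bytes, but only one character (code point U+00E9).
printf "bytes: %d, characters: %d, code point: U+%04X\n",
    length($bytes), length($chars), ord($chars);

# Encoding goes the other way and yields the same two bytes again.
my $round_trip = encode('UTF-8', $chars);
```

The same character occupies two bytes in UTF-8 but is a single element of the decoded string, which is why it matters whether a string holds bytes or characters.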
$string =~ s/([^a-zA-Z0-9])/'&#'.unpack('U0U*',$1).';'/eg;
can also be written as
use HTML::Entities qw( encode_entities ); $string = encode_entities($string);
and
use Encode qw( encode ); $string = encode('US-ASCII', $string, Encode::FB_HTMLCREF);
No need to reinvent the wheel.
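As a rough illustration of the Encode variant, the sketch below decodes some UTF-8 bytes first and then encodes to US-ASCII with Encode::FB_HTMLCREF. The sample text is an assumption chosen for the example.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Assumed input: the UTF-8 bytes for "naïve <b>", decoded to characters.
my $string = decode('UTF-8', "na\xC3\xAFve <b>");

# Characters that US-ASCII cannot represent become decimal &#NNN; references.
# The result is assigned back because encode() with a CHECK value
# may modify its source argument.
$string = encode('US-ASCII', $string, Encode::FB_HTMLCREF);

print "$string\n";   # prints: na&#239;ve <b>
```

Note that the three forms are not exact equivalents. The regex replaces every character that is not an ASCII letter or digit, while this one only replaces characters outside US-ASCII, so the space and "<" pass through untouched. HTML::Entities, by default, also escapes <, >, &, and quotes, and prefers named entities where they exist.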
If you use the latter (Encode), you can combine the decoding and encoding into one step.
use Encode qw( from_to );

sub unicode_decode {
    my $string = shift;
    from_to($string, 'UTF-8', 'US-ASCII', Encode::FB_HTMLCREF);
    return($string);
}
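A short usage sketch of the one-step version follows. The sub is repeated so the snippet runs on its own, and the sample bytes ("café €5" in UTF-8) are an assumption for illustration.

```perl
use strict;
use warnings;
use Encode qw( from_to );

# Same sub as in the post: from_to() converts the byte string in place.
sub unicode_decode {
    my $string = shift;
    from_to($string, 'UTF-8', 'US-ASCII', Encode::FB_HTMLCREF);
    return $string;
}

# Assumed input: raw UTF-8 bytes, not yet decoded.
my $utf8_bytes = "caf\xC3\xA9 \xE2\x82\xAC5";

my $html = unicode_decode($utf8_bytes);
print "$html\n";   # prints: caf&#233; &#8364;5
```

Because from_to() works on bytes, the input must still be UTF-8 encoded here. Passing an already decoded string would be a different case and is better handled with encode() alone.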
Re^2: Unicode to HTML code &#....;
by Forlix (Novice) on Nov 15, 2008 at 19:57 UTC