First, let's clear up some confusion. Unicode doesn't specify how characters are stored, so you can't possibly be talking about Unicode when you're talking about a string of bytes. It looks like you meant UTF-8 when you said Unicode. UTF-8 is one way of representing (encoding) Unicode characters as bytes.
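A quick way to see the difference in Perl: a string can hold the single character é (U+00E9), and the core Encode module turns it into the two UTF-8 bytes that represent it. This is a minimal sketch; the sample character is arbitrary.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# One Unicode character: U+00E9 LATIN SMALL LETTER E WITH ACUTE.
my $chars = "\x{E9}";

# The same character encoded as UTF-8 occupies two bytes.
my $bytes = encode('UTF-8', $chars);

printf "%d character, %d bytes: %s\n",
    length($chars), length($bytes),
    join ' ', map { sprintf '%02X', $_ } unpack('C*', $bytes);
# prints "1 character, 2 bytes: C3 A9"

# Decoding turns the bytes back into the character.
my $round_trip = decode('UTF-8', $bytes);
```

The length of the string is 1 before encoding and 2 after: same text, different representation.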
```perl
$string =~ s/([^a-zA-Z0-9])/'&#'.unpack('U0U*',$1).';'/eg;
```
can also be written as
```perl
use HTML::Entities qw( encode_entities );
$string = encode_entities($string);   # $string must hold decoded characters
```
and
```perl
use Encode qw( encode );
# Characters that US-ASCII can't represent become decimal &#NNN; references.
$string = encode('US-ASCII', $string, Encode::FB_HTMLCREF);
```
No need to reinvent the wheel.
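Note that the regex and the Encode call don't produce identical output. The regex escapes every character outside [a-zA-Z0-9], spaces and ASCII punctuation included, while the Encode fallback only replaces characters that US-ASCII can't represent. (HTML::Entities' encode_entities, by default, also escapes <, &, >, " and ' and prefers named entities such as &eacute; where one exists.) A minimal sketch comparing the first and last forms, assuming $string already holds decoded characters; the sample text is made up, and ord is used in place of unpack('U0U*', ...) since it returns the code point directly:

```perl
use strict;
use warnings;
use Encode qw( encode );

# Sample input: an already-decoded character string (hypothetical data).
my $string = "caf\x{E9} & \x{263A}";

# Hand-rolled: every character outside [a-zA-Z0-9] becomes &#<code point>;
(my $by_regex = $string) =~ s/([^a-zA-Z0-9])/'&#' . ord($1) . ';'/eg;

# Encode fallback: only characters US-ASCII can't represent are replaced.
my $by_encode = encode('US-ASCII', $string, Encode::FB_HTMLCREF);

print "$by_regex\n";    # caf&#233;&#32;&#38;&#32;&#9786;
print "$by_encode\n";   # caf&#233; & &#9786;
```

The bare & survives in the Encode result because it is perfectly valid ASCII, so the fallback never sees it.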
If you use the latter (Encode), you can combine decoding from UTF-8 and encoding to ASCII into one step.
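```perl
use Encode qw( from_to );

sub unicode_decode {
    my $string = shift;
    # Convert UTF-8 bytes to US-ASCII in place; characters outside
    # ASCII become decimal &#NNN; references.
    from_to($string, 'UTF-8', 'US-ASCII', Encode::FB_HTMLCREF);
    return $string;
}
```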
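To check that the single from_to call matches decoding and then encoding separately, here's a small comparison. The input bytes are sample data; as in the sub above, from_to's return value is ignored because the converted string is written back into its first argument.

```perl
use strict;
use warnings;
use Encode qw( decode encode from_to );

# Sample UTF-8 bytes for "café ☺" (hypothetical input).
my $bytes = "caf\xC3\xA9 \xE2\x98\xBA";

# Two steps: bytes -> characters -> ASCII with &#NNN; for the rest.
my $two_step = encode('US-ASCII', decode('UTF-8', $bytes), Encode::FB_HTMLCREF);

# One step: from_to converts its first argument in place.
my $one_step = $bytes;
from_to($one_step, 'UTF-8', 'US-ASCII', Encode::FB_HTMLCREF);

print "$two_step\n";   # caf&#233; &#9786;
print "$one_step\n";   # caf&#233; &#9786;
```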
In reply to Re: Unicode to HTML code &#....; by ikegami, in thread Unicode to HTML code &#....; by Forlix