in reply to Re: Best Way to Get Length of UTF-8 String in Bytes?
in thread Best Way to Get Length of UTF-8 String in Bytes?
Thank you, ikegami.
Here's what I had tried before posting my inquiry:
#!perl
use strict;
use warnings;
use open qw( :utf8 :std );
use utf8;
# 'China' in Simplified Chinese
#          中        国
# Unicode  U+4E2D    U+56FD
# UTF-8    E4 B8 AD  E5 9B BD
my $text = '中国';
my $length_in_characters = length $text;
print "Length of text '$text' in characters is $length_in_characters\n";
{
    use bytes;
    my $length_in_bytes = length $text;
    print "Length of text '$text' in bytes is $length_in_bytes\n";
}
{
    require Encode;
    my $bytes = Encode::encode_utf8($text);
    my $length_in_bytes = length $bytes;
    print "Length of text '$bytes' in bytes is $length_in_bytes\n";
}
And here's its output:
Length of text '中国' in characters is 2
Length of text 'ä¸å½' in bytes is 6
Length of text 'ä¸å½' in bytes is 6
(I couldn't use <code> tags here due to the Chinese characters in both the script and its output.)
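For comparison, here is a minimal sketch of the Encode approach from the script above with the byte count kept separate from the printing. The helper name utf8_byte_length is made up for illustration and is not from this thread. The idea is to print only the original character string and the two counts, so the encoded byte string never passes through the :utf8 output layer (which, as far as I can tell, is what turns it into 'ä¸å½').

```perl
#!perl
use strict;
use warnings;
use utf8;                              # the string literal below is UTF-8 source
use Encode qw( encode_utf8 );

# Hypothetical helper (name is an assumption): takes a character string
# and returns the number of bytes in its UTF-8 encoding.
sub utf8_byte_length {
    my ($chars) = @_;
    return length( encode_utf8($chars) );
}

my $text = '中国';                     # U+4E2D U+56FD

# Encode output as UTF-8 so the character string prints intact.
binmode STDOUT, ':encoding(UTF-8)';

printf "Length of text '%s' is %d characters, %d bytes\n",
    $text, length($text), utf8_byte_length($text);
```

This should report 2 characters and 6 bytes for the same text, matching the counts in the output above.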
Jim