in reply to Re^3: Best Way to Get Length of UTF-8 String in Bytes?
in thread Best Way to Get Length of UTF-8 String in Bytes?
I've studied the demonstration script and I understand everything it's doing, except for this bit:
my $MAX_BYTES = 25;
my ($MIN_BPC, $MAX_BPC) = (1, 4);
my $MAX_CHARS = $MAX_BYTES / $MIN_BPC;
What's going on here? Since $MIN_BPC is 1, $MAX_CHARS will always equal $MAX_BYTES, and $MAX_BPC seems to serve no function. Am I right?
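If my reading is right, the two ratios are really bounds: UTF-8 spends 1 to 4 bytes per code point, so $MAX_BYTES / $MIN_BPC is the most characters that could possibly fit, and $MAX_BYTES / $MAX_BPC is the fewest that are guaranteed to fit. Here is a minimal sketch, under my own assumptions, of how the upper bound could serve as a cheap first cut before trimming by actual byte count. utf8_bytes and fit_to_bytes are hypothetical helpers, not part of the demonstration script, and the sketch works on code points, not graphemes.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $MAX_BYTES = 25;
my ($MIN_BPC, $MAX_BPC) = (1, 4);   # UTF-8 uses 1 to 4 bytes per code point

# Upper bound: even if every character took 1 byte, no more than this fit.
my $MAX_CHARS = int($MAX_BYTES / $MIN_BPC);   # 25
# Lower bound: even if every character took 4 bytes, at least this many fit.
my $MIN_CHARS = int($MAX_BYTES / $MAX_BPC);   # 6

# Hypothetical helper: length of a string's UTF-8 encoding, in bytes.
sub utf8_bytes { return length encode('UTF-8', $_[0]) }

# Hypothetical helper: cut to $MAX_CHARS characters first (cheap), then
# chop code points off the end until the UTF-8 encoding fits.
sub fit_to_bytes {
    my ($s) = @_;
    substr($s, $MAX_CHARS) = '' if length($s) > $MAX_CHARS;
    chop $s while utf8_bytes($s) > $MAX_BYTES;
    return $s;
}

my $s = "\x{263A}" x 10;   # ten WHITE SMILING FACEs, 3 bytes each
my $t = fit_to_bytes($s);
printf "%d chars, %d bytes\n", length $t, utf8_bytes($t);   # 8 chars, 24 bytes
```

The first cut never throws away anything that could have fit, because no string longer than $MAX_CHARS characters can encode in $MAX_BYTES bytes; the chop loop then deals with the multi-byte cases.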
Also, what happens if, in the initial truncation of the string done using substr() as an lvalue, we land smack dab in the middle of a grapheme, and the rightmost character in the resultant truncated string is, by itself, a valid grapheme?
D:\>perl -CO -Mcharnames=:full -wE "$MAX = 4; $cafe = qq/cafe\N{COMBINING ACUTE ACCENT}/; say $cafe; substr($cafe, $MAX) = ''; say $cafe;" > cafe.txt

D:\>
Here's the text in the output file cafe.txt:
café
cafe
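One way to avoid splitting a grapheme, assuming the limit is meant as UTF-8 bytes, would be to walk the string one extended grapheme cluster at a time with Perl's \X and stop before the cluster that would overflow. This is only a sketch; truncate_graphemes is a hypothetical name, not something from the demonstration script.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: keep whole graphemes (\X) from the start of $str
# while the running UTF-8 byte count stays within $max_bytes.
sub truncate_graphemes {
    my ($str, $max_bytes) = @_;
    my ($out, $used) = ('', 0);
    while ($str =~ /(\X)/g) {
        my $g   = $1;
        my $len = length encode('UTF-8', $g);
        last if $used + $len > $max_bytes;   # next grapheme would not fit
        $out  .= $g;
        $used += $len;
    }
    return $out;
}

binmode STDOUT, ':encoding(UTF-8)';
my $cafe = "cafe\x{301}";                   # e + COMBINING ACUTE ACCENT
print truncate_graphemes($cafe, 6), "\n";   # all 6 bytes fit: accent kept
print truncate_graphemes($cafe, 5), "\n";   # caf (e + accent needs 3 bytes)
```

If the limit is a character count, as in the one-liner above, the same idea reduces to matching /\A(\X{0,$MAX})/ instead of calling substr(); with $MAX = 4 that keeps all four graphemes of "cafe\x{301}", accent included.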
(Thanks again for this very helpful script!)