I'm no expert, but this seems to split the data correctly:
use utf8;
my $data = 'M02 1731580851? 海 WBPX深????????????? 99
+9999321';
my @chars = $data =~ /(.)/sg;
With the utf8 pragma I get 41 chars; without it I get 65 (one per byte).
-Blake
Perl's handling of Unicode in 5.6.x can be a little troublesome. One problem I've encountered is that literal UTF-8 strings may be recognized fine, but UTF-8 strings read in from a file are always treated as sequences of bytes, not UTF-8 characters. (I believe this is the documented behavior; it's just probably not what you want.)
I've had partial success recreating UTF-8 strings from a series of bytes by using pack/unpack with the U template, though if I remember correctly I still hit some glitches with this approach, especially under 5.6.0 (5.6.1 was a bit better). Perl 5.8 is supposed to have much-improved Unicode support, and if that's an option for you it might be worth investigating. (Sorry, I don't have any firsthand experience with it yet.)
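For anyone who can use Perl 5.8 or later, here is a minimal sketch of turning bytes into characters with the core Encode module instead of pack/unpack. The sample string is a hypothetical stand-in for the real record, written as raw UTF-8 octets the way they would arrive from a file opened without an encoding layer; it is not the original data.

```perl
use strict;
use warnings;
use Encode qw(decode);   # core module since Perl 5.8

# Hypothetical sample: "M02 " followed by two CJK characters,
# spelled out as the raw UTF-8 octets a plain file read would return.
my $raw = "M02 \xE6\xB5\xB7\xE6\xB7\xB1";

# Decode the octets into a string of characters.
my $text = decode('UTF-8', $raw);

printf "bytes: %d\n", length $raw;    # counts octets
printf "chars: %d\n", length $text;   # counts characters
printf "5th char: U+%04X\n", ord substr($text, 4, 1);
```

On 5.8 and later it should also be possible to let the read do the decoding by opening the file with an `:encoding(UTF-8)` layer, assuming the input really is UTF-8.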
Does "fixed length" mean a fixed number of "chars" or a fixed number of "bytes"?
They have different meanings in this context.
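To illustrate the difference, here is a small sketch (assuming Perl 5.8 or later and a made-up record, not the original data) that takes the first five units of the same text twice: once counting characters and once counting bytes.

```perl
use strict;
use warnings;
use utf8;                  # the literal below is UTF-8 source text
use Encode qw(encode);     # core module since Perl 5.8

# Hypothetical record: 4 ASCII characters plus 2 CJK characters.
my $text  = "M02 海深";
my $bytes = encode('UTF-8', $text);    # the same record as UTF-8 octets

# A 5-wide field measured in characters keeps the first CJK character whole.
my $by_chars = substr($text, 0, 5);

# A 5-wide field measured in bytes stops after the first octet of that character.
my $by_bytes = substr($bytes, 0, 5);

printf "record: %d chars, %d bytes\n", length $text, length $bytes;
printf "byte field ends mid-character: %s\n",
    $by_bytes eq "M02 \xE6" ? "yes" : "no";
```

If the field widths in the format are defined in bytes, the split has to happen on the raw octets before decoding; if they are defined in characters, decode first and then split.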