in reply to Re^5: Unicode strings internals
in thread [SOLVED] Unicode strings internals
The use of a raw filehandle for output of non-binary data
That was the problem: I was using a raw filehandle for binary-only data.
Something like this:
# $binarydata is binary data, $id is an ASCII-only number, $command is an ASCII-only string,
# so the result of the concatenation should be binary data
my $line = "$id\t$command\t$datalength\t$binarydata";
syswrite $file, $line ...
However, I received $id in another part of the program, like this:
my ($id, $filename) = split (/\t/, $record);
The problem was that $record was intentionally a UTF-8 character string and contained a non-ASCII filename. Thus the ASCII-only $id had the UTF-8 flag set.
And thus $line was a UTF-8 non-ASCII character string with $binarydata mangled (i.e. its bytes converted from Latin-1 to UTF-8).
Surprisingly, everything worked fine, because the mangled $binarydata was converted back (bytes from UTF-8 to Latin-1) when I wrote it using syswrite().
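To make the chain of events concrete, here is a minimal sketch that reproduces it. The record contents, the byte values and the temporary file are made up for illustration; only split, the concatenation and syswrite() come from the post. It assumes a raw handle with no :utf8 layer.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp qw(tempfile);

# Hypothetical record: ASCII id, tab, non-ASCII filename, decoded from UTF-8.
my $record = decode('UTF-8', "42\tf\xC3\xA4hre.txt");
my ($id, $filename) = split /\t/, $record;

# $id contains only ASCII digits, yet it inherits the UTF-8 flag from $record.
my $id_flagged = utf8::is_utf8($id) ? 1 : 0;

my $binarydata = "\xFF\xFE\x00\x80";     # four raw bytes, flag off
my $line       = "$id\t$binarydata";     # concatenation upgrades the whole string

# The result is flagged, so the high bytes are now stored internally as UTF-8.
my $line_flagged = utf8::is_utf8($line) ? 1 : 0;

# Write through a raw handle: the string is downgraded on the way out.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
syswrite $fh, $line;
close $fh;

# Read the file back as raw bytes to see what actually landed on disk.
open my $in, '<:raw', $path or die "cannot reopen $path: $!";
my $written = do { local $/; <$in> };
close $in;

print "id flagged: $id_flagged, line flagged: $line_flagged\n";
print "bytes on disk match original: ", ($written eq "42\t$binarydata" ? "yes" : "no"), "\n";
```

The round trip only works because every character in $line is below 0x100. If the filename itself had been concatenated in, the downgrade would not have been possible.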
So I noticed this strange behaviour only when I added some more code there (for example, I used bytes::length somewhere).
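A small sketch of why bytes::length exposes the problem while length does not: it reports the size of the internal representation rather than the number of characters. The byte values are made up, and utf8::upgrade stands in here for the implicit upgrade caused by the concatenation.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling byte semantics

my $binarydata = "\xFF\xFE\x00\x80";    # four raw bytes
my $upgraded   = $binarydata;
utf8::upgrade($upgraded);               # same characters, UTF-8 internal storage

# Logically the two strings are still equal and have the same character count.
my $same     = $upgraded eq $binarydata ? 1 : 0;
my $char_len = length $upgraded;

# bytes::length looks at the internal buffer: three of the four characters
# are >= 0x80 and take two bytes each in UTF-8, so 2 + 2 + 1 + 2 = 7.
my $internal_len = bytes::length($upgraded);

print "equal: $same, length: $char_len, bytes::length: $internal_len\n";
```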
So now I am wondering: either I am responsible for making sure that $id never has the UTF-8 flag set, or I should additionally test it with "confess if is_utf8($id)". Or maybe I should never concatenate binary data with known ASCII-only data. Or maybe even never concatenate with known binary data...
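One possible shape for the first two options, as a sketch only: build_line is a hypothetical helper, not something from the original code. It downgrades each field with utf8::downgrade and confesses if a field cannot be represented as bytes. This is a looser check than "confess if is_utf8($id)", since it accepts a flagged but ASCII-only $id.

```perl
use strict;
use warnings;
use Carp qw(confess);
use Encode qw(decode);

# Hypothetical helper: join fields that must all be byte strings.
sub build_line {
    my @fields = @_;    # copies, so the caller's variables are not modified
    for my $field (@fields) {
        # With a true second argument, utf8::downgrade returns false
        # instead of dying when a character above 0xFF is present.
        utf8::downgrade($field, 1)
            or confess "field contains characters above 0xFF";
    }
    return join "\t", @fields;
}

# $id comes from a decoded record, so it carries the UTF-8 flag.
my ($id) = split /\t/, decode('UTF-8', "42\tf\xC3\xA4hre.txt");
my $line = build_line($id, 'cmd', "\xFF\x00");

# A field that cannot be a byte string is rejected.
my $died = eval { build_line("\x{263A}"); 1 } ? 0 : 1;

print "line flagged: ", (utf8::is_utf8($line) ? 1 : 0), ", wide field rejected: $died\n";
```

The alternative of encoding every text field explicitly (for example with Encode::encode) before concatenation would also keep $line a byte string; which is preferable depends on where the text fields come from.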
Re^7: Unicode strings internals
by kennethk (Abbot) on May 10, 2013 at 22:13 UTC
by vsespb (Chaplain) on May 10, 2013 at 22:32 UTC