Nocturnus has asked for the wisdom of the Perl Monks concerning the following question:
Dear monks,
After reading
http://perldoc.perl.org/Encode.html#Handling-Malformed-Data
I still have problems understanding how the CHECK parameter for the encode and decode subroutines works. The following questions are ALL related to encoding to UTF-8 and decoding from UTF-8 (I won't use other encodings in the future).
First, what sense does this parameter make when encoding to UTF-8? Are there characters which can occur in Perl strings but cannot be encoded in UTF-8? Probably there are, because otherwise the CHECK parameter for the encode function would not make sense, would it?
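One way to probe this is a minimal sketch like the following. It assumes that a surrogate code point (U+D800, chosen here only as a test value) is a candidate for "cannot be encoded", and it compares the strict "UTF-8" encoding with the lax "utf8" one that the Encode documentation describes. The helper name encode_croaks is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical test input: "a" followed by the surrogate code point U+D800.
my $string = "a" . chr(0xD800);

# Returns 1 if encode() dies for the given encoding name, 0 otherwise.
# LEAVE_SRC keeps encode() from modifying the caller's string.
sub encode_croaks {
    my ($encoding_name, $text) = @_;
    my $ok = eval { encode($encoding_name, $text, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 0 : 1;
}

my $strict_croaks = encode_croaks("UTF-8", $string);  # strict encoding
my $lax_croaks    = encode_croaks("utf8",  $string);  # lax encoding

print "strict UTF-8 croaks: $strict_croaks\n";
print "lax utf8 croaks:     $lax_croaks\n";
```

If the strict variant croaks and the lax one does not, that would suggest CHECK matters for encode mainly when the strict "UTF-8" name is used.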
Second, if I use FB_DEFAULT for the CHECK parameter in encode, what is SUBCHAR?
Third, I understand the code example given in the explanation of FB_QUIET as far as valid input streams are concerned. But what if the input data is not only fragmented by reading fixed-size chunks (which the example code handles correctly), but actually contains invalid bytes? In that case, $buffer would contain the portion starting with the invalid byte; on the next loop iteration, the invalid byte would again not be processed (because it is invalid), leaving $buffer as it is. This would lead to an infinite loop, wouldn't it?
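The concern can be checked without a file handle. The sketch below assumes a hypothetical buffer with one byte (\xFF) that can never occur in UTF-8, and calls decode twice with FB_QUIET to see whether the second call makes any progress. It does not reproduce the read loop from the documentation, only the decoding step inside it.

```perl
use strict;
use warnings;
use Encode qw(decode FB_QUIET);

# Hypothetical buffer: valid "ab", then the invalid byte \xFF, then "cd".
my $buffer = "ab\xFFcd";

# First call: decodes up to the invalid byte and, per the FB_QUIET
# description, leaves the unprocessed rest in $buffer.
my $first            = decode("UTF-8", $buffer, FB_QUIET);
my $left_after_first = length $buffer;

# Second call on the leftover: $buffer now starts with the invalid byte.
my $second            = decode("UTF-8", $buffer, FB_QUIET);
my $left_after_second = length $buffer;

printf "first call decoded %d chars, %d bytes left\n",
    length($first), $left_after_first;
printf "second call decoded %d chars, %d bytes left\n",
    length($second), $left_after_second;
```

If the second call decodes nothing and leaves the buffer untouched, a loop that relies only on decode to consume $buffer would stall at the invalid byte, so the caller would have to skip or replace that byte itself.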
Fourth, is the following statement true?
"If I make a Perl string from an input stream of octets using decode and then make an output stream of octets from that Perl string using encode, then encode will never run into invalid characters *regardless* of which constant for CHECK I used when *decoding*."
(I am aware that the output stream might differ from the input stream, but that is not the question.)
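A single case of this round trip can be sketched as follows. It assumes hypothetical input containing one invalid byte, decodes it with FB_DEFAULT, and then encodes the result with FB_CROAK so that any unencodable character would make encode die. This tests one combination only and does not prove the statement in general.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_DEFAULT FB_CROAK LEAVE_SRC);

# Hypothetical input octets with one invalid byte (\xFF) in the middle.
my $octets_in = "ab\xFFcd";

# Decode leniently: malformed data is replaced by a substitution character.
my $string = decode("UTF-8", $octets_in, FB_DEFAULT);

# Encode strictly: eval returns undef if encode() croaks.
my $octets_out    = eval { encode("UTF-8", $string, FB_CROAK | LEAVE_SRC) };
my $encode_failed = defined $octets_out ? 0 : 1;

print "encode failed: $encode_failed\n";
printf "output bytes: %s\n", join " ", map { sprintf "%02X", ord } split //, $octets_out
    unless $encode_failed;
```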
Thank you very much,
Nocturnus
Re: Question about Encode module and CHECK parameter
by afoken (Chancellor) on Aug 07, 2015 at 17:57 UTC
by Nocturnus (Scribe) on Aug 08, 2015 at 13:46 UTC
by afoken (Chancellor) on Aug 09, 2015 at 08:20 UTC
by Nocturnus (Scribe) on Aug 10, 2015 at 06:47 UTC
by afoken (Chancellor) on Aug 10, 2015 at 18:01 UTC