Dear monks,
After reading
http://perldoc.perl.org/Encode.html#Handling-Malformed-Data
I still have problems understanding how the CHECK parameter for the encode and decode subroutines works. The following questions are ALL related to encoding to UTF-8 and decoding from UTF-8 (I won't use other encodings in the future).
First, what sense does this parameter make when encoding to UTF-8? Are there characters that can occur in Perl strings but cannot be encoded in UTF-8? Probably there are, because otherwise the CHECK parameter for the encode function would not make sense, would it?
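To make the first question concrete, here is a minimal sketch, assuming that the strict "UTF-8" name rejects a lone surrogate such as U+D800 while the lax "utf8" name lets it through (this is how I read the "UTF-8 vs. utf8" part of the Encode documentation). The variable names are just for illustration.

```perl
use strict;
use warnings;
no warnings 'utf8';    # silence surrogate warnings on perls that emit them
use Encode qw(encode);

# A Perl string may hold code points that are not valid Unicode scalar values.
my $string = "a" . chr(0xD800) . "b";

# LEAVE_SRC keeps encode() from modifying $string in place.
my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;

# Strict encoder: assumed to die on the surrogate.
my $strict       = eval { encode('UTF-8', $string, $check) };
my $strict_error = $@;

# Lax encoder: assumed to pass Perl's internal representation through.
my $lax = eval { encode('utf8', $string, $check) };

print defined $strict ? "strict: ok\n" : "strict: died: $strict_error";
print defined $lax    ? "lax: ok\n"    : "lax: died: $@";
```

If that assumption holds, CHECK does matter for encode, at least with the strict "UTF-8" name.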
Second, if I use FB_DEFAULT for the CHECK parameter in encode, what is SUBCHAR?
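A small experiment along these lines might show what SUBCHAR turns out to be. This is a sketch under the assumption that the UTF-8 encoder substitutes the bytes of U+FFFD (EF BF BD) and that a single-byte encoding such as iso-8859-1 substitutes "?"; the latter is only there for contrast.

```perl
use strict;
use warnings;
no warnings 'utf8';    # silence surrogate warnings on perls that emit them
use Encode qw(encode);

# CHECK omitted means FB_DEFAULT (0): unmappable characters get substituted.
my $utf8   = encode('UTF-8',      "a" . chr(0xD800) . "b");
my $latin1 = encode('iso-8859-1', "a\x{263A}b");

# Print the resulting octets as hex to see the substitution character.
printf "UTF-8:      %s\n", join ' ', map { sprintf '%02X', ord } split //, $utf8;
printf "iso-8859-1: %s\n", join ' ', map { sprintf '%02X', ord } split //, $latin1;
```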
Third, I understand the code example given in the explanation of FB_QUIET as far as valid input streams are concerned. But what if the input data is not only fragmented by reading chunks of fixed size (which the example code handles correctly), but actually contains invalid bytes? In that case, $buffer would contain the portion starting with the invalid byte; on the next loop iteration, the invalid byte would again not be processed (because it is invalid), leaving $buffer as it is. This would lead to an infinite loop, wouldn't it?
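As far as I can tell, the loop from the documentation should still end once read returns 0 at end of input, but everything from the invalid byte onward would stay stuck in $buffer and never reach the result. Below is a hedged sketch of one way to guard against that. decode_stream is a hypothetical helper, not part of Encode. It relies on the assumption that a strict UTF-8 character is at most 4 bytes long, so if 4 or more bytes remain undecoded, the first of them cannot merely be an incomplete character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of Encode): decode a byte stream in chunks,
# substituting U+FFFD for bytes that cannot start a valid character.
sub decode_stream {
    my ($fh, $chunk_size) = @_;
    my ($buffer, $string) = ('', '');
    while (read($fh, $buffer, $chunk_size, length $buffer)) {
        while (length $buffer) {
            # FB_QUIET returns the decoded prefix and leaves the rest in $buffer.
            $string .= decode('UTF-8', $buffer, Encode::FB_QUIET);
            # Fewer than 4 bytes left: possibly a character split across chunks.
            last if length($buffer) < 4;
            # 4 or more bytes left: drop the offending byte and carry on.
            substr($buffer, 0, 1, '');
            $string .= "\x{FFFD}";
        }
    }
    # At end of input, whatever is left is truncated or invalid.
    $string .= decode('UTF-8', $buffer, Encode::FB_DEFAULT) if length $buffer;
    return $string;
}

my $bytes = "abc\xFFd\xC3\xA9f";    # \xFF is invalid, \xC3\xA9 is U+00E9
open my $fh, '<', \$bytes or die "cannot open in-memory handle: $!";
my $decoded = decode_stream($fh, 2);
close $fh;
```

With a chunk size of 2, the two-byte character is split across reads and still decodes, while the \xFF byte is replaced instead of blocking the rest of the input.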
Fourth, is the following statement true?
"If I make a perl string from an input stream of octets using decode and then make an output stream of octets from that perl string using encode, then encode will never run into invalid characters *regardless* of which constant for CHECK I had used when *decoding*."
(I am aware that the output stream might differ from the input stream, but that is not the question.)
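The statement could be checked empirically with something like the sketch below. It only covers the strict "UTF-8" name on both sides and a handful of hand-picked inputs, so it is an experiment rather than a proof; the lax "utf8" name is not tested here. FB_CROAK is left out of the decode side because decode would die on invalid input before there is any string to encode.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Valid, invalid, truncated, and surrogate-encoding inputs.
my @inputs = ("plain", "caf\xC3\xA9", "bad\xFFbyte", "cut\xC3", "\xED\xA0\x80");
my @checks = (Encode::FB_DEFAULT, Encode::FB_QUIET, Encode::FB_PERLQQ);

my $failures = 0;
for my $input (@inputs) {
    for my $check (@checks) {
        my $octets = $input;    # copy, because decode may modify its argument
        my $string = decode('UTF-8', $octets, $check);
        # Re-encode strictly; LEAVE_SRC keeps $string intact.
        my $out = eval {
            encode('UTF-8', $string, Encode::FB_CROAK | Encode::LEAVE_SRC);
        };
        $failures++ unless defined $out;
    }
}
print "failures: $failures\n";
```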
Thank you very much,
Nocturnus
In reply to Question about Encode module and CHECK parameter by Nocturnus