Dear monks,

After reading

http://perldoc.perl.org/Encode.html#Handling-Malformed-Data

I still have problems understanding how the CHECK parameter for the encode and decode subroutines works. The following questions are ALL related to encoding to UTF-8 and decoding from UTF-8 (I won't use other encodings in the future).

First, what purpose does this parameter serve when encoding to UTF-8? Are there characters that can occur in Perl strings but cannot be encoded in UTF-8? Presumably there are, because otherwise the CHECK parameter for the encode function would not make sense, would it?
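To make the first question concrete, here is a small test I would try. It is only a sketch: the choice of a lone surrogate (U+D800) as a candidate "unencodable" character and the use of the strict "UTF-8" encoding name are my own assumptions, not something I took from the documentation section linked above.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK);

# Candidate character (my assumption): U+D800, a lone surrogate.
# Perl can hold it in a string, but it is not a valid Unicode scalar value.
my $text = "a" . chr(0xD800) . "b";

# encode() may overwrite its source argument when CHECK is set,
# so hand it a copy and keep $text intact.
my $copy  = $text;
my $bytes = eval { encode("UTF-8", $copy, FB_CROAK) };
my $error = $@;

print defined $bytes
    ? "encode succeeded\n"
    : "encode died: $error";
```

If encode dies here, that would at least show one kind of character for which CHECK matters on the encoding side.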

Second, if I use FB_DEFAULT for the CHECK parameter in encode, what is SUBCHAR?
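One way I could try to find out is to dump the bytes that encode produces for a character it refuses. This is a sketch under my own assumptions: the lone surrogate U+D800 as the offending character, and a hex dump as the way to look at the result.

```perl
use strict;
use warnings;
use Encode qw(encode FB_DEFAULT);

# Same assumed "bad" character as before: a lone surrogate between two ASCII letters.
my $text = "a" . chr(0xD800) . "b";

# FB_DEFAULT is 0, so encode() substitutes instead of dying
# and does not touch the source string.
my $bytes = encode("UTF-8", $text, FB_DEFAULT);

# Show each output octet as two hex digits.
my $hex = join " ", map { sprintf "%02X", ord } split //, $bytes;
print "$hex\n";
```

Whatever appears between 61 ('a') and 62 ('b') in the dump should be the substitution that was used.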

Third, I understand the code example given in the explanation of FB_QUIET as far as valid input streams are concerned. But what if the input data is not only fragmented by reading chunks of fixed size (which the example code handles correctly), but actually contains invalid bytes? In that case, $buffer would contain the portion starting with the invalid byte; in the next loop iteration, the invalid byte would again not be processed (because it is invalid), leaving $buffer as it is. This would lead to an infinite loop, wouldn't it?
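The behaviour I am worried about can be reduced to two decode calls on the same buffer, without any file handle. This is only a sketch; the sample input with a stray \xFF byte and the variable names are mine, not from the documentation.

```perl
use strict;
use warnings;
use Encode qw(decode FB_QUIET);

# Sample input (my assumption): valid ASCII around one byte
# that can never occur in UTF-8.
my $buffer = "abc\xFFdef";

# First pass: decode what can be decoded; $buffer is overwritten
# with the unprocessed remainder.
my $first            = decode("UTF-8", $buffer, FB_QUIET);
my $left_after_first = $buffer;

# Second pass on the remainder, as the next loop iteration would do.
my $second            = decode("UTF-8", $buffer, FB_QUIET);
my $left_after_second = $buffer;

printf "pass 1: decoded %d chars, %d bytes left\n",
    length $first, length $left_after_first;
printf "pass 2: decoded %d chars, %d bytes left\n",
    length $second, length $left_after_second;
```

If the second pass decodes nothing and leaves the same bytes behind, the buffer never drains past the invalid byte, which is exactly the situation I describe above.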

Fourth, is the following statement true?

"If I make a perl string from an input stream of octets using decode and then make an output stream of octets from that perl string using encode, then encode will never run into invalid characters *regardless* of which constant for CHECK I had used when *decoding*."

(I am aware that the output stream might differ from the input stream, but that is not the question.)
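A quick experiment for the fourth question could look like the following. It is a sketch, not a proof: it checks a single malformed sample input of my own choosing against three CHECK constants, and the %check and %failed names are just for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_DEFAULT FB_QUIET FB_PERLQQ FB_CROAK);

# CHECK values to try on the decoding side.
my %check = (
    FB_DEFAULT => FB_DEFAULT,
    FB_QUIET   => FB_QUIET,
    FB_PERLQQ  => FB_PERLQQ,
);

my %failed;
for my $name (sort keys %check) {
    # Fresh malformed input each time, since decode() may overwrite it.
    my $octets = "abc\xFFdef";
    my $string = decode("UTF-8", $octets, $check{$name});

    # Re-encode strictly; record the error if encode() ever croaks.
    my $copy = $string;
    eval { encode("UTF-8", $copy, FB_CROAK); 1 }
        or $failed{$name} = $@;
}

print %failed
    ? "encode failed after decoding with: @{[ sort keys %failed ]}\n"
    : "encode never failed\n";
```

A clean run would only support the statement for this one input, so I would still like to know whether it holds in general.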

Thank you very much,

Nocturnus


In reply to Question about Encode module and CHECK parameter by Nocturnus
