Just like concatenation imposes string context, and multiplication numeric context, print imposes "binary" context (and encodes and warns if necessary), and uc imposes "character" context (and decodes with Latin-1 if the string holds undecoded octets).

Well, my point was different. It is correct that perl does certain conversions behind the scenes, and that certain warnings are issued because perl has to produce some result. But my point was that, without help from the developer, perl cannot do 100% correct work. It just does what works most of the time. The context is imposed, but if the string is not in the proper internal form, then the "characters" that perl works with may be quite wrong from the developer's standpoint.
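To make that concrete, here is a minimal sketch (core Encode only; the sample string is my own, not from any real module) of how the same data looks to perl before and after decoding:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two octets: the UTF-8 encoding of U+00E9 (e with acute accent).
my $octets = "\xC3\xA9";

# After decoding, perl holds one character instead of two octets.
my $chars = decode('UTF-8', $octets);

printf "undecoded: length %d, first code U+%04X\n",
    length($octets), ord($octets);   # length 2, U+00C3
printf "decoded:   length %d, first code U+%04X\n",
    length($chars), ord($chars);     # length 1, U+00E9
```

Without the decode step, every character-level operation (length, substr, regex classes) works on U+00C3 and U+00A9, which is not what the developer meant.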

I can give you examples of the kind of confusion I had in mind.

The module MP3::Tag::ID3v2 provides a method "get_frame" which returns the string as a sequence of octets. So to convert the encoding, the developer has to use "Encode::from_to". But the method "change_frame" of the same module expects the string in "internal form", because internally it calls Encode::encode on its input. So the developer can't pass the string returned by "get_frame" as input to "change_frame" without first calling "Encode::decode" on it.
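The mismatch can be sketched with Encode alone. The variable $from_api below is a hypothetical stand-in for octets returned by something like "get_frame", and the ISO-8859-1 source encoding is an assumption for illustration; no MP3::Tag calls are made here.

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

# Hypothetical stand-in for octets handed back by an API:
# U+00E9 encoded as ISO-8859-1 (a single octet).
my $from_api = "\xE9";

# from_to converts octets to octets, in place, and returns the new length.
my $as_utf8 = $from_api;
my $new_len = from_to($as_utf8, 'ISO-8859-1', 'UTF-8');   # $as_utf8 is now "\xC3\xA9"

# decode converts octets to a character string ("internal form").
my $text = decode('ISO-8859-1', $from_api);

# A method that calls encode internally needs the character string.
my $good   = encode('UTF-8', $text);      # "\xC3\xA9"
my $double = encode('UTF-8', $as_utf8);   # "\xC3\x83\xC2\xA9", encoded twice

printf "good: %d octets, double-encoded: %d octets\n",
    length($good), length($double);       # 2 and 4
```

Feeding the octets straight back in gives the classic double-encoded output, and perl has no way to warn about it.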

Another example: the DBD modules may return strings from databases either as octets or in "internal form". But if you pass these strings to, say, the Gtk2 modules, they must be in "internal form" only. So the developer has to pay attention to what kind of output the DBD modules return.

I believe that part of the confusion lies in badly written modules. Since perl provides the function "is_utf8", it is very easy to check what kind of input the user has provided and to use the appropriate "Encode::encode" or "Encode::decode" to get the desired form. But we have what we have, so developers have to watch out for the type of strings they work with.
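A minimal sketch of such a check, assuming the module author knows which encoding undecoded input is in. The helper name to_characters is made up for this post. One caveat I should state: the flag describes how perl stores the string, so a character string with no code point above 255 can have it off; the sketch assumes any unflagged input is octets.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return the input as a character string.
# Assumption: input without the UTF8 flag is octets in $encoding.
sub to_characters {
    my ($string, $encoding) = @_;
    return $string if utf8::is_utf8($string);   # already in internal form
    return decode($encoding, $string);          # octets -> characters
}

my $from_octets = to_characters("\xC3\xA9", 'UTF-8');     # decoded here
my $unchanged   = to_characters($from_octets, 'UTF-8');   # passed through

printf "lengths: %d and %d\n",
    length($from_octets), length($unchanged);   # 1 and 1
```

With something like this at the module boundary, the caller could pass either form and get the same result.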


In reply to Re^2: text encodings and perl by andal
in thread text encodings and perl by andal
