CGI.pm does not decode or encode for you by default. The $q->charset method only sets the character set for the Content-Type header.

This means that you have to decode and encode manually (e.g. by using PerlIO layers). Decode everything you got, and encode everything you're about to send.
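As a minimal sketch of "decode in, encode out", assuming the form data is UTF-8 and using only the core Encode module (the variable names and the sample value are made up for illustration, and no real CGI request is involved):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from a form field: "café" encoded as UTF-8.
# (Sample value; in a real script this would come from $q->param.)
my $raw_bytes = "caf\xC3\xA9";

# Decode on the way in: the byte string becomes a Perl text string.
my $text = decode('UTF-8', $raw_bytes);

# Work with characters, not bytes.
my $shout = uc $text;

# Encode on the way out: the text string becomes bytes again.
my $out_bytes = encode('UTF-8', $shout);

# Alternatively, let a PerlIO layer encode everything printed to STDOUT:
# binmode STDOUT, ':encoding(UTF-8)';
```

After decoding, length($text) counts four characters, not five bytes, which is the whole point of decoding before you do anything else with the data.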

URL-encoded data is byte data, typically with no indication of which encoding was used. With POST requests, a charset attribute may accompany the Content-Type: application/x-www-form-urlencoded, but the standard neither requires it nor tells you what the default is. In fact, even when it is present, it is most often ignored.

Query strings and form data are usually encoded with the same encoding (charset) that was used on the HTML page containing the form, but that is not guaranteed. My advice for those who have standardized on UTF-8 is to try UTF-8 decoding first and, if the data is not valid UTF-8, to fall back to ISO-8859-1.
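The fallback could be sketched like this, assuming the core Encode module. The name decode_cgi_bytes is a hypothetical helper for illustration, not something CGI.pm provides:

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: try strict UTF-8 first, fall back to ISO-8859-1.
sub decode_cgi_bytes {
    my ($bytes) = @_;    # work on a copy: decode() with a CHECK may modify its input

    # FB_CROAK makes decode() die on malformed input instead of
    # silently substituting replacement characters.
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK) };
    return $text if defined $text;

    # Every byte sequence is valid ISO-8859-1, so this cannot fail.
    return decode('ISO-8859-1', $_[0]);
}
```

You would call it on each raw value, for example my $name = decode_cgi_bytes(scalar $q->param('name')); (the parameter name here is made up). Keep in mind that the fallback is a guess: bytes that happen to be valid UTF-8 will always be treated as UTF-8.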

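Note that you MUST NOT use "utf8" when decoding CGI data. "utf8" is Perl's lax internal encoding: as a PerlIO layer (:utf8) it merely marks the bytes as UTF-8 without checking them, and in Encode it accepts sequences that strict UTF-8 forbids. Either way the sanity checks are skipped, which may cause internal corruption and security bugs. Instead of "utf8", use "UTF-8" (for example, :encoding(UTF-8)).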
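To see the difference for yourself, here is a small sketch using the core Encode module. The helper decodes_cleanly is hypothetical, and the test input is the byte sequence for U+D800, a surrogate that strict UTF-8 does not allow. The strict encoding should reject it. What the lax one reports may depend on your Encode version, so no output is promised here:

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding FB_CROAK);

# Hypothetical helper: true if $bytes decodes without error under $encoding.
sub decodes_cleanly {
    my ($encoding, $bytes) = @_;    # copy: decode() with a CHECK may modify its input
    return defined( eval { decode($encoding, $bytes, FB_CROAK) } ) ? 1 : 0;
}

my $surrogate = "\xED\xA0\x80";     # U+D800 written out as bytes; not valid UTF-8

for my $encoding ('UTF-8', 'utf8') {
    printf "%-5s (%s): %s\n",
        $encoding,
        find_encoding($encoding)->name,    # canonical name Encode resolves to
        decodes_cleanly($encoding, $surrogate) ? 'accepted' : 'rejected';
}
```

The canonical names printed by find_encoding show that the two spellings really are different encodings inside Encode, not aliases of each other.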


In reply to Re: Understanding CGI.pm and UTF-8 handling by Juerd
in thread Understanding CGI.pm and UTF-8 handling by Anonymous Monk
