After a loooong and rather fruitless discussion with you in the Chatterbox (and with an appreciated contribution from theorbtwo), it appears to me that your second example is actually still Shift-JIS encoded, but displayed as Latin-1.

Let me walk you through what brings me to this conclusion. First of all, if you force the browser (I use Firefox) to use the Shift-JIS encoding on these pages on PerlMonks, your string turns out to look like Japanese, at least to me.

I displayed that string as a hexdump, using the code

local($\, $,) = ("\n", " "); print map { sprintf "%02X", $_ } unpack 'C*', '組織チャット';
The result is:
91 67 90 44 83 60 83 83 83 62 83 67
which at least follows the structure of Shift-JIS: two bytes per character, the first in one of the ranges 0x81-0x9F or 0xE0-0xEA, and the second in the range 0x40-0xFC (excluding 0x7F). All six byte pairs above fit that pattern.
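The structure check can also be done mechanically. This is only a minimal sketch: the helper name looks_like_sjis_pairs is made up for illustration, the lead-byte ranges are the ones quoted above, and the trail-byte ranges (0x40-0x7E and 0x80-0xFC) are my assumption about standard Shift-JIS. It only tests the byte pattern, not whether every pair maps to an assigned character.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the byte string consists solely of
# two-byte sequences with a Shift-JIS lead byte (0x81-0x9F or 0xE0-0xEA)
# followed by a trail byte (0x40-0x7E or 0x80-0xFC, assumed ranges).
sub looks_like_sjis_pairs {
    my ($bytes) = @_;
    return $bytes =~ /\A(?:[\x81-\x9F\xE0-\xEA][\x40-\x7E\x80-\xFC])+\z/ ? 1 : 0;
}

# The twelve bytes from the hexdump above.
my $name = pack 'H*', '916790448360838383628367';

print looks_like_sjis_pairs($name)
    ? "plausible two-byte Shift-JIS\n"
    : "not pure two-byte Shift-JIS\n";
```

A plain ASCII name fails this test, of course, so a real check would have to allow single-byte characters too.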

What likely happens is what Alan Flavell describes in FORM submission and i18n: data submitted from an HTML FORM, and that includes file names for uploads, is commonly sent in the encoding of the page containing the FORM. In the latter example your form used the Shift-JIS encoding, as you said in the Chatterbox, and that's why the name arrived in Shift-JIS, even though you see Latin-1. But that's just a matter of how you display the text. If the HTML page displaying the name were also specified to be in Shift-JIS, you'd likely see the name displayed as you intended.
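To illustrate that the difference is only one of interpretation, here is a small sketch that decodes the same twelve bytes both ways with Encode (so it assumes perl 5.8 or later). The variable names are mine; 'shiftjis' and 'iso-8859-1' are encoding names Encode knows.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The bytes of the uploaded name, as shown in the hexdump.
my $raw = pack 'H*', '916790448360838383628367';

# Same bytes, two interpretations.
my $as_sjis   = decode('shiftjis',   $raw);   # two bytes per character
my $as_latin1 = decode('iso-8859-1', $raw);   # one byte per character

printf "As Shift-JIS: %d characters\n", length($as_sjis);
printf "As Latin-1:   %d characters\n", length($as_latin1);
```

Nothing in the data changes between the two calls; only the label put on the bytes does, which is exactly what the charset of the displaying page decides.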

As for the other case, which is new to me: I think you're not using Shift-JIS for the form there, and the result is indeed corrupted, and in a very browser-dependent way.

My conclusion would be that if you use Shift-JIS for both the form and the result page, you'll be fine. If you insist on displaying the names in a page that isn't encoded in Shift-JIS, I would think that Encode, which comes with perl 5.8.x and later (and which doesn't work on anything earlier, so there's no need to try to install it on an older perl), can handle the conversion from Shift-JIS to Unicode/UTF-8. If you then replace every character with a character code >= 128 by a numeric entity, with for example

s/([^\0-\x7F])/sprintf "&#%d;", ord $1/ge;
then you can display them safely in any HTML page.
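Put together, the two steps might look like the sketch below. Treat it as an illustration rather than tested production code: sjis_to_entities is a name I made up, and it assumes the incoming bytes really are Shift-JIS.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: Shift-JIS bytes in, pure-ASCII HTML text out.
sub sjis_to_entities {
    my ($bytes) = @_;
    # Step 1: turn the Shift-JIS bytes into a Perl character string.
    my $text = decode('shiftjis', $bytes);
    # Step 2: replace every character above 0x7F with a numeric entity.
    $text =~ s/([^\0-\x7F])/sprintf "&#%d;", ord $1/ge;
    return $text;
}

my $html = sjis_to_entities(pack 'H*', '916790448360838383628367');
print $html, "\n";
```

The output contains only ASCII, so it no longer matters which charset the result page declares. Characters like & and < in a name would still need the usual HTML escaping, which this sketch doesn't do.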

In reply to Re^3: File Name changes during upload by bart
in thread File Name changes during upload by eyalman
