By my question I mean that after upload the Japanese characters (Shift-JIS encoding) of the file name are changed either to a Unicode representation, for example #17887⯠, or to 組織チャット.
The question is whether there is a way to preserve the file name, so that it is not changed during upload to the server.
| [reply] |
After a loooong and rather fruitless discussion with you on the Chatterbox (and with an appreciated contribution from theorbtwo), it appears to me that your second example is actually still in Shift-JIS encoding, but displayed as Latin-1.
Let me walk you through what brings me to this conclusion. First of all, if you force the browser (I use Firefox) to use Shift-JIS encoding on these pages on Perlmonks, your string turns out to look like Japanese, at least to me.
Displaying that string as a hexdump, using the code
local($\, $,) = ("\n", " ");
print map { sprintf "%02X", $_ } unpack 'C*', '組織チャット';
The result is:
91 67 90 44 83 60 83 83 83 62 83 67
which at least follows the structure of Shift-JIS: two bytes per character, the first in one of the ranges 0x81-0x9F or 0xE0-0xEA, and the second in the range 0x40-0xFC (excluding 0x7F). That is the case for all six byte pairs here.
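If you want more than a check by eye, you can let Encode do the validation. This is only a minimal sketch, assuming the byte values from the hexdump above are right; `FB_CROAK` makes `decode` die on anything that is not valid Shift-JIS, and the script prints code points instead of characters so that the output doesn't depend on your terminal's encoding.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The bytes from the hexdump above, rebuilt as a byte string.
my @hex   = qw(91 67 90 44 83 60 83 83 83 62 83 67);
my $bytes = pack 'C*', map { hex } @hex;

# FB_CROAK: die instead of silently substituting on malformed input.
# Note that decode() with a CHECK argument may modify $bytes in place.
my $text = decode('shiftjis', $bytes, Encode::FB_CROAK);

# Twelve bytes should decode to six characters.
my @codepoints = map { sprintf 'U+%04X', ord($_) } split //, $text;
print scalar(@codepoints), " characters: @codepoints\n";
```

If the string were really Latin-1 text, I'd expect the decode to fail, or at least not to land on a run of kanji and katakana code points.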
What likely happens is what Alan Flavell describes in FORM submission and i18n: the data of an HTML FORM submission, and that includes file names for uploads, is commonly sent in the encoding the HTML FORM page itself is in. In the latter example your form used the Shift-JIS encoding, so you said in the Chatterbox, and that's why the name arrived in Shift-JIS, even though you see it as Latin-1. But that's just a matter of how you display the text. If the HTML page displaying the name were also specified to be in Shift-JIS, you'd likely see the characters displayed as you intended.
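To make that last point concrete, here is a rough sketch of a result page that labels itself as Shift-JIS and passes the name through as raw bytes. The sub name `result_page` and the page layout are made up for illustration, and I'm assuming the name arrives as untouched Shift-JIS bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: build a CGI response that declares Shift-JIS.
# $name_bytes is the file name exactly as it arrived (raw bytes).
sub result_page {
    my ($name_bytes) = @_;

    # Escape HTML specials. Doing this bytewise is safe here, because
    # 0x26, 0x3C and 0x3E never occur as the second byte of a
    # Shift-JIS pair (second bytes start at 0x40).
    my %esc = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    (my $safe = $name_bytes) =~ s/([&<>])/$esc{$1}/g;

    return join '',
        "Content-Type: text/html; charset=Shift_JIS\r\n\r\n",
        "<html><head><title>Upload result</title></head>\n",
        "<body><p>Uploaded file: $safe</p></body></html>\n";
}

binmode STDOUT;    # no encoding layer: write the bytes as they are
my $name = pack 'C*', map { hex } qw(91 67 90 44 83 60 83 83 83 62 83 67);
print result_page($name);
```

The browser then decodes the bytes itself, so no conversion is needed on the server side.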
Now for the other case, which is new to me: I think you're not using Shift-JIS for the form there, and the result is, indeed, corrupted, and in a very browser-dependent way.
My conclusion would be that if you use Shift-JIS for both the form and the result page, you'd be fine. If you insist on displaying the names in a page that isn't encoded in Shift-JIS, I would think that Encode can handle the conversion from Shift-JIS to Unicode/UTF-8. Encode comes with perl 5.8.x and later, and doesn't work on anything earlier, so there's no need to try to install it on an older perl. If you then replace every Unicode character with character code >= 128 by a numeric entity, with for example
s/([^\0-\x7F])/sprintf "&#%d;", ord $1/ge;
then you can display them safely in any HTML page.
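Put together, the two steps could look something like this. It is a sketch under the assumption that the name arrives as raw Shift-JIS bytes; the sub name `sjis_to_entities` is my own invention, and I also escape the HTML special characters while I'm at it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: Shift-JIS bytes in, pure ASCII HTML out.
sub sjis_to_entities {
    my ($bytes) = @_;

    # Step 1: bytes -> Perl Unicode string. Without a CHECK argument,
    # malformed input is replaced by U+FFFD rather than causing a die.
    my $text = decode('shiftjis', $bytes);

    # Step 2: numeric entities for HTML specials and for every
    # character with code >= 128.
    $text =~ s/([&<>"]|[^\0-\x7F])/sprintf '&#%d;', ord $1/ge;
    return $text;
}

# The four katakana at the end of the example name, as Shift-JIS bytes.
my $bytes = pack 'C*', map { hex } qw(83 60 83 83 83 62 83 67);
print sjis_to_entities($bytes), "\n";
# prints &#12481;&#12515;&#12483;&#12488;
```

Since the output is plain ASCII, it doesn't matter which encoding the surrounding page declares.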