eyalman has asked for the wisdom of the Perl Monks concerning the following question:

Hello all. I am using a script to upload files to the server with CGI upload under Perl 5.8.3. The script runs on an Apache server, on both Linux and Windows.
When I upload files with English names, everything works and I can upload files without any problem. But when I upload files whose names contain Japanese or Chinese characters, the file names are changed and corrupted, so I cannot link to the files through the browser.
The following is the code I have written to upload the files to the server:
    my $cgi = $params->{'cgi_object'};
    if (!$cgi) { return 0; }
    my @fields = $cgi->param;
    my $size = 0;
    foreach my $field (@fields) {
        my $fh = $cgi->upload($field);
        next if !$fh;
        my $basename = $cgi->param($field);
        # keep only the part after the last slash or backslash
        if ($basename =~ /([^\/\\]+)$/) { $basename = $1; }
        if (!$cs) {
            $basename =~ s/\s+/_/g;
            $basename = lc($basename);
        }
        open(OUTFILE, ">$upload_path$basename") or die $!;
        binmode OUTFILE;
        {
            # copy the upload in 1024-byte chunks until read() returns 0
            my $buffer;
            my $bytesread = read($fh, $buffer, 1024);
            die "error with file read: $!" if !defined($bytesread);
            die "error with print: $!" unless (print OUTFILE $buffer);
            if ($bytesread) { $size += $bytesread; redo; }
        }
        close(OUTFILE);
    }
Does any one of you have a clue how to solve this problem?

Replies are listed 'Best First'.
Re: File Name changes during upload
by polettix (Vicar) on Jul 16, 2005 at 18:09 UTC
    To be honest, I did not understand what your problem exactly is.

    Your foreach cycles through all your form's parameters, looking for valid uploads - quite lazy, uh? ;) Anyway, do you mean that when there are japanese/chinese characters you don't get any valid filehandle into $fh? Or do you mean that the open(OUTFILE... is failing?

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
      By my question I mean that after upload, the Japanese characters (Shift-JIS encoding) of the file name are changed either to a Unicode representation (#17887 etc.) or to 組織チャット.
      The question is whether there is a way to preserve the file name so that it is not changed during upload to the server.
        After a loooong and rather fruitless discussion with you on the Chatterbox (and with an appreciated contribution of theorbtwo), it would appear to me that your second example is actually still in Shift-JIS encoding, but displayed as Latin-1.

        Let me walk you through what brings me to this conclusion. First of all, if you force the browser (I use Firefox) to use Shift-JIS encoding on these pages on Perlmonks, your string turns out to look like Japanese, at least to me.

        Displaying that string as a hexdump, using the code

        local($\, $,) = ("\n", " "); print map { sprintf "%02X", $_ } unpack 'C*', '組織チャット';
        The result is :
        91 67 90 44 83 60 83 83 83 62 83 67
        
        which at least follows the structure of Shift-JIS: two bytes per character, the first in one of the ranges 0x81-0x9F or 0xE0-0xEF, and the second in one of the ranges 0x40-0x7E or 0x80-0xFC. That is the case for every pair here.
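
        The same structural check can be sketched in code. This is only a rough test, assuming the usual lead-byte and trail-byte ranges for two-byte Shift-JIS characters; the helper name looks_like_double_byte_sjis is made up for this example, and it ignores the single-byte parts of Shift-JIS (ASCII and half-width katakana).

```perl
use strict;
use warnings;

# Hypothetical helper: true if $bytes consists solely of two-byte
# sequences whose lead and trail bytes fall in the Shift-JIS ranges.
sub looks_like_double_byte_sjis {
    my ($bytes) = @_;
    return $bytes =~ /\A(?:[\x81-\x9F\xE0-\xEF][\x40-\x7E\x80-\xFC])+\z/ ? 1 : 0;
}

# The twelve bytes from the hexdump above.
my $name = pack 'H*', '916790448360838383628367';

print looks_like_double_byte_sjis($name)
    ? "plausible Shift-JIS\n"
    : "does not look like Shift-JIS\n";
```

        Passing this check does not prove the bytes are Shift-JIS, it only says they are not ruled out.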

        What likely happens is what Alan Flavell describes in FORM submission and i18n: HTML FORM submissions, and that includes the file names for uploads, are commonly sent in the encoding the HTML FORM itself is in. In the latter example, your form used the Shift-JIS encoding, so you said in the Chatterbox, and that's why the name arrived in Shift-JIS, even though you see Latin-1. But that's just a matter of how you display the text. If the HTML page displaying the name were also specified to be in Shift-JIS, you'd likely see the name displayed as you intended.
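
        To make the "same bytes, different display" point concrete, here is a small sketch that decodes the bytes from the hexdump twice with Encode. It assumes those twelve bytes are exactly what the server received; the variable names are mine.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The bytes from the hexdump, assumed to be what arrived at the server.
my $bytes = pack 'H*', '916790448360838383628367';

# Same bytes, two interpretations.
my $as_sjis   = decode('shiftjis',   $bytes);  # two bytes per character
my $as_latin1 = decode('iso-8859-1', $bytes);  # one byte per character

binmode STDOUT, ':encoding(UTF-8)';
print "as Shift-JIS: $as_sjis\n";
print "as Latin-1:   $as_latin1\n";
printf "%d characters versus %d characters\n",
    length $as_sjis, length $as_latin1;
```

        Nothing in the data differs between the two lines of output, only the encoding used to interpret it.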

        Now, as for what happens in the other case, which is new to me: I think you're not using Shift-JIS for the form there, and the result is, indeed, corrupted, and in a very browser-dependent way.

        My conclusion would be that if you use Shift-JIS for both the form and the result page, you'd be fine. If you insist on displaying the names in a page that isn't encoded in Shift-JIS, I would think that Encode, which comes with perl 5.8.x and later (and which doesn't work on anything earlier, so there's no need to try to install it on an older perl), can handle the conversion of Shift-JIS to Unicode/UTF-8. If you then replace all the Unicode characters with character code >= 128 by numerical entities, with for example

        s/([^\0-\x7F])/sprintf "&#%d;", ord $1/ge;
        you can display them safely in any HTML page.
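
        Put together, the two steps might look like the sketch below. It assumes the file name really arrives as Shift-JIS bytes; the helper name sjis_to_entities is invented for this example, and it does not escape &, < or > in the ASCII part of the name, which you would still want to do before putting it in a page.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: Shift-JIS bytes in, ASCII-only HTML text out.
sub sjis_to_entities {
    my ($bytes) = @_;

    # Step 1: bytes -> Perl characters (assumes the input is Shift-JIS).
    my $text = decode('shiftjis', $bytes);

    # Step 2: every character above 127 becomes a decimal entity.
    $text =~ s/([^\0-\x7F])/sprintf "&#%d;", ord $1/ge;

    return $text;
}

# The bytes from the hexdump earlier in this thread.
print sjis_to_entities(pack 'H*', '916790448360838383628367'), "\n";
```

        The decode step matters: without it, the substitution would run on the individual bytes and produce one entity per byte rather than one per character.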