
I tried both ways, but I got this error in both cases:
Software error: Cannot decode string with wide characters at C:/Perl/lib/Encode.pm line 182.
Also, I believe there is no need to explicitly tell Perl to handle param('select') as utf8; I think it must do this by default.

In the past this code worked without any utf8 conversion of param('select'). The only conversion needed was this:

Encode::from_to($_, 'ISO-8859-7', 'utf8') for @display_files;
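
For reference, a minimal sketch of what that from_to call does to a single name, assuming the directory listing really does hand back ISO-8859-7 bytes. The sample name and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical file name: the two bytes for PHI, UPSILON in ISO-8859-7.
my $name = "\xd6\xd5";

# from_to() rewrites the byte string in place. The result is still a
# byte string, now in the UTF-8 encoding (not a decoded character string).
Encode::from_to($name, 'ISO-8859-7', 'utf8');

# Show the resulting bytes in hex.
my $hex = join ' ', map { sprintf '%02x', ord } split //, $name;
print "$hex\n";   # prints "ce a6 ce a5"
```

Note that the converted name is four bytes long, not two characters long, because from_to works on bytes from start to finish.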

Re^5: somethign wrong with the sumbit
by graff (Chancellor) on Dec 30, 2007 at 20:03 UTC
    I tried both ways, but I got this error in both cases:
    Software error: Cannot decode string with wide characters at C:/Perl/lib/Encode.pm line 182.

    You should show, for both cases, the contents of line 182 of your script. As it is, I can only guess that you did not actually follow my suggestions, because only one of the two approaches uses the Encode::decode function, so only that one case would issue this particular error message. If the "decode()" call was being used in both cases, then you didn't understand my second suggestion, and you probably didn't do it right.
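
This particular message usually means that decode() was handed a string that had already been decoded into characters. A minimal sketch that reproduces it on purpose, assuming the form value arrives as raw UTF-8 bytes (the sample value and variable names are hypothetical):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes, as they might arrive from a form submission.
my $bytes = "\xce\xa6\xce\xa5";

# First decode: bytes in, a 2-character Perl string out.
my $chars = decode('utf8', $bytes);

# Second decode of the already-decoded string: decode() expects bytes,
# but $chars holds code points above 0xFF, so the call dies.
my $error = '';
eval { decode('utf8', $chars); 1 } or $error = $@;

printf "length after first decode: %d\n", length $chars;
print  "second decode died: $error";
```

If the error shows up no matter which approach is tried, it is worth checking whether the value is being decoded somewhere earlier in the script.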

    In any case, at this point, I've lost track of what problem you are actually having. The form returns a utf8 string value for the "select" param, and this value comes from the "@display_files" array, which you use both to create the menu and to test the return value of the "select" param. The array contains file names that are read from your directory as iso-8859-7 strings, and you convert them to utf8 before putting them into the popup menu, and you are confident that the strings being returned by the form are being correctly handled as utf8 strings. And despite all this being true, your testing of the "param('select')" value never succeeds?

    Try an experiment. Reduce the process to just the bare minimum, where your cgi script dummies up a list of Greek "name" strings, puts up a form with a popup menu, and checks the param value that comes in when the form is submitted. With the process reduced to just this activity, you can focus more carefully on a variety of diagnostics, if/when it fails. If you can't figure out a diagnostic that reveals the problem, the test script should be small enough to post here in its entirety, and it would be "self-contained" (runnable anywhere), so others can try it out and help find the problem.

    If it doesn't fail, then the task is to figure out what the difference is between this simple test script, and the logic you used in the larger application.

    One last thing to check about the encoding issue. Suppose the client browser's form submission includes a value for the "select" parameter that is four bytes long, and those four bytes (expressed in hex) are:

    ce a6 ce a5
    That would be the utf8 byte stream for a two-character string containing the letters "PHI" and "UPSILON". Let's suppose further that there actually was a file with this name in your directory, and @display_files contains this very same 2-character utf8 string. There's a chance that something in the handling of the input parameter string is doing an improper conversion of the original 4-byte sequence into a perl-internal utf8 string. The result of this improper conversion might be an 8-byte string, consisting of:
    c3 8e c2 a6 c3 8e c2 a5
    That's what you get if the original four-byte string is assumed to be non-utf8 (e.g. iso-8859-1) and is then "converted" to utf8 based on that false assumption. You would be able to check this with a suitable test script where the strings for the menu are all the same length. If the string coming back from the form is twice as long, it's a problem with interpreting the form data correctly as utf8 characters.
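
The improper conversion described above can be sketched as follows. This is only an illustration under the stated assumption that the four bytes get treated as iso-8859-1; the variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $from_form = "\xce\xa6\xce\xa5";   # the four bytes sent by the browser

# Correct handling: treat the bytes as UTF-8.
my $right = decode('UTF-8', $from_form);

# Improper handling: treat each byte as an ISO-8859-1 character ...
my $wrong = decode('iso-8859-1', $from_form);
# ... which, written back out as UTF-8, yields eight bytes.
my $wrong_bytes = encode('UTF-8', $wrong);

my $hex = join ' ', map { sprintf '%02x', ord } split //, $wrong_bytes;

printf "right: %d chars\n", length $right;    # prints "right: 2 chars"
printf "wrong: %d chars\n", length $wrong;    # prints "wrong: 4 chars"
print  "wrong as utf8 bytes: $hex\n";         # c3 8e c2 a6 c3 8e c2 a5
```

Comparing length() of the received value against the known length of the menu strings is the quick check: 2 means the bytes were read as utf8, 4 means they were not.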
      Hello Graff, a few things to clear up about encodings before I try to make the test script.

      The array contains file names that are read from your directory as iso-8859-7 strings,...
      How can we be sure that their encoding is 'iso-8859-7', since we don't know what encoding Windows uses to store filenames? How do we know, for example, that the encoding isn't 'cp1253' or 'utf8'?

      Also, can something that is natively in one encoding be read as another encoding?

      ...and you convert them to utf8 before putting them into the popup menu,
      I didn't want to, but I had to, because otherwise Firefox wouldn't display the filenames as readable Greek text, and I really don't know why. Is it because the header was printed as utf8?
      ...and you are confident that the strings being returned by the form are being correctly handled as utf8 string.
      Well, it seemed the correct thing to believe. Since the items in the popup menu were 'utf8' after the conversion, wasn't it logical to believe that the item the user selected would also be stored in param('select') and handled as 'utf8'? I mean, if it's a utf8 thing, why wouldn't it be "grabbed" and handled as a utf8 thing?
      There's a chance that something in the handling of the input parameter string is doing an improper conversion of the original 4-byte sequence into a perl-internal utf8 string. The result of this improper conversion might be an 8-byte string, consisting of: c3 8e c2 a6 c3 8e c2 a5
      Up to this point I understand that utf8 stores each of these characters as 2 bytes, and hence 2 characters as 4 bytes, but after that I didn't understand...
      Which is the "input parameter string"? Do you mean param('select')?
      What conversion are you referring to? Why change the 4-byte string to a perl-internal utf8 string?
      That's what you get if the original four-byte string is assumed to be non-utf8 (e.g. iso-8859-1) and is then "converted" to utf8 based on that false assumption.
      Do you mean the initial filenames, which were 'iso-8859-7', that I re-encoded to 'utf8' so they would display properly in the browser?

      Why is this wrong? The content is still the same (the name of the file); only the storage size changes. Sorry for so many questions, but this encoding concept is muddled in my head, and I have to ask you to help me clear it up because I believe we are at the heart of this weird problem.

        You said:
        How can we be sure that their encoding is 'iso-8859-7', since we don't know what encoding Windows uses to store filenames? How do we know, for example, that the encoding isn't 'cp1253' or 'utf8'?
        Dude, I thought you already knew this -- I was just repeating information that I found in your original post at the top of this thread. This is the line in the OP that led me to make that statement:
        Encode::from_to($_, 'ISO-8859-7', 'utf8') for @display_files;

        So you tell me: how can you be sure that the file name encoding is iso-8859-7 on your machine? If you don't know, then you have problems that I probably cannot help you solve.

        Also, can something that is natively in one encoding be read as another encoding?

        A stream of bytes representing character data can be read as if it were anything at all -- it's just a stream of bytes -- but it's only going to make sense if it is interpreted correctly, according to the intended character encoding.
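
As a quick illustration of that point, here is a small sketch that reads the same byte string under two different encodings. The two sample bytes and the variable names are chosen purely for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xd6\xd5";   # two bytes; their meaning is not yet decided

# Read as ISO-8859-7 (Greek): PHI, UPSILON.
my $as_greek = decode('iso-8859-7', $bytes);
# Read as ISO-8859-1 (Latin-1): O with diaeresis, O with tilde.
my $as_latin = decode('iso-8859-1', $bytes);

my @greek = map { sprintf 'U+%04X', ord } split //, $as_greek;
my @latin = map { sprintf 'U+%04X', ord } split //, $as_latin;

print "iso-8859-7: @greek\n";   # prints "iso-8859-7: U+03A6 U+03A5"
print "iso-8859-1: @latin\n";   # prints "iso-8859-1: U+00D6 U+00D5"
```

Both decodes succeed without complaint, which is why a wrong guess about the source encoding tends to go unnoticed until the text is displayed or compared.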

        it seemed the correct thing to believe in

        So your problem boils down to a tendency towards "faith-based programming". Learn to be a skeptic.

        You need to focus on the advice about doing an experiment. I wanted to make sure that this would work, so I've done the experiment already, and now you can try this yourself to see if it works for you. (It works for me.)

        #!/usr/bin/perl -T
        use strict;
        use CGI;
        use Encode;

        my $c = CGI->new;

        # Two Greek test strings, both exactly two characters long.
        my @menu_choices = ( "\x{03a6}\x{03a5}", "\x{03b2}\x{03c1}" );

        binmode STDOUT, ":utf8";
        print $c->header, $c->start_html;
        print $c->h3("(Display should be readable as utf8)"),
              $c->h3("\x{0395}\x{03c0}\x{03ad}\x{03bb}\x{03b5}\x{03be}\x{03b5}");

        if ( $c->param( 'select' )) {
            # use bytes;
            # my $val = $c->param('select');
            my $val = decode( 'utf8', $c->param( 'select' ));
            my $match = ( grep /^$val$/, @menu_choices ) ? "matches" : "fails to match";
            printf "<P>The value %s received from the form has length %d, and %s.</P>",
                   $val, length( $val ), $match;
        }

        # Print the whole form, including the closing tag.
        print $c->start_form,
              $c->popup_menu( -name => 'select', -values => \@menu_choices ),
              $c->submit( 'ok' ),
              $c->end_form;
        print $c->end_html;
        exit;
        If I comment out the "use Encode" and the line with the "decode()" call, and also uncomment the other two lines ("use bytes" and the simpler assignment to $val), it reports a failure to match, and I'm not sure why that doesn't work. (BTW, I'm using Perl 5.8.8, built for macosx 10.5)

        I also tried $c->start_html( -encoding => 'UTF-8') instead of the default html header, and that did not cause my Safari browser's "default" setting for encoding to do the right thing -- it seems I have to set this browser explicitly for any non-Latin-1 character set. This leads me to suggest that it's a good idea to include a clearly visible ASCII string in your page display, telling the viewers what character encoding they should be using in their browsers, as demonstrated in my test script.

        (I know there should be a way to tell the browser how to do the right thing automatically, and I can't wait to learn about that...)

        update: Found it (duh!):

        print $cgi->header(-charset => 'utf-8'), $cgi->start_html;
        works with both Safari and Firefox. Presumably, if you really wanted to use iso-8859-7 instead of utf8 on your web pages, you would set the "-charset" property for the http header accordingly. But I think you're better off working with utf8.