First of all i'am VERY GLAD to have solved the problem myself, 2 days ago before i read your last post
It looks like it wasn't an encoding problem at all.I'll tell you later on what i did
Your todays explanation was very insightful and helped me understand even more about encodings
I have also managed to run your test cgi script and saw that the value before submission and the value returned was the same, so indeed the browser returned the value user selected intact exactly the same as the original.What i tried before 2 days was this:
print header( -charset=>'utf8' ); my $article = param('select') || "Αρχική + Σελίδα!"; my @files = glob "$ENV{'DOCUMENT_ROOT'}/data/text/*.txt"; my @menu_files = map m{([^/]+)\.txt}, @files; Encode::from_to($_, 'ISO-8859-7', 'utf8') for @menu_files; if ( param('select') ) { #If user selected an item from the drop dow +n menu $article = decode( 'utf8', $article ); unless ( grep /^\Q$_\E$/, @menu_files ) #Unless user selection do +esn't match one of the valid filenames within @display_files { ......
But as i result i got this: Cannot decode string with wide characters at C:/Perl/lib/Encode.pm line 182.
Line 182 is completely irrelevant with "decode()" and i have no idea why Perl refers to it. Its obvious the problem was on line 35 which is this one: $article = decode( 'utf8', $article );
At the time i had no clue what that error meant, but after your today's reply i now know, that, i was running "decode()" on a string that already had the utf8 flag set, and contained a wide character and as you said Perl would return an error to that


But what does that error tell us now? If my thinking is correct, that error tell us, that the parameter the script(index.pl) got back as a return from the browser was utf8 flagged already!!
Why you ask?! Because this line of code Encode::from_to($_, 'ISO-8859-7', 'utf8') for @menu_files; has created for us an array full of well defined 'utf8 flagged ' items since the Perl script itself created this array. So when the user selects one of them and submits it, the browser grabs this 'utf8 flagged' item and sent it back to the script UNTOUCHED as it has been proved from the error we got above, otherwise we wouldn't get this error, as he supposed to do, and that proves your words to be correct in a previous post on this thread, saying that a browser should not alter a string in any way(not even in an encoding manner).

So now we DO know for sure that the browser ain't sending the string back malformed in any way, because if he were then this line of code: $article = decode( 'utf8', $article ); would have no problem being parsed perhaps because the browser might have removed the "internal utf8 flag" Perl uses to characterize the "utf8" data. Do you agree with me with this logic or have i misunderstood?

If the above is TRUE (original and returned strings are identical) then no conversion has to be made neither by doing encodings or decodings. My script works now as intended with no alternation of encodings here is the code:

print header( -charset=>'utf8' ); my $article = param('select') || "&#913;&#961;&#967;&#953;&#954;&#942; + &#931;&#949;&#955;&#943;&#948;&#945;!"; my @files = glob "$ENV{'DOCUMENT_ROOT'}/data/text/*.txt"; my @menu_files = map m{([^/]+)\.txt}, @files; Encode::from_to($_, 'ISO-8859-7', 'utf8') for @menu_files; if ( param('select') ) { #If user selected an item from the drop dow +n menu #No alternation to utf8 encoding or decoding is needed here....the ret +urned value is consisted of utf8 flag and contains wide characters as + the original unless ( grep /^\Q$_\E$/, @menu_files ) #Unless user selection do +esn't match one of the valid filenames within @display_files { if( param('select') =~ /\0/ ) { $article = "*Null Byte Injection* attempted & logged!"; print br() x 2, h1( {class=>'big'}, $article ); } if( param('select') =~ /\/\.\./ ) { $article = "*Backwards Directory Traversal* attempted & logge +d!"; print br() x 2, h1( {class=>'big'}, $article ); } $select = $db->prepare( "UPDATE guestlog SET article=?, date=?, +counter=counter+1 WHERE host=?" ); $select->execute( $article, $date, $host ); exit 0; } Encode::from_to($article, 'utf8', 'ISO-8859-7'); #Convert user sel +ected filename to greek-iso so it can be opened open FILE, "<$ENV{'DOCUMENT_ROOT'}/data/text/$article.txt" or die $ +!; local $/; $data = <FILE>; close FILE; Encode::from_to($article, 'ISO-8859-7', 'utf8'); #Convert user sel +ected filename back to utf8 before inserting into db $select = $db->prepare( "UPDATE guestlog SET article=?, date=?, cou +nter=counter+1 WHERE host=?" ); $select->execute( $article, $date, $host ); } else {
The only thing i corrected was the $data variable before sending the contents of the file to the javascript.
for ($data) { #Replace special chars like single & double quotes to i +ts literally values s/\n/\\n/g; s/'/\\'/g; s/"/\"/g; tr/\cM//d; }
because single and double quotes were incorrectly interpolated as special chars. I you visit my page now http://nikos.no-ip.org and test it by selecting something you'll notice it works normally

Also you last suggestion still doesn't work:

print header( -charset=>'utf8' ); my $article = param('select') || "&#913;&#961;&#967;&#953;&#954;&#942; + &#931;&#949;&#955;&#943;&#948;&#945;!"; my @files = glob "$ENV{'DOCUMENT_ROOT'}/data/text/*.txt"; my @menu_files = map m{([^/]+)\.txt}, @files; Encode::from_to($_, 'ISO-8859-7', 'utf8') for @menu_files; if ( param('select') ) { #If user selected an item from the drop dow +n menu $article = encode( 'utf8', $article ); unless ( grep /^\Q$_\E$/, @menu_files ) #Unless user selection do +esn't match one of the valid filenames within @display_files { ......
i get this error: Invalid argument at D:\www\cgi-bin\index.pl line 57. Line 57 is a correct line this time trying to open FILE, "<$ENV{'DOCUMENT_ROOT'}/data/text/$article.txt" or die $!; encoding must have messed the variable up somehow....

ps1: Your test cgi script required me to turn taint mode(-T) off in order to run

ps2: I don't yet understand whats the difference of $article = encode( 'utf8', $article ); opposed to $article = decode( 'utf8', $article );

ps3. I cant run the one-linears: i get Can't find string terminator "'" anywhere before EOF at -e line 1. Tried to switch single with double quotes but iam still getting errors.


In reply to Re^4: somethign wrong with the sumbit by Nik
in thread somethign wrong with the sumbit by Nik

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.