Decode umlauts on CGI-parameters

Yaerox has asked for the wisdom of the Perl Monks concerning the following question:

Re: Decode umlauts on CGI-parameters by Corion (Patriarch) on Jul 16, 2015 at 08:24 UTC
The encoding that the browser sends to the server depends on the encoding of your web page and on the browser. Ideally, you have both a `Content-Type:` header specifying the character set and a `<meta http-equiv="content-type" content="text/html; charset=utf-8">` element in your HTML. Note that in my experience, at least Internet Explorer (6) does not send the value of an `<input>` button but only the name (I think).
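A minimal sketch of sending both declarations from a Perl CGI script. The helper name `build_page` and the sample text are assumptions for illustration only; the header is written by hand here, so nothing beyond core Perl is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the full response (header plus HTML) as a
# character string. The charset is declared twice, once in the HTTP header
# and once in the <meta> element.
sub build_page {
    my ($text) = @_;
    return join '',
        "Content-Type: text/html; charset=UTF-8\r\n\r\n",
        qq{<html><head><meta http-equiv="content-type" },
        qq{content="text/html; charset=utf-8"></head>},
        "<body><p>$text</p></body></html>\n";
}

# Encode characters to UTF-8 bytes on output, so the bytes sent match
# the declared charset.
binmode STDOUT, ':encoding(UTF-8)';

# The umlaut is written as an escape so the encoding of the source file
# does not matter.
print build_page("Benutzer l\x{F6}schen");
```

With the page declared as UTF-8, browsers will normally submit form values from it as UTF-8 bytes as well.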
Re^2: Decode umlauts on CGI-parameters by Yaerox (Scribe) on Jul 16, 2015 at 08:33 UTC
I have `<meta charset="UTF-8">` on every element. Update: Ahh okay, I thought maybe I should try setting this for my output too, and if I use `print "Content-type: text/html\n\n"; print "<html><head><meta charset=\"UTF-8\"></head>"; print "p_sAction: #" . $p_sAction . "#<br>";` the output is right. But like I said, this script won't print anything. I think I forgot an important thing: this script contains comparisons like the following: `if ( $p_sAction eq $aText{'1460'} ){ } else { }` And this doesn't match. $aText comes out of my DB, where I save data uri-escaped and return it unescaped. Update 2: `# Benutzer löschen print "#$p_sAction# eq #$aText{'1530'}#<br>"; if ( $p_sAction eq $aText{'1530'} ){ print "HELLO WORLD<br>"; }` Output: `#Benutzer löschen# eq #Benutzer löschen#`
Re: Decode umlauts on CGI-parameters by Anonymous Monk on Jul 16, 2015 at 08:48 UTC
CGI.pm gives you binary values unless you tell it to return decoded values. See https://metacpan.org/pod/CGI#utf8
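To illustrate the difference between binary and decoded values, here is a small sketch using only the core Encode module. The variable `$raw` is a hypothetical stand-in for what `param()` would return without decoding; the documentation linked above describes loading the module with `use CGI qw(-utf8);` so that parameters arrive already decoded.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw parameter value: the UTF-8 bytes for "Benutzer löschen"
# (the umlaut is the two bytes 0xC3 0xB6).
my $raw = "Benutzer l\xC3\xB6schen";

# Decode bytes into characters. FB_CROAK dies on malformed input;
# LEAVE_SRC keeps decode() from modifying $raw in place.
my $chars = decode( 'UTF-8', $raw, Encode::FB_CROAK | Encode::LEAVE_SRC );

# The same text, written as characters.
my $expected = "Benutzer l\x{F6}schen";

print "bytes: ", length($raw),   "\n";    # 17
print "chars: ", length($chars), "\n";    # 16
print "raw matches:     ", ( $raw   eq $expected ? "yes" : "no" ), "\n";    # no
print "decoded matches: ", ( $chars eq $expected ? "yes" : "no" ), "\n";    # yes
```

Decoding should happen exactly once: if the parameter is already decoded by `-utf8`, calling `decode` on it again will corrupt or reject the value.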
Re^2: Decode umlauts on CGI-parameters by Yaerox (Scribe) on Jul 16, 2015 at 09:05 UTC
I made four attempts. First: `my $p_sAction = $oCGI->param( "action" ); $p_sAction = decode( 'UTF-8', $p_sAction ); if ( !defined( $p_sAction ) ){ print $oCGI->redirect( "access-denied.pl?reason=110" ); exit; } print "Content-type: text/html\n\n"; print "<html><head><meta charset=\"UTF-8\"></head>"; print "p_sAction: #" . $p_sAction . "#<br>"; print "1510 --- #$p_sAction# eq #$aText{'1510'}#<br><br>";` Output: `p_sAction: #Benutzer l�schen# 1510 --- #Benutzer l�schen# eq #Benutzer löschen#` Second: `my $p_sAction = $oCGI->param( "action" ); $p_sAction = decode( 'UTF-8', $p_sAction ); if ( !defined( $p_sAction ) ){ print $oCGI->redirect( "access-denied.pl?reason=110" ); exit; } print "Content-type: text/html\n\n"; print "<html><head><meta charset=\"UTF-8\"></head>"; print "p_sAction: #" . $p_sAction . "#<br>"; $aText{'1530'} = decode( 'UTF-8', $aText{'1510'} ); print "1510 --- #$p_sAction# eq #$aText{'1510'}#<br><br>";` Output: `p_sAction: #Benutzer l�schen# 1510 --- #Benutzer l�schen# eq #Benutzer löschen#` Third: `my $p_sAction = $oCGI->param( "action" ); $p_sAction = decode( 'UTF-8', $p_sAction ); if ( !defined( $p_sAction ) ){ print $oCGI->redirect( "access-denied.pl?reason=110" ); exit; } print "Content-type: text/html\n\n"; print "p_sAction: #" . $p_sAction . "#<br>"; print "1510 --- #$p_sAction# eq #$aText{'1510'}#<br><br>";` Output: `p_sAction: #Benutzer löschen# 1530 --- #Benutzer löschen# eq #Benutzer löschen#` Fourth: `my $p_sAction = $oCGI->param( "action" ); $p_sAction = decode( 'UTF-8', $p_sAction ); if ( !defined( $p_sAction ) ){ print $oCGI->redirect( "access-denied.pl?reason=110" ); exit; } print "Content-type: text/html\n\n"; print "p_sAction: #" . $p_sAction . "#<br>"; $aText{'1530'} = decode( 'UTF-8', $aText{'1510'} ); print "1510 --- #$p_sAction# eq #$aText{'1510'}#<br><br>";` Output: `p_sAction: #Benutzer löschen# 1530 --- #Benutzer löschen# eq #Benutzer löschen#` Still nothing matches...
Update: If I add `if ( $p_sAction eq "Benutzer löschen" ){ print "HELLO WORLD 22222 - Benutzer löschen<br>"; }` it works... so it has to be the encoding of $aText{'1510'}, I'd say. Doesn't it? I store the text in the DB like this: "Benutzer%20l%26ouml%3Bschen" (uriescaped_utf8); when I retrieve it, I uri_unescape it and then I compare... I read on Stack Overflow (http://stackoverflow.com/questions/17599103/perl-comparing-2-accentuated-strings-with-different-encodingone-being-read-from) that using Unicode::Normalize::NFD could help. For me it just makes things worse.
Re^3: Decode umlauts on CGI-parameters by Anonymous Monk on Jul 16, 2015 at 21:41 UTC
Are you using Data::Dump to compare your data?
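A sketch of that kind of comparison. It swaps in the core module Data::Dumper (with `Useqq` enabled) rather than Data::Dump, so that it runs without installing anything; the two sample values and the helper `show` are assumptions for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq  = 1;    # render non-ASCII bytes as visible escapes
$Data::Dumper::Terse  = 1;    # omit the "$VAR1 =" prefix
$Data::Dumper::Indent = 0;    # keep each dump on one line

# Hypothetical helper: returns an unambiguous, ASCII-only view of a string.
sub show {
    my ($value) = @_;
    return Dumper($value);
}

# Two hypothetical values that can look identical once a browser has
# rendered them, but are different strings.
my $from_form = "Benutzer l\xC3\xB6schen";    # raw UTF-8 bytes
my $from_db   = "Benutzer l&ouml;schen";      # contains an HTML entity

print "form: ", show($from_form), "\n";
print "db:   ", show($from_db),   "\n";
print "equal\n" if $from_form eq $from_db;    # prints nothing here
```

Dumping both sides to the terminal or a log file, instead of printing them into the HTML page, avoids the browser hiding the difference.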
Re: Decode umlauts on CGI-parameters by tangent (Parson) on Jul 16, 2015 at 14:22 UTC
Quoting Yaerox: 'I store the text in db like this: "Benutzer%20l%26ouml%3Bschen" (uriescaped_utf8), when I receive it, I uriunescape it and then I compare...' It looks like the value stored in the database contains an HTML entity, so when you retrieve and uri_unescape it, it is "`Benutzer l&ouml;schen`", not "`Benutzer löschen`". `uri_unescape( Benutzer%20l%26ouml%3Bschen ) => Benutzer l&ouml;schen` and `uri_escape( Benutzer löschen ) => Benutzer%20l%F6schen uri_escape( Benutzer l&ouml;schen ) => Benutzer%20l%26ouml%3Bschen` It looks correct when you print it out to the browser, but if you view the source of the HTML you will see the problem. One way to solve it is to HTML-decode the value before making the comparison: `use URI::Escape; use HTML::Entities; my $db_value = uri_unescape( $aText{'1530'} ); decode_entities( $db_value ); if ( $p_sAction eq $db_value ) { ...` In the future, when you are saving the values to the database, you could remove the entities before uri_escape. Update: minor edits.
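The last suggestion, removing the entities before escaping, could look roughly like this. It is only a sketch: the helper names `to_db` and `from_db` are hypothetical, and it assumes the stored values should be URI-escaped UTF-8, using `uri_escape_utf8` from URI::Escape.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(decode_entities);
use URI::Escape qw(uri_escape_utf8 uri_unescape);

# Hypothetical write side: turn entities into real characters first,
# then escape the text as UTF-8.
sub to_db {
    my ($text) = @_;
    return uri_escape_utf8( decode_entities($text) );
}

# Hypothetical read side: unescape to bytes, then decode the bytes
# into characters.
sub from_db {
    my ($stored) = @_;
    return decode( 'UTF-8', uri_unescape($stored) );
}

my $stored = to_db('Benutzer l&ouml;schen');
print "$stored\n";    # Benutzer%20l%C3%B6schen

# A decoded form parameter, written with an escape for the umlaut.
my $p_sAction = "Benutzer l\x{F6}schen";
print "match\n" if $p_sAction eq from_db($stored);    # match
```

Storing the text this way means no entity decoding is needed at comparison time, only the unescape and decode steps.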
Re^2: Decode umlauts on CGI-parameters by Yaerox (Scribe) on Jul 17, 2015 at 08:23 UTC
I need some time to finish what I'm actually doing here, then I'll take another look at this. Thanks for the reply, I'll update this later. Update: Thank you very much sir, this fixed it for me. I need to see if I can change this data storage after finishing this project first. `my $p_sAction = $oCGI->param( "action" ); if ( !defined( $p_sAction ) ){ #do stuff } $p_sAction = decode( 'UTF-8', $p_sAction ); $aText{'1510'} = decode_entities($aText{'1510'}); if ( $p_sAction eq $aText{'1460'} ){ #do stuff - NOW IT MATCHES correctly }`