Yaerox has asked for the wisdom of the Perl Monks concerning the following question:

I have a form on a website: <input type="submit" value="Benutzer löschen" name="action">. For those who don't know, "Benutzer löschen" is German for "delete user". The button is displayed correctly on my web page, which uses UTF-8 encoding.

My Perl script to delete the user works like this:
    my $p_sAction = $oCGI->param( "action" );
    if ( !defined( $p_sAction ) ){
        print $oCGI->redirect( "access-denied.pl?reason=110" );
        exit;
    } else {
        # do stuff
    }
Whether or not I use uri_unescape, the output doesn't change. The same goes for "use utf8;".

This Perl script doesn't actually print anything; I only added a Content-type header and "use utf8;" to check whether the script might be using the wrong encoding. The output always looks like this: Benutzer löschen.

I also tried <input type="submit" value="Benutzer l&ouml;schen" name="action"> instead of the one shown at the beginning. It makes no difference. Then I looked in the Chrome Developer Tools: on the Network tab, when I select my script and click Preview, I get this: Benutzer löschen.

Now I have no idea how to fix this. Any ideas?

Re: Decode umlauts on CGI-parameters
by Corion (Patriarch) on Jul 16, 2015 at 08:24 UTC

    The encoding that gets sent by the browser to the server depends on the encoding of your web page and the browser. Ideally, you have both, a Content-Type: header specifying the character set and a <meta http-equiv="content-type" content="text/html; charset=utf-8"> element in your HTML.
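As a minimal sketch of that setup (not from the original thread; the helper name utf8_page is made up for illustration), a CGI script could declare UTF-8 in both the header and the markup, and send encoded bytes, roughly like this:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build a complete CGI response whose HTTP header
# and <meta> element both declare UTF-8.
sub utf8_page {
    my ($body_text) = @_;    # a decoded (character) string
    my $html = qq{<html><head><meta charset="UTF-8"></head>}
             . qq{<body><p>$body_text</p></body></html>};
    return "Content-Type: text/html; charset=UTF-8\r\n\r\n"
         . encode( 'UTF-8', $html );    # send bytes, not characters
}

binmode STDOUT;    # the response is already encoded, so write raw bytes
print utf8_page("Benutzer l\x{F6}schen");
```

With the page declared this way, a browser would normally submit the form values as UTF-8 as well, which is why the script has to decode them on the way in.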

    Note that in my experience, at least Internet Explorer (6) does not send the value of an <input> button but only the name (I think).

      I have <meta charset="UTF-8"> on every page.

      Update: Ah, okay. I thought I should try setting this for my output as well, and if I use

      print "Content-type: text/html\n\n";
      print "<html><head><meta charset=\"UTF-8\"></head>";
      print "p_sAction: #" . $p_sAction . "#<br>";

      the output is right. But as I said, this script isn't supposed to print anything. I think I forgot to mention something important: the script contains a comparison like the following:
      if ( $p_sAction eq $aText{'1460'} ){ } else { }

      And this doesn't match. The %aText values come from my DB, where I store the data URI-escaped and return it unescaped.


      Update 2:
      # Benutzer löschen
      print "#$p_sAction# eq #$aText{'1530'}#<br>";
      if ( $p_sAction eq $aText{'1530'} ){
          print "HELLO WORLD<br>";
      }
      Output: #Benutzer löschen# eq #Benutzer löschen#
Re: Decode umlauts on CGI-parameters
by Anonymous Monk on Jul 16, 2015 at 08:48 UTC
      I made four attempts:

      my $p_sAction = $oCGI->param( "action" );
      $p_sAction = decode( 'UTF-8', $p_sAction );
      if ( !defined( $p_sAction ) ){
          print $oCGI->redirect( "access-denied.pl?reason=110" );
          exit;
      }
      print "Content-type: text/html\n\n";
      print "<html><head><meta charset=\"UTF-8\"></head>";
      print "p_sAction: #" . $p_sAction . "#<br>";
      print "1510 --- #$p_sAction# eq #$aText{'1510'}#<br><br>";

      Output:
      p_sAction: #Benutzer l&#65533;schen#
      1510 --- #Benutzer l&#65533;schen# eq #Benutzer löschen#

      my $p_sAction = $oCGI->param( "action" );
      $p_sAction = decode( 'UTF-8', $p_sAction );
      if ( !defined( $p_sAction ) ){
          print $oCGI->redirect( "access-denied.pl?reason=110" );
          exit;
      }
      print "Content-type: text/html\n\n";
      print "<html><head><meta charset=\"UTF-8\"></head>";
      print "p_sAction: #" . $p_sAction . "#<br>";
      $aText{'1530'} = decode( 'UTF-8', $aText{'1510'} );
      print "1510 --- #$p_sAction# eq #$aText{'1510'}#<br><br>";

      Output:
      p_sAction: #Benutzer l&#65533;schen#
      1510 --- #Benutzer l&#65533;schen# eq #Benutzer löschen#

      my $p_sAction = $oCGI->param( "action" );
      $p_sAction = decode( 'UTF-8', $p_sAction );
      if ( !defined( $p_sAction ) ){
          print $oCGI->redirect( "access-denied.pl?reason=110" );
          exit;
      }
      print "Content-type: text/html\n\n";
      print "p_sAction: #" . $p_sAction . "#<br>";
      print "1510 --- #$p_sAction# eq #$aText{'1510'}#<br><br>";

      Output:
      p_sAction: #Benutzer löschen#
      1530 --- #Benutzer löschen# eq #Benutzer löschen#

      my $p_sAction = $oCGI->param( "action" );
      $p_sAction = decode( 'UTF-8', $p_sAction );
      if ( !defined( $p_sAction ) ){
          print $oCGI->redirect( "access-denied.pl?reason=110" );
          exit;
      }
      print "Content-type: text/html\n\n";
      print "p_sAction: #" . $p_sAction . "#<br>";
      $aText{'1530'} = decode( 'UTF-8', $aText{'1510'} );
      print "1510 --- #$p_sAction# eq #$aText{'1510'}#<br><br>";

      Output:
      p_sAction: #Benutzer löschen#
      1530 --- #Benutzer löschen# eq #Benutzer löschen#


      Still nothing matches...
      Update: If I add
      if ( $p_sAction eq "Benutzer löschen" ){ print "HELLO WORLD 22222 - Benutzer löschen<br>"; }
      it works ... so it has to be the encoding of $aText{'1510'}, I'd say. Isn't it?
      I store the text in db like this: "Benutzer%20l%26ouml%3Bschen" (uriescaped_utf8), when I receive it, I uriunescape it and then I compare...

      I read on Stack Overflow (http://stackoverflow.com/questions/17599103/perl-comparing-2-accentuated-strings-with-different-encodingone-being-read-from) that using Unicode::Normalize::NFD could help. For me it only makes things worse.
Re: Decode umlauts on CGI-parameters
by tangent (Parson) on Jul 16, 2015 at 14:22 UTC
    I store the text in db like this: "Benutzer%20l%26ouml%3Bschen" (uriescaped_utf8), when I receive it, I uriunescape it and then I compare...
    It looks like the value stored in the database contains an HTML entity, so when you retrieve and uri_unescape it, it is "Benutzer l&ouml;schen", not "Benutzer löschen".
    uri_unescape( Benutzer%20l%26ouml%3Bschen ) => Benutzer l&ouml;schen
    and
    uri_escape( Benutzer löschen ) => Benutzer%20l%F6schen
    uri_escape( Benutzer l&ouml;schen ) => Benutzer%20l%26ouml%3Bschen
    It looks correct when you print it out to the browser, but if you view the source of the HTML you will see the problem.
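Another way to spot this kind of mismatch without going through the browser is to dump both strings as code points. The following is only an illustrative sketch: the helper name codepoints and the two sample values are assumptions, standing in for the decoded form parameter and the unescaped database value.

```perl
use strict;
use warnings;

# Hypothetical helper: render every character as U+XXXX so that
# differences a browser would hide become visible.
sub codepoints {
    my ($s) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $s;
}

my $from_form = "l\x{F6}";    # assumed: decoded form value ("lö")
my $from_db   = "l&ouml;";    # assumed: DB value still holding the entity

print codepoints($from_form), "\n";    # U+006C U+00F6
print codepoints($from_db),   "\n";    # U+006C U+0026 U+006F U+0075 U+006D U+006C U+003B
```

Both values render as "lö" in a browser, but the dump shows that the second one is seven characters long and starts its "umlaut" with an ampersand, so eq can never match.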

    One way to solve it is to html decode the value before making the comparison:

    use URI::Escape;
    use HTML::Entities;
    my $db_value = uri_unescape( $aText{'1530'} );
    decode_entities( $db_value );
    if ( $p_sAction eq $db_value ) { ...
    In the future, when you are saving the values to the database, you could remove the entities before uri_escape.
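A rough sketch of what that could look like, assuming the values are escaped with uri_escape_utf8 from URI::Escape (the helper names to_db and from_db are made up for illustration and are not part of the original script):

```perl
use strict;
use warnings;
use URI::Escape    qw(uri_escape_utf8 uri_unescape);
use HTML::Entities qw(decode_entities);
use Encode         qw(decode);

# Hypothetical helper: resolve HTML entities first, then URI-escape the
# UTF-8 encoded text, so no entity ever reaches the database.
sub to_db {
    my ($label) = @_;
    return uri_escape_utf8( decode_entities($label) );
}

# Hypothetical helper: undo the escaping and decode the UTF-8 bytes back
# into a character string that can be compared with a decoded parameter.
sub from_db {
    my ($stored) = @_;
    return decode( 'UTF-8', uri_unescape($stored) );
}

my $stored = to_db("Benutzer l&ouml;schen");
print "$stored\n";    # Benutzer%20l%C3%B6schen

my $ok = from_db($stored) eq "Benutzer l\x{F6}schen";
print( ( $ok ? "match" : "no match" ), "\n" );
```

Stored this way, the comparison only needs decode( 'UTF-8', ... ) on the form parameter and from_db on the database value; decode_entities is no longer required at comparison time.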

    Update: minor edits.

      I need some time to finish what I'm currently working on, then I'll take another look at this. Thanks for the reply; I'll update this later.
      Update: Thank you very much, sir, this fixed it for me. I'll see whether I can change how this data is stored once I've finished this project.
      my $p_sAction = $oCGI->param( "action" );
      if ( !defined( $p_sAction ) ){
          # do stuff
      }
      $p_sAction = decode( 'UTF-8', $p_sAction );
      $aText{'1510'} = decode_entities($aText{'1510'});
      if ( $p_sAction eq $aText{'1460'} ){
          # do stuff - NOW IT MATCHES correctly
      }