cosmicperl has asked for the wisdom of the Perl Monks concerning the following question:
If I print the input straight back out, it comes out as a normal £, as expected. If decoded, it shows up as an unrecognised-character symbol; if encoded, a tell-tale extra character appears. But if I pass it through HTML::Entities, the input gets the extra character and the decoded one comes out right?? The encoded one, well, that comes out even weirder.

Input: £ (IS UTF8? No)
Decoded: ? (IS UTF8? Yes)
Encoded: £ (IS UTF8? No)
Entities input: £
Entities decoded: £
Entities encoded: £
Which didn't make sense to me; I expected the decoded one to be just £. But when I tested this script on Win32 IIS, I got:

Input: £
Decoded: £
Encoded: £
Which is what I expected???

Input: £
Decoded: £
Encoded: £
    #!/usr/bin/perl
    use strict;

    BEGIN {
        print "content-type: text/html; charset=UTF-8\n\n";
        use FindBin qw ($RealBin $RealScript);
        use lib $FindBin::RealBin;
        chdir $RealBin;
    }#BEGIN

    use CGI;
    my $cgi = new CGI;

    # Form that posts back to this script, pre-filled with the last input
    print qq~
    <form method=POST>
    input: <input type=text name=string value="${ \$cgi->param('string') }">
    <input type=submit>
    </form>
    ~;

    if ( $cgi->param('string') ) {
        use Encode qw( is_utf8 encode decode );

        # Raw parameter, exactly as CGI.pm returned it
        print "Input: ${ \$cgi->param('string') } (IS UTF8? ";
        if ( is_utf8($cgi->param('string')) ) { print "Yes)<br>\n"; }
        else { print "No)<br>\n"; }

        # Parameter decoded from UTF-8 into a character string
        my $string = decode("utf8", $cgi->param('string'));
        print "Decoded: $string (IS UTF8? ";
        if ( is_utf8($string) ) { print "Yes)<br>\n"; }
        else { print "No)<br>\n"; }

        # Parameter encoded to UTF-8 without decoding it first
        my $octets = encode("utf8", $cgi->param('string'));
        print "Encoded: $octets (IS UTF8? ";
        if ( is_utf8($octets) ) { print "Yes)<br>\n"; }
        else { print "No)<br>\n"; }

        # Write the same three values to a file for comparison
        open( OUTF, '>utf8.txt' ) || print("Error writing file");
        print OUTF "Input: ${ \$cgi->param('string') }\n";
        print OUTF "Decoded: $string\n";
        print OUTF "Encoded: $octets\n";
        close( OUTF );

        # Pass each of the three values through HTML::Entities
        use HTML::Entities;
        my $ent_input = encode_entities($cgi->param('string'));
        print "Entities input: $ent_input<br>\n";
        my $ent_decode = encode_entities($string);
        print "Entities decoded: $ent_decode<br>\n";
        my $ent_encode = encode_entities($octets);
        print "Entities encoded: $ent_encode<br>\n";
    }#if
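The three values the script prints can be compared outside of CGI. The following is a minimal sketch, not part of the original post, and it assumes the browser submits the pound sign as its two UTF-8 bytes (`C2 A3`). The names `$input`, `$decoded`, `$double` and the helper `hex_units` are hypothetical and only illustrate the difference between decoding the bytes and encoding them a second time.

```perl
use strict;
use warnings;
use Encode qw( decode encode is_utf8 );

# Assumption: the form value arrives as the raw UTF-8 bytes of the pound sign.
my $input = "\xC2\xA3";

# decode: two UTF-8 bytes become one character, U+00A3.
my $decoded = decode( 'UTF-8', $input );

# encode on the still-undecoded bytes treats each byte as a Latin-1
# character and encodes it again, so the data is now double-encoded.
my $double = encode( 'UTF-8', $input );

# Hypothetical helper: show each element of a string as a hex number.
sub hex_units {
    my ($str) = @_;
    return join ' ', map { sprintf( '%02X', ord($_) ) } split //, $str;
}

for my $pair ( [ input => $input ], [ decoded => $decoded ], [ double => $double ] ) {
    my ( $name, $value ) = @$pair;
    printf "%-8s length=%d is_utf8=%s units=%s\n",
        $name, length($value), ( is_utf8($value) ? 'yes' : 'no' ), hex_units($value);
}

# A decoded (character) string has to be encoded again on the way out,
# here by putting an encoding layer on STDOUT.
binmode STDOUT, ':encoding(UTF-8)';
print "$decoded\n";
```

Under that assumption, the input has length 2, the decoded string has length 1, and the double-encoded string has length 4. Printing the decoded string without an output layer would emit the single byte `A3`, which is not valid UTF-8 on its own. That may be why a page declared as `charset=UTF-8` shows an unrecognised-character symbol for it.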
Replies are listed 'Best First'.
Re: UTF-8: Trying to make sense of form input
by ikegami (Patriarch) on Aug 16, 2009 at 01:31 UTC
by creamygoodness (Curate) on Aug 16, 2009 at 03:58 UTC
by ikegami (Patriarch) on Aug 16, 2009 at 04:25 UTC
by creamygoodness (Curate) on Aug 16, 2009 at 05:41 UTC
Re: UTF-8: Trying to make sense of form input
by graff (Chancellor) on Aug 16, 2009 at 02:42 UTC
Re: UTF-8: Trying to make sense of form input
by Anonymous Monk on Aug 16, 2009 at 00:51 UTC
Re: UTF-8: Trying to make sense of form input
by Nigel Peck (Initiate) on Sep 17, 2009 at 21:19 UTC
by ikegami (Patriarch) on Sep 17, 2009 at 21:35 UTC