| [reply] |
I've tried URI::Encode. It does not work for Extended Characters.
I need something that will work for Extended Characters.
Any suggestions?
| [reply] |
The article I linked to shows how several modules handle non-ASCII characters (which I believe are what you call "extended", though I have no idea why). Is there any reason you ignored it, or haven't you read it thoroughly?
| [reply] |
use CGI;
If you then use the param method, all the unescaping will be done for you.
| [reply] [d/l] [select] |
CGI removes the URL-encoding, but IIRC, CGI leaves the character encoding in place. If so, you can use Encode's decode_utf8 or utf8's decode on what param returns.
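A minimal sketch of that decoding step, assuming the parameter arrives as UTF-8 bytes. The hard-coded `$raw` value is a hypothetical stand-in for whatever param would return:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Stand-in for the value param() would return: "caf" followed by the
# UTF-8 bytes of "é" (0xC3 0xA9), with the %XX escaping already removed.
my $raw = "caf\xC3\xA9";

# Option 1: Encode's decode_utf8 returns a new character string.
my $chars = decode_utf8($raw);

# Option 2: utf8::decode converts its argument in place and returns
# true on success.
my $copy = $raw;
utf8::decode($copy) or die "not valid UTF-8";

printf "%d bytes, %d characters\n", length($raw), length($chars);
```

Both routes should yield the same character string; the difference is only whether you get a new value back or modify the variable in place.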
| [reply] [d/l] [select] |
CGI has the -utf8 pragma:
This makes CGI.pm treat all parameters as UTF-8 strings. Use this with care, as it will interfere with the processing of binary uploads. It is better to manually select which fields are expected to return utf-8 strings and convert them using code like this:
use Encode;
my $arg = decode utf8=>param('foo');
The problem with binary (file) uploads is due to CGI's legacy, as it partially treats file uploads as form parameters instead of keeping both separate.
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
| [reply] [d/l] [select] |
Actually, your problem has two sides. The first is simple: the non-ASCII bytes from the input form are encoded as %XX. This is taken care of either by a simple substitution, by the CGI module, or whatever.
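For illustration, the "simple substitute" route might look roughly like this. The `$encoded` value is a made-up example, and the sketch assumes form-style encoding where `+` stands for a space:

```perl
use strict;
use warnings;

# Hypothetical raw form value: "café au lait" sent as UTF-8 and URL-encoded.
my $encoded = "caf%C3%A9+au+lait";

my $bytes = $encoded;
$bytes =~ tr/+/ /;                               # '+' means space in form data
$bytes =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # %XX -> the byte it names

# $bytes is still a byte string at this point; which characters those
# bytes represent depends on the encoding the form was submitted in.
print length($bytes), " bytes after unescaping\n";
```

This only undoes the %XX layer. It says nothing about which character encoding the bytes are in.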
The second side of the problem is the encoding that was used during input. In other words, you have to know the correspondence between sequences of bytes and characters. Usually this information is available from the headers. Once you have it, interpreting the sequence of bytes as characters is a matter of applying the appropriate conversion. If the output page uses the same encoding as the input page, then no conversion is needed. If the encodings don't match, then you can use Encode::from_to to convert the input into the desired encoding for the output.
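A rough sketch of the mismatch case, assuming (purely for illustration) that the form was submitted as ISO-8859-1 and the output page is UTF-8:

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Assumed input: already unescaped bytes in ISO-8859-1, where "é" is
# the single byte 0xE9.
my $value = "caf\xE9";

# from_to converts the byte string in place and returns the new length
# in bytes, or undef on failure.
my $len = from_to($value, 'iso-8859-1', 'utf-8');
defined $len or die "conversion failed";

# $value now ends in the two UTF-8 bytes 0xC3 0xA9 for "é".
print "converted value is $len bytes long\n";
```

Note that from_to works on bytes throughout; if you want a Perl character string instead, Encode's decode with the input encoding is the usual alternative.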
| [reply] |