aceofspace has asked for the wisdom of the Perl Monks concerning the following question:

I've a web page form to accept input from visitors.

I'm wondering if there is a Perl module or convenient subroutine out there that will help my Perl script convert extended-character input back into the same extended characters and print them on a web page.

The following will illustrate my point:

Let's say a visitor inputs an extended character, £, in the form. When my Perl script gets this variable, £ becomes %C2%A3. I'm looking for a Perl module or ready-made subroutine that will help my script convert %C2%A3 back to £ and print it on a web page.

Any advice?

Re: Conversion of Extended Characters
by moritz (Cardinal) on Dec 28, 2010 at 16:44 UTC
      I've tried URI::Encode. It does not work for Extended Characters.

      I need something that will work for Extended Characters.

      Any suggestions?

        The article I linked to shows how several modules handle non-ASCII characters (which I believe are what you call "extended", though I have no idea why). Is there any reason you ignored it, or haven't you read it thoroughly?

Re: Conversion of Extended Characters
by JavaFan (Canon) on Dec 28, 2010 at 16:38 UTC
    use CGI;
    If you then use the param method, all the escaping will be done for you.

      CGI removes the URL-encoding, but IIRC, CGI leaves the character encoding in place. If so, you can use Encode's decode_utf8 or utf8's decode on what param returns.
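
      A minimal sketch of that decoding step, using only the core Encode module. The variable $raw is an assumption standing in for whatever param returns once the URL-encoding has been removed; for £ that would be the two UTF-8 bytes 0xC2 0xA3.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: this stands in for the value param() would return for a
# field submitted as %C2%A3, i.e. the raw UTF-8 bytes 0xC2 0xA3.
my $raw = "\xC2\xA3";

# Turn the UTF-8 bytes into a Perl character string.
my $text = decode('UTF-8', $raw);

# length() counts bytes before decoding and characters after it.
printf "bytes: %d, characters: %d, code point: U+%04X\n",
    length($raw), length($text), ord($text);
# prints "bytes: 2, characters: 1, code point: U+00A3"
```

      When printing $text to a UTF-8 page, it has to be encoded again on the way out, for example with binmode(STDOUT, ':encoding(UTF-8)').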

        CGI has the -utf8 pragma:

        This makes CGI.pm treat all parameters as UTF-8 strings. Use this with care, as it will interfere with the processing of binary uploads. It is better to manually select which fields are expected to return utf-8 strings and convert them using code like this:
        use Encode; my $arg = decode utf8=>param('foo');

        The problem with binary (file) uploads is due to CGI's legacy, as it partially treats file uploads as form parameters instead of keeping both separate.

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Conversion of Extended Characters
by andal (Hermit) on Dec 29, 2010 at 09:40 UTC

    Actually, your problem has two sides. The first is simple: the non-ASCII bytes from the input form arrive encoded as %XX. This is taken care of either by a simple substitution or by the CGI module or something similar.
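
    The simple substitution could look roughly like this. The helper name percent_decode is hypothetical, and the sketch assumes ordinary form data, where a space is sent as "+".

```perl
use strict;
use warnings;

# Hypothetical helper: undo the %XX escaping of a single form value.
sub percent_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;                              # form data sends spaces as '+'
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %C2 becomes the byte 0xC2
    return $s;
}

my $bytes = percent_decode('%C2%A3');

# Show the resulting bytes in hex.
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
# prints "C2 A3"
```

    The result is still a string of bytes; which characters those bytes stand for depends on the encoding the form was submitted in.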

    The second side of the problem is the character encoding that was used for the input. In other words, you have to know the correspondence between the sequence of bytes and the characters. Usually this information is available from the headers. Once you have it, interpreting the sequence of bytes as characters is a matter of applying the appropriate conversion. If the output page uses the same encoding as the input page, then no conversion is needed. If the encodings don't match, you can use Encode::from_to to convert the input into the desired encoding for the output.
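
    As a sketch of the mismatched case, assume the form was submitted as UTF-8 while the output page is served as ISO-8859-1 (both encodings are assumptions chosen for illustration). Encode::from_to converts the byte string in place:

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Assumption: input arrived as UTF-8, the output page uses ISO-8859-1.
my $octets = "\xC2\xA3";    # the pound sign as UTF-8 bytes

# from_to() rewrites $octets in place and returns the new length in
# bytes, or undef if the conversion failed.
my $len = from_to($octets, 'UTF-8', 'ISO-8859-1');
die "conversion failed\n" unless defined $len;

printf "%d byte(s): %02X\n", $len, ord($octets);
# prints "1 byte(s): A3"
```

    Characters that do not exist in the target encoding cannot be represented this way, so serving the output page in the same encoding as the input is usually the less fragile choice.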