PerlMonks  

Re^2: utf8 characters in tr/// or s///

by MattLG (Sexton)
on Oct 01, 2008 at 20:35 UTC


in reply to Re: utf8 characters in tr/// or s///
in thread utf8 characters in tr/// or s///

Brilliant! You guys RULE!

Thanks.

MattLG

Re^3: utf8 characters in tr/// or s///
by MattLG (Sexton) on Oct 04, 2008 at 17:28 UTC

    One other thing I'm finding conflicting advice about on the internet is how to pack the incoming CGI data into utf8.

    I currently use:

    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

    against the strings that come in via the web.

    Now I see that there's a "U" template for Unicode. But I'm after UTF-8, so that doesn't quite fit, and I don't understand what the pack docs are saying about UTF-8. However, in a couple of places I've searched I've found this:

    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg; utf8::decode($value);

    which I don't really understand. I'd assumed the "C" would put everything into ASCII/ISO-8859-1, and that running utf8::decode on the result would just produce garbage out of the special characters.

    What would the monks advise?

    Cheers

    MattLG

      If the stuff coming in from your web clients is using the "%XX" notation for utf8 character data, then any "wide" character (one requiring more than one byte in utf8) will need one "%XX" thingie per byte. For example, a utf8 "ÿ" (U+00FF) would be "%C3%BF".
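
      To make the "one %XX per byte" point concrete, here is a minimal sketch of the sending side. The helper name percent_escape_utf8 is made up for illustration, and it escapes every byte, whereas a real browser leaves plain letters and digits unescaped:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): show which %XX escapes
# the utf8 encoding of a string turns into, one escape per byte.
sub percent_escape_utf8 {
    my ($text) = @_;
    my $bytes = $text;
    utf8::encode($bytes);    # characters -> utf8 bytes, in place
    return join '', map { sprintf '%%%02X', $_ } unpack 'C*', $bytes;
}

print percent_escape_utf8("\x{FF}"), "\n";      # prints %C3%BF
print percent_escape_utf8("\x{20AC}"), "\n";    # prints %E2%82%AC
```

      U+00FF needs two bytes in utf8 and so yields two escapes; U+20AC (the euro sign) needs three.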

      If you see that in your input, then pack("C",...) is the right first step: it doesn't interpret anything as ASCII or ISO-8859-1, it just rebuilds the raw byte sequence of the intended utf8 character. The utf8::decode() call then gets perl to treat that byte sequence as a utf8 character rather than as separate bytes.
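
      Putting the two steps together, something like the following should work. This is only a sketch: the sub name decode_percent_utf8 is invented here, it handles only the %XX escapes from your regex, and returning undef on malformed input is one possible choice, not the only one:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two steps discussed above.
sub decode_percent_utf8 {
    my ($value) = @_;

    # Step 1: each %XX escape becomes the single byte with that value.
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

    # Step 2: convert the utf8 byte sequence to characters, in place.
    # utf8::decode returns false if the bytes are not well-formed utf8.
    return utf8::decode($value) ? $value : undef;
}

my $text = decode_percent_utf8('%C3%BF');
printf "%d character(s), code point U+%04X\n", length($text), ord($text);
# prints: 1 character(s), code point U+00FF
```

      Without the utf8::decode() step, length() on the same input would report 2, because perl would still see two separate bytes.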
