Re^3: Help with Accented Characters

To get a proper hexdump, use unpack instead of ord:

print join " ", unpack("(H2)*", $s);

With that change to your code (in addition to use utf8;, as ysth already correctly pointed out), I get the output I would expect (assuming the source file was saved with a UTF-8 editor).

$ ./669879.pl
The string is: 'Resume'
        52 65 73 75 6d 65

        Uppercase: 'RESUME'
        Lowercase: 'resume'
        length = 6
        bytes = 6

The string is: 'Résumé'
        52 c3 a9 73 75 6d c3 a9

        Uppercase: 'RÉSUMÉ'
        Lowercase: 'résumé'
        length = 6
        bytes = 8

(I've converted the é/É characters in the output to ISO-8859-1 so that the PM web frontend displays them properly. As the hexdump shows, though, they are internally encoded as c3 a9, i.e. UTF-8.)
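
For anyone who wants to reproduce the byte counts without depending on how a particular perl treats unpack on character strings, here is a minimal sketch that encodes explicitly before dumping. The helper name hexdump_utf8 is made up for illustration, and the string is spelled with \x escapes so the example does not depend on the encoding of the source file.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: hex dump of the UTF-8 encoding of a character string.
sub hexdump_utf8 {
    my ($s) = @_;
    my $bytes = encode('UTF-8', $s);           # characters -> octets
    return join ' ', unpack('(H2)*', $bytes);  # two hex digits per octet
}

my $s = "R\x{e9}sum\x{e9}";                    # 'Résumé', 6 characters
printf "length = %d, bytes = %d\n", length($s), length(encode('UTF-8', $s));
print hexdump_utf8($s), "\n";
# prints:
#   length = 6, bytes = 8
#   52 c3 a9 73 75 6d c3 a9
```

Encoding first means the dump always shows octets, whatever the internal representation of $s happens to be.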

Re^4: Help with Accented Characters by Anonymous Monk on Feb 10, 2012 at 15:39 UTC
Encode::encode('UTF-8',uc(Encode::decode('UTF-8', $t)));
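
Spelled out, that one-liner is a decode/uppercase/encode round trip. Here is a rough sketch, assuming $t holds raw UTF-8 octets (for example, read from a file without an encoding layer); the helper name uc_utf8_bytes is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: uppercase a string that is held as raw UTF-8 octets.
sub uc_utf8_bytes {
    my ($octets) = @_;
    my $chars = decode('UTF-8', $octets);  # octets -> characters
    return encode('UTF-8', uc $chars);     # uppercase, then back to octets
}

my $t  = "R\xc3\xa9sum\xc3\xa9";           # 'Résumé' as 8 UTF-8 octets
my $up = uc_utf8_bytes($t);
print join(' ', unpack('(H2)*', $up)), "\n";
# prints: 52 c3 89 53 55 4d c3 89   ('RÉSUMÉ' as UTF-8)
```

Calling uc directly on the octets would leave c3 a9 untouched, since neither byte is a lowercase letter on its own; decoding first lets uc see é as a single character.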