in reply to Escaping Wide Characters

That code (unpack 'W' specifically) only works in 5.10+. Below is a version that works in 5.8+ (when Unicode support was added), and a reverse function for 5.8+ that is safe to use on untrusted strings.
sub escape_5_10 {
    join '',
        map { $_ > 255 ? sprintf('\\x{%04X}', $_) :
              chr() =~ /[[:cntrl:]]/ ? sprintf('\\x%02X', $_) :
              quotemeta(chr())
        } unpack('W*', @_ ? $_[0] : $_)
}

sub escape {
    join '',
        map { ord() > 255 ? sprintf('\\x{%04X}', ord()) :
              /[[:cntrl:]]/ ? sprintf('\\x%02X', ord()) :
              quotemeta()
        } map /./gs, @_ ? $_[0] : $_
}

sub unescape {
    my $s = @_ ? $_[0] : $_;
    $s =~ s/
        \G
        (?: \\x \{ ([0-9a-fA-F]+) \}
        |   \\x ([0-9a-fA-F]{1,2})
        |   \\(.)
        |   # No escapes
        )
        ([^\\]*)
    /
        ( defined($1) ? chr(hex($1)) :
          defined($2) ? chr(hex($2)) :
          defined($3) ? $3 :
          ''
        ) . $4
    /xesg;
    $s
}

# XXX Assumes. Good enough. Avoids warn.
binmode STDOUT, ':encoding(UTF-8)';

my $s = '<3'         # \W and \w
      . chr(0x04D2)  # wide char
      . "\cC";       # ctrl char
print("$s\n");
$s = escape($s);
print("$s\n");
$s = unescape($s);
print("$s\n");

Update: Functions now default to using $_ if no arg was supplied.

Re^2: Escaping Wide Characters
by almut (Canon) on Mar 05, 2008 at 18:30 UTC
    That code (unpack 'W' specifically) only works in 5.10.

    The 5.8 docs had unpack("U*", ...) in the nice_string() snippet (instead of unpack("W*", ...) ) — which works fine, AFAICT.
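A minimal sketch of the unpack("U*", ...) variant mentioned above, for illustration only. The name escape_u is hypothetical (it is not from the docs or the parent post); the escaping rules are the same three cases used in this thread: wide characters, control characters, and everything else via quotemeta.

```perl
use strict;
use warnings;

# escape_u is a hypothetical name. It reads code points with unpack('U*')
# instead of unpack('W*'), so it does not need the 5.10-only 'W' template.
sub escape_u {
    my $str = @_ ? $_[0] : $_;
    return join '', map {
          $_ > 255                 ? sprintf('\\x{%04X}', $_)   # wide char
        : chr($_) =~ /[[:cntrl:]]/ ? sprintf('\\x%02X', $_)     # ctrl char
        :                            quotemeta(chr($_))         # the rest
    } unpack('U*', $str);
}

my $escaped = escape_u('<3' . chr(0x04D2) . "\cC");
print "$escaped\n";    # prints \<3\x{04D2}\x03
```

The test string is a decoded character string containing a wide character, which is the case the snippet is meant for.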

      I have some weird problems with these mappings: On an English Windows XP, the unpack("U*"...) works fine, even with my Perl 5.6.1.

      But neither the unpack("U*"...) nor the "map /./gs,..." approach works if I run exactly the same scripts on an English Windows 2003 Server platform, or on a Japanese Windows XP platform. Do you know of any general problems with perl's Unicode handling on these platforms, and do you have an idea how I could solve that?

        Sounds like you're starting with an undecoded string? The function is *not* platform dependent.
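To illustrate what "undecoded" means here, the sketch below compares raw UTF-8 bytes with the same data after decoding. It assumes the input arrives as UTF-8; the actual encoding on the platforms in question is not stated in the thread and may differ (in which case a different encoding name would be passed to decode).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes for U+04D2, as they might arrive from a file or the console.
my $bytes = "\xD3\x92";

# Decoding turns the two bytes into one character (assumes UTF-8 input).
my $chars = decode('UTF-8', $bytes);

printf "undecoded: %d chars, first ord 0x%X\n", length($bytes), ord($bytes);
printf "decoded:   %d chars, first ord 0x%X\n", length($chars), ord($chars);
```

Run on the undecoded string, an escaping function sees two characters below 256 and never takes the wide-character branch; run on the decoded string, it sees the single character 0x4D2.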
Re^2: Escaping Wide Characters
by mobiusinversion (Beadle) on Mar 05, 2008 at 18:15 UTC
    all groovy.

    that was awesome, thanks!