in reply to How can I safely unescape a string.

use 5.010;
use utf8::all;
my $string = q[\334ber@n\374ber.com];
$string =~ s/\\([0-7]+)/chr oct $1/eg;
say $string;
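The following is a variation on the snippet above. It assumes the input uses C-style octal escapes of at most three digits. With the unbounded `[0-7]+`, a digit that follows an escape would be swallowed, so `\3341` would decode as `oct('3341')` rather than `\334` followed by `1`. The helper name `unescape_octal` is hypothetical, and this is only a sketch.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical helper: decode backslash-octal escapes such as \334.
# Assumes at most three octal digits per escape, as in C and Perl string literals.
sub unescape_octal {
    my ($s) = @_;
    $s =~ s/\\([0-7]{1,3})/chr oct $1/eg;
    return $s;
}

binmode STDOUT, ':encoding(UTF-8)';    # emit the decoded characters as UTF-8
my $decoded = unescape_octal(q[\334ber@n\374ber.com]);
say $decoded;                          # \334 -> U+00DC, \374 -> U+00FC
```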
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Re^2: How can I safely unescape a string.
by davido (Cardinal) on Sep 12, 2012 at 20:56 UTC

    A big ++ from me for presenting the solution I was about to post. You used the utf8::all pragma where my version would have had "binmode STDOUT, ':encoding(utf8)';". I'm especially glad it prompted me to look into this new pragma, which I hadn't heard of or seen used before. Nice job!
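Here is a rough sketch of the core-only variant described above. It sets an explicit binmode on STDOUT instead of using utf8::all. This is a reconstruction for comparison, not the code Dave actually had in mind. It uses the strict ':encoding(UTF-8)' spelling. ':encoding(utf8)' is Perl's lax variant.

```perl
use strict;
use warnings;
use 5.010;

# Core-only alternative to utf8::all: set the output encoding layer explicitly.
binmode STDOUT, ':encoding(UTF-8)';

my $string = q[\334ber@n\374ber.com];
$string =~ s/\\([0-7]+)/chr oct $1/eg;   # turn each \ooo escape into its character
say $string;
```

Unlike utf8::all, this touches only STDOUT. It leaves STDIN, STDERR, @ARGV, and source-code decoding alone.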

    Makes me wonder what our resident Unicode expert would have to say about it. There must be a list of gotchas.


    Dave

      Hello Dave

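      It seems that characters in the 0x80-0xFF range created with chr() still have to be upgraded explicitly, even under utf8::all. The code below shows that a string built with chr() from code points in the 0x80-0xFF range does not have the UTF-8 flag set. Building it with pack('U', ...) or calling utf8::upgrade() afterwards does set the flag.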

      use strict;
      use warnings;

      # 1. pack('U', ...) returns a character string with the UTF-8 flag on
      my $str = '\334ber@n\374ber.com';
      $str =~ s/\\([0-7]+)/pack('U', oct('0'.$1))/eg;
      binmode STDOUT, ":encoding(UTF-8)";
      print "$str\n";
      print utf8::is_utf8($str) ? "str is ...utf8\n" : "str is ...not utf8\n";

      # 2. chr() with a code point in 0x80-0xFF leaves the flag off, even under utf8::all
      use 5.010;
      use utf8::all;
      my $string = '\334ber@n\374ber.com';
      $string =~ s/\\([0-7]+)/chr oct $1/eg;
      print "$string\n";
      print utf8::is_utf8($string) ? "string is ...utf8\n" : "string is ...not utf8\n";

      # 3. an explicit utf8::upgrade() turns the flag on (reuse $string, no second "my")
      $string = '\334ber@n\374ber.com';
      $string =~ s/\\([0-7]+)/chr oct $1/eg;
      utf8::upgrade($string);
      print "$string\n";
      print utf8::is_utf8($string) ? "string is ...utf8\n" : "string is ...not utf8\n";
      I think utf8::all is utf8::almost.
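The missing flag may matter less than it looks. utf8::is_utf8 reports Perl's internal storage format, not whether the string holds the right characters. The sketch below assumes plain core Perl without utf8::all. It compares a string built with chr() against an upgraded copy of it.

```perl
use strict;
use warnings;

# Build the same two characters, then store a copy in the other internal format.
my $native   = chr(0xDC) . chr(0xFC);   # code points below 0x100: stored as single bytes
my $upgraded = $native;
utf8::upgrade($upgraded);               # same characters, now stored internally as UTF-8

printf "native:   flag=%d length=%d\n", utf8::is_utf8($native)   ? 1 : 0, length $native;
printf "upgraded: flag=%d length=%d\n", utf8::is_utf8($upgraded) ? 1 : 0, length $upgraded;
print $native eq $upgraded ? "same string\n" : "different strings\n";

# A code point above 0xFF cannot be stored in one byte, so chr() sets the flag itself.
my $wide = chr(0x100);
printf "wide:     flag=%d\n", utf8::is_utf8($wide) ? 1 : 0;
```

This should report flag=0 for the chr() string and flag=1 after the upgrade. Both should have length 2 and compare equal. When STDOUT has an encoding layer, such as the one utf8::all installs, both forms should print identically, because the layer works on characters rather than on the internal bytes.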