in reply to Re: Global substitution of non-base-plane Unicode characters
in thread Global substitution of non-base-plane Unicode characters
Thanks for the reference to Data::Dumper, very useful.
Yes, I am sure that I am not getting the translation in my code since I did follow the "Basic debugging checklist" item 2, using print and printf to dump the results of the substitutions and the intermediate variable $s_ after each attempt. Using your suggested Data:Dumper code I also confirmed the print* results. Here is my code updated with your Data::Dumper suggestion and a few additional printf's, followed by the output results:
use strict; use warnings; use utf8; use feature 'unicode_strings'; use Data::Dump qw/ dd /; my $txt; my $tx1; my $s_; my $TestCh1; my $TestCh2; binmode STDOUT, ':encoding(UTF-8)'; printf "\x{FEFF}"; # $txt = "This =>\N{U+100049}<= is a Unicode character in Plane 16"; $txt = "This =>􀁉<= is a Unicode character in Plane 16"; printf "Dumping \$txt="; dd( $txt ); $tx1 = $txt; $tx1 =~ s/"\\N{U+100049}"/"\N{U+2190}"/ge; print "0:\$txt=" . $tx1; print "\n"; printf "Dumping \$tx1="; dd( $tx1 ); print "\n"; $tx1 = $txt; $tx1 =~ s/\\xF4\\x80\\x81\\x89/"\\N{U+2190}"/ge; print "1:\$txt=" . $tx1; print "\n"; printf "Dumping \$tx1="; dd( $tx1 ); print "\n"; $tx1 = $txt; $TestCh1 = "\\xF4\\x80\\x81\\x89"; $TestCh2 = "\\N{U+2190}"; ($s_ = '"'.($TestCh2).'"') =~ s/&/\$&/g; print "2:\$s_=" . $s_ . "!, \$TestCh1=" . $TestCh1 . "!, \$TestCh2=" . + $TestCh2 . "!\n"; $tx1 =~ s/$TestCh1/eval $s_/ge; print "2:\$tx1=" . $tx1 . "!\n"; print "\n"; printf "Dumping \$tx1="; dd( $tx1 ); print "\n"; $tx1 = $txt; $TestCh2 = "\\xE2\\x86\\x90"; ($s_ = '"'.($TestCh2).'"') =~ s/&/\$&/g; print "3:\$s_=" . $s_ . "!, \$TestCh1=" . $TestCh1 . "!, \$TestCh2=" . + $TestCh2 . "!\n"; $tx1 =~ s/$TestCh1/eval $s_/ge; print "3:\$tx1=" . $tx1 . "!"; print "\n"; printf "Dumping \$tx1="; dd( $tx1 ); print "\n"; __END__ Dumping $txt="This =>\x{100049}<= is a Unicode character in Plane 16" 0:$txt=This =>􀁉<= is a Unicode character in Plane 16 Dumping $tx1="This =>\x{100049}<= is a Unicode character in Plane 16" 1:$txt=This =>􀁉<= is a Unicode character in Plane 16 Dumping $tx1="This =>\x{100049}<= is a Unicode character in Plane 16" 2:$s_="\N{U+2190}"!, $TestCh1=\xF4\x80\x81\x89!, $TestCh2=\N{U+2190}! 2:$tx1=This =>􀁉<= is a Unicode character in Plane 16! Dumping $tx1="This =>\x{100049}<= is a Unicode character in Plane 16" 3:$s_="\xE2\x86\x90"!, $TestCh1=\xF4\x80\x81\x89!, $TestCh2=\xE2\x86\x +90! 3:$tx1=This =>􀁉<= is a Unicode character in Plane 16! Dumping $tx1="This =>\x{100049}<= is a Unicode character in Plane 16"
Your use of curly braces as quoting characters in your example is a little bit confusing to me. Won't simple slashes do just as well in your example? Or is there some other subtle reason to use the braces instead?
What's confusing to me in the a2p translation of the gsub function is that in the substitution expression the replacement text variable $s_ is preceded by an "eval" and the suffix modifier "e" is also used which is supposed to do an "eval" on the replacement expression. Does that mean that the variable $s_ is "eval"ed twice? And if that is true, why is it done that way?
Thanks again for your help.
Peter
|
|---|