I am a Perl newbie but a long-time gawk scripter who now needs to deal with text containing Unicode characters not in the base plane.
Specifically, I need to be able to globally substitute the Unicode left arrow character (U+2190) for a code point in SUPPLEMENTARY PRIVATE USE AREA-B, Plane 16, U+100049, while processing some input text.
Initially I used the a2p utility to convert a working gawk script, which uses the hex byte equivalents of Unicode characters to accomplish the substitution with the gawk gsub function. The perl generated by a2p for the gsub function is a little hard to comprehend, so I tried to test how substitution should work with a much simpler perl script:

    use strict;
    use warnings;
    use utf8;
    use feature 'unicode_strings';

    my $txt;
    my $tx1;
    my $s_;
    my $TestCh1;
    my $TestCh2;

    binmode STDOUT, ':encoding(UTF-8)';
    printf "\x{FEFF}";

    # $txt = "This =>\N{U+100049}<= is a Unicode character in Plane 16";
    $txt = "This =>􀁉<= is a Unicode character in Plane 16";

    $tx1 = $txt;
    $tx1 =~ s/"\\N{U+100049}"/"\N{U+2190}"/ge;
    print "0:\$txt=" . $tx1;
    print "\n\n";

    $tx1 = $txt;
    $tx1 =~ s/\\xF4\\x80\\x81\\x89/"\\N{U+2190}"/ge;
    print "1:\$txt=" . $tx1;
    print "\n\n";

    $tx1 = $txt;
    $TestCh1 = "\\xF4\\x80\\x81\\x89";
    $TestCh2 = "\\N{U+2190}";
    ($s_ = '"'.($TestCh2).'"') =~ s/&/\$&/g;
    print "2:\$s_=" . $s_ . "!, \$TestCh1=" . $TestCh1 . "!, \$TestCh2=" . $TestCh2 . "!\n";
    $tx1 =~ s/$TestCh1/eval $s_/ge;
    print "2:\$tx1=" . $tx1 . "!\n";
    print "\n";

    $tx1 = $txt;
    $TestCh2 = "\\xE2\\x86\\x90";
    ($s_ = '"'.($TestCh2).'"') =~ s/&/\$&/g;
    print "3:\$s_=" . $s_ . "!, \$TestCh1=" . $TestCh1 . "!, \$TestCh2=" . $TestCh2 . "!\n";
    $tx1 =~ s/$TestCh1/eval $s_/ge;
    print "3:\$tx1=" . $tx1 . "!\n";
    print "\n";

However, none of these techniques seems to be working for me. The third and fourth techniques in the above program are using copies of the code generated by a2p for the gawk gsub function, so I thought they would work even if I did not understand the details of why, but they do not work. The U+100049 character is never changed to U+2190.
Would you please enlighten me about how I should code this function in Perl? Also, a step-by-step explanation of the Perl code generated by a2p for the gawk gsub function would be much appreciated as a teaching tool to help me learn perl.
My environment is Win7-64, Strawberry Perl 5.18.2, if that makes a difference.
TIA for any help you can provide to cure my ignorance.
Peter
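
For comparison, here is a minimal sketch of one way the substitution might be written. It assumes the text is already decoded into Perl characters (as `use utf8` does for literals in the source file, or an `:encoding(UTF-8)` layer would do for input read from a file), so the Plane 16 code point is a single character rather than four bytes. The variable name `$fixed` is illustrative only and does not come from the original script.

```perl
use strict;
use warnings;

# Build the sample text with an escape so the source file itself stays ASCII.
my $txt = "This =>\x{100049}<= is a Unicode character in Plane 16";

# Single backslashes, no embedded quotes, no /e modifier:
# \x{...} is interpreted directly by the regex (left side) and by the
# double-quote-style replacement (right side).
(my $fixed = $txt) =~ s/\x{100049}/\x{2190}/g;

# Encode to UTF-8 only at the output boundary.
binmode STDOUT, ':encoding(UTF-8)';
print "$fixed\n";
```

Read against this sketch, the doubled backslashes and embedded quotes in attempts 0 and 1 make the pattern look for literal text such as a backslash followed by `xF4`, and the byte-wise patterns in attempts 2 and 3 cannot match a string that holds one decoded character instead of the bytes F4 80 81 89. For a one-character swap like this, `tr/\x{100049}/\x{2190}/` would likely serve as well.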