in reply to Re^2: Never touch (or look at) the UTF8 flag!!
in thread Interventionist Unicode Behaviors

If this piece of code was in a module, and the main script disagreed about the encoding, things would have broken. For example, if "use encoding 'iso-8859-1';" was used: you'd get a UTF8 string of three characters, but six bytes.
That's why you should never use \x for literal bytes. Instead, use pack with a "C*" template. Only if nothing in your script uses the new stuff, you can be sure you get the old stuff.
This is really a side issue, because as I've stressed, the hex notation was a means to an end. All I wanted was a scalar with a particular sequence of bytes in the PV, and I'd have been just as happy to have gotten it with pack, as you advocate.

Nevertheless, I have not yet found a way to make the interpolated backslash-x notation misbehave as you suggest it should. Can you please indicate how to modify this code sample so that it illustrates your assertion?

slothbear:~/perltest marvin$ cat BackslashX.pm package BackslashX; use strict; use warnings; use Encode '_utf8_on'; our $smiley = "\xE2\x98\xBA"; _utf8_on($smiley); 1; slothbear:~/perltest marvin$ cat backslash_x.plx #!/usr/bin/perl use strict; use warnings; use encoding 'iso-8859-1'; use BackslashX; use Devel::Peek; Dump($BackslashX::smiley); print $BackslashX::smiley; print "\n"; slothbear:~/perltest marvin$ perl backslash_x.plx SV = PV(0x1834224) at 0x181ed98 REFCNT = 1 FLAGS = (POK,pPOK,UTF8) PV = 0x372010 "\342\230\272"\0 [UTF8 "\x{263a}"] CUR = 3 LEN = 4 "\x{263a}" does not map to iso-8859-1 at backslash_x.plx line 11. \x{263a}
That's clearly broken, but only because Unicode code point 0x263a doesn't map to Latin-1. How do I get the 6-byte combo?
For someone who's sufficiently skilled in Perl, unicode, and the combination of both, you managed to appear quite clueless in the OP. But now I wonder if you were actually serious (if so, please rephrase your question, this time based on the way you SHOULD use things), or just trolling.

Trolling? On the contrary: I'm doing my best to keep this discussion low-key despite some rather provocative remarks about my competence that have gone by. :)

--
Marvin Humphrey
Rectangular Research ― http://www.rectangular.com

Replies are listed 'Best First'.
Re^4: Never touch (or look at) the UTF8 flag!!
by Juerd (Abbot) on Sep 11, 2006 at 22:11 UTC

    That's clearly broken, but only because Unicode code point 0x263a doesn't map to Latin-1. How do I get the 6-byte combo?

    Interesting. That is indeed very much broken.

    Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }