
If this piece of code was in a module, and the main script disagreed about the encoding, things would have broken. For example, if "use encoding 'iso-8859-1';" was used: you'd get a UTF8 string of three characters, but six bytes.

That's why you should never use \x for literal bytes. Instead, use pack with a "C*" template. Only if nothing in your script uses the new stuff, you can be sure you get the old stuff.

This is really a side issue, because as I've stressed, the hex notation was a means to an end. All I wanted was a scalar with a particular sequence of bytes in the PV, and I'd have been just as happy to have gotten it with pack, as you advocate.
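For the record, here is roughly what the pack route would look like. This is a minimal sketch, not the code from my original post: the variable names are mine, and I've used Encode's decode rather than _utf8_on to get from bytes to characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Build the three raw bytes with pack instead of backslash-x escapes,
# so no source-encoding pragma can reinterpret the literal.
my $bytes = pack("C*", 0xE2, 0x98, 0xBA);

# Decode the bytes as UTF-8 to get a one-character string (U+263A).
my $smiley = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d, code point: U+%04X\n",
    length($bytes), length($smiley), ord($smiley);
```

Either way, the scalar ends up holding the same three bytes in the PV, which is all I was after.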

Nevertheless, I have not yet found a way to make the interpolated backslash-x notation misbehave as you suggest it should. Can you please indicate how to modify this code sample so that it illustrates your assertion?

slothbear:~/perltest marvin$ cat BackslashX.pm 
package BackslashX;
use strict;
use warnings;

use Encode '_utf8_on';

our $smiley = "\xE2\x98\xBA";
_utf8_on($smiley);

1;

slothbear:~/perltest marvin$ cat backslash_x.plx 
#!/usr/bin/perl
use strict;
use warnings;

use encoding 'iso-8859-1';
use BackslashX;
use Devel::Peek;

Dump($BackslashX::smiley);
print $BackslashX::smiley;
print "\n";

slothbear:~/perltest marvin$ perl backslash_x.plx 
SV = PV(0x1834224) at 0x181ed98
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x372010 "\342\230\272"\0 [UTF8 "\x{263a}"]
  CUR = 3
  LEN = 4
"\x{263a}" does not map to iso-8859-1 at backslash_x.plx line 11.
\x{263a}

That's clearly broken, but only because Unicode code point 0x263a doesn't map to Latin-1. The scalar itself still holds one character in three bytes. How do I get the three-character, six-byte string you described?
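To be clear about what I'm looking for: as I understand it, the six-byte result is what you get when the three bytes are treated as three Latin-1 characters and then stored internally as UTF-8. I can produce that by hand, as in the sketch below, but it uses utf8::upgrade rather than "use encoding", so it doesn't demonstrate the failure you're describing. The variable name is mine.

```perl
use strict;
use warnings;
use bytes ();    # makes bytes::length available without enabling the pragma

# Three bytes, each of which is also a valid Latin-1 character.
my $str = pack("C*", 0xE2, 0x98, 0xBA);

# Upgrade the internal representation to UTF-8: each character above
# 0x7F now takes two bytes, but the character count is unchanged.
utf8::upgrade($str);

printf "characters: %d, internal bytes: %d\n",
    length($str), bytes::length($str);
```

What I can't do is get the module's backslash-x literal to come out that way on its own.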

For someone who's sufficiently skilled in Perl, unicode, and the combination of both, you managed to appear quite clueless in the OP. But now I wonder if you were actually serious (if so, please rephrase your question, this time based on the way you SHOULD use things), or just trolling.

Trolling? On the contrary: I'm doing my best to keep this discussion low-key despite some rather provocative remarks about my competence that have gone by. :)

--
Marvin Humphrey
Rectangular Research ― http://www.rectangular.com

In reply to Re^3: Never touch (or look at) the UTF8 flag!! by creamygoodness
in thread Interventionist Unicode Behaviors by creamygoodness
