Huh? In your snippet, perl is "just printing whatever gets thrown at it", without doing any sort of "translation" on it.

It's trying and failing to convert Unicode code point 0x263a to Latin-1. We see the warning because a code point that high has no Latin-1 equivalent, so the translation is impossible.
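For reference, here is a minimal sketch that reproduces the warning described above. It is not the snippet from the parent post; the `@warnings` array, the `$SIG{__WARN__}` handler, and the `binmode` call are assumptions added so the behavior can be inspected, and it assumes no `-C` switch or `PERL_UNICODE` setting is putting a UTF-8 layer on STDOUT.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw( encode );

# Collect warnings so they can be inspected instead of going to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Make sure STDOUT has no encoding layer, so perl has to emit raw bytes.
binmode STDOUT, ':raw';

# U+263A (WHITE SMILING FACE) is above 0xFF, so it has no Latin-1 byte.
my $smiley = "\x{263a}";

# No Latin-1 form exists: perl warns "Wide character in print" and
# falls back to writing the character's UTF-8 bytes.
print $smiley, "\n";

# Encoding explicitly yields plain bytes, which print without complaint.
my $bytes = encode( 'UTF-8', $smiley );
print $bytes, "\n";

printf "warnings seen: %d, encoded length: %d\n",
    scalar @warnings, length $bytes;
```

Only the first `print` should trigger the warning; the second hands perl three ordinary bytes, so there is nothing left to translate.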

I thought the example I gave was the easiest to grok, but this is probably better, because the output is actually different.

#!/usr/bin/perl
use strict;
use warnings;

use Encode qw( _utf8_on );

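# The UTF-8 encoded bytes of "résumé"; the UTF8 flag is off.
my $resume = "r\xc3\xa9sum\xc3\xa9";
print $resume, "\n";

# Turn the UTF8 flag on without changing the bytes in memory.
_utf8_on($resume);

# Same bytes in memory, but the output differs.
print $resume, "\n";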
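To make the difference visible without relying on how a terminal renders the bytes, the same string can be inspected before and after the flag is flipped. This is only a sketch; `hex_chars` is a hypothetical helper introduced here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw( _utf8_on );

# Hypothetical helper: show each character of a string as a hex ordinal.
sub hex_chars {
    my ($string) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $string;
}

my $resume = "r\xc3\xa9sum\xc3\xa9";

# Flag off: perl sees eight one-byte characters.
my $before_length = length $resume;        # 8
my $before_chars  = hex_chars($resume);    # 72 c3 a9 73 75 6d c3 a9

_utf8_on($resume);

# Flag on: the same bytes are now read as six characters, two of them U+00E9.
my $after_length = length $resume;         # 6
my $after_chars  = hex_chars($resume);     # 72 e9 73 75 6d e9

print "before: $before_length chars ($before_chars)\n";
print "after:  $after_length chars ($after_chars)\n";
```

With no encoding layer on the output handle, the second `print` in the original snippet can represent every one of those six characters in Latin-1, so it emits `e9` for each "é" rather than the original `c3 a9` pair, and no warning appears.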

Conceptually, appending a non-UTF8 string to a UTF8 string is a really bad idea, bordering on stupid. Don't do that. (Why would you want to? What would you hope to accomplish as a result?)

I'd like to spit out scalars flagged as UTF8 by default from KinoSearch. But if I do that, that means anybody who gets that output is going to have to know how to deal with them. I don't want to spend all my time explaining the bottomless intricacies of Unicode handling in Perl to people. It's not that I want to be doing a lot of this concatenation, it's that I know it's going to happen some of the time and I don't want the support burden.
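A rough sketch of the kind of accident I mean, assuming a flagged scalar comes back from a library and gets concatenated with UTF-8 bytes that were never decoded. The variable names are hypothetical and nothing here is KinoSearch's actual API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Encode qw( decode encode );

# Hypothetical stand-in for a scalar a library hands back flagged as UTF8.
my $flagged = decode( 'UTF-8', "r\xc3\xa9sum\xc3\xa9" );    # 6 characters

# UTF-8 bytes from somewhere else that nobody decoded: " café" as 6 bytes.
my $raw = " caf\xc3\xa9";

# Perl treats each byte of $raw as a Latin-1 character when joining,
# so the "é" turns into two separate characters (U+00C3 and U+00A9).
my $mixed = $flagged . $raw;

# Decoding first keeps "é" as a single character.
my $fixed = $flagged . decode( 'UTF-8', $raw );

printf "mixed: %d chars, %d bytes once encoded\n",
    length $mixed, length encode( 'UTF-8', $mixed );
printf "fixed: %d chars, %d bytes once encoded\n",
    length $fixed, length encode( 'UTF-8', $fixed );
```

The mixed string comes out at 12 characters and 16 encoded bytes against 11 and 14 for the decoded version. The extra bytes are the double-encoded "é", and perl raises no warning at any point, which is exactly why it turns into a support question later.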

--
Marvin Humphrey
Rectangular Research ― http://www.rectangular.com

In reply to Re^2: Interventionist Unicode Behaviors by creamygoodness
in thread Interventionist Unicode Behaviors by creamygoodness
