Normally you should never have to worry about joining strings together in Perl. A simple "a" . "b" just works. If you have a problem with that, then most likely you have misunderstood how Perl handles strings. Read perldoc Encode carefully.

Just in case, here is a simplified description. Applications exchange data as bytes, or octets. Octets are not the same as the characters that humans read: one character can be represented by several octets. If your program does not care about characters (it does not change their case, it does not split on characters, and so on), then it can simply take data in and give data out without worrying about UTF-8, Unicode or anything else. But usually a program has to manipulate characters, and that is where the confusion starts.
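To make the octet/character difference concrete, here is a minimal sketch. The sample word and the variable names are only illustrative, and the octets are written as hex escapes so the example does not depend on how the file is saved.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "cafe" with an accented e: in UTF-8 the last letter (U+00E9)
# takes two octets, 0xC3 0xA9.
my $octets = "caf\xC3\xA9";

# After decoding, perl sees characters instead of octets.
my $chars = decode('UTF-8', $octets);

my $octet_count = length $octets;   # counts octets
my $char_count  = length $chars;    # counts characters

print "octets: $octet_count, characters: $char_count\n";
```

The same data has length 5 as octets and length 4 as characters, which is why character-level operations need the decoded form.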

First of all, you have to worry about how characters are represented in the octets that you receive from external applications. That depends on locale settings, but most modern Unix systems provide characters encoded as UTF-8. After you receive data from outside, you have to tell perl the encoding of that data, so that perl can split it into characters. This is done either by calling Encode::decode directly, or by setting up the input stream so that it does the decoding for you (with binmode, for example). After this, perl views your data as characters instead of octets.
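Both ways of decoding input can be sketched as follows. This assumes the incoming data is UTF-8; the in-memory filehandle is only a stand-in for a real file or socket, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for octets received from outside ("naive" with a
# diaeresis on the i, U+00EF, which is 0xC3 0xAF in UTF-8).
my $raw = "na\xC3\xAFve\n";

# Option 1: decode explicitly after reading the raw octets.
my $decoded = decode('UTF-8', $raw);

# Option 2: let the input layer decode while reading.
# For an already opened handle, binmode($fh, ':encoding(UTF-8)')
# has the same effect.
open my $in, '<:encoding(UTF-8)', \$raw or die "open: $!";
my $line = <$in>;
close $in;

print "same result\n" if $line eq $decoded;
```

Either way the program ends up with the same character string; which one to use is mostly a matter of where you want the decoding to live.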

Of course you also have to worry about strings that you type directly into your Perl code. Perl has to know their encoding as well. If your editor saves files as UTF-8 by default, then you can put "use utf8;" into the code so that perl treats the source as UTF-8 and decodes your quoted strings and patterns for you. Or, without "use utf8;", you can call Encode::decode on them directly.
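A small sketch of the literal case, assuming this source file is saved as UTF-8. The German word is just sample text; the second string spells out the same octets as hex escapes and decodes them by hand for comparison.

```perl
use strict;
use warnings;
use utf8;                 # assumption: this file is saved as UTF-8
use Encode qw(decode);

# With "use utf8", perl reads this literal as 5 characters.
my $literal = "Grüße";

# The same text built from explicit UTF-8 octets and decoded manually
# (U+00FC is 0xC3 0xBC, U+00DF is 0xC3 0x9F).
my $manual = decode('UTF-8', "Gr\xC3\xBC\xC3\x9Fe");

my $literal_length = length $literal;
print "literal and manual strings match\n" if $literal eq $manual;
```

Without "use utf8;" the literal would be seen as 7 octets and the comparison would fail.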

The two steps above ensure that perl knows how to split your strings into characters. But when you want to output your character strings to the outside world, you have to do the reverse conversion, from a character string to an octet string. Again, you can either call Encode::encode directly, or configure your output stream so that it does this for you automatically.
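The output direction can be sketched the same way. This assumes the outside world expects UTF-8; the in-memory buffer stands in for a real file or STDOUT, and the names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string: "cafe" with an accented e (U+00E9), 4 characters.
my $chars = "caf\x{E9}";

# Option 1: encode explicitly and write the octets yourself.
my $octets = encode('UTF-8', $chars);

# Option 2: let the output layer encode while writing.
# For STDOUT this would be binmode(STDOUT, ':encoding(UTF-8)').
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "open: $!";
print {$out} $chars;
close $out;

print "both give the same octets\n" if $buffer eq $octets;
```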

If all of these steps are handled correctly, then you never have to worry about string concatenation.
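For illustration, here is roughly what goes wrong when a step is skipped, and how it looks when every input is decoded first. The data is made up; the point is only that joining a decoded string with undecoded octets produces garbage on output, while joining two decoded strings does not.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $raw_a = "caf\xC3\xA9";   # UTF-8 octets from outside
my $raw_b = "\xC3\xA9";      # more UTF-8 octets from outside

# Wrong: one side decoded, the other still raw octets.
# The two raw octets are treated as two separate characters.
my $mixed = decode('UTF-8', $raw_a) . $raw_b;
my $bad   = encode('UTF-8', $mixed);

# Right: decode everything on input, join, encode on output.
my $joined = decode('UTF-8', $raw_a) . decode('UTF-8', $raw_b);
my $good   = encode('UTF-8', $joined);

printf "bad:  %d octets\n", length $bad;
printf "good: %d octets\n", length $good;
```

In the wrong case the second accented letter is encoded twice, so the output is longer than the input and shows up as mojibake.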

In reply to Re: How to concatenate utf8 safely? by andal
in thread How to concatenate utf8 safely? by gregor42
