PerlMonks  


Normally you should never have to worry about joining strings together in Perl. A simple "a" . "b" just works. If you have a problem with that, then most likely you have not yet understood how Perl handles encodings. Try reading perldoc Encode carefully.

Just in case, here is a simplified description. Applications exchange data as bytes, or octets. Octets are not the same as the characters that humans read: one character can be represented by several octets. If your program does not care about characters (it does not change their case, does not split on characters, and so on), then it can simply take data in and give data out without worrying about UTF-8, Unicode or whatever. But usually one has to manipulate characters, and that is where the confusion starts.
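To make the difference between octets and characters concrete, here is a minimal sketch. The sample data (the two octets that encode "é" in UTF-8) and the variable names are my own choice for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two octets 0xC3 0xA9 are the UTF-8 encoding of "é" (U+00E9).
my $octets = "\xC3\xA9";

# After decoding, perl treats the same data as a single character.
my $chars = decode('UTF-8', $octets);

my $octet_count = length $octets;   # counts octets
my $char_count  = length $chars;    # counts characters

print "octets: $octet_count, characters: $char_count\n";
# prints "octets: 2, characters: 1"
```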

First of all, you have to worry about how characters are represented in the octets that you receive from external applications. That depends on the locale settings, but most modern Unix systems provide characters encoded as UTF-8. After you receive data from outside, you have to tell perl the encoding of that data, so that perl can split it into characters. This is done either by calling Encode::decode directly, or by adjusting the input stream so that it does this for you (with binmode, for example). After this, perl views your data as characters instead of octets.
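Both ways of decoding input can be sketched as follows. This assumes the incoming data really is UTF-8. The in-memory filehandle only stands in for a real external source, to keep the example self-contained.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for external input: "naïve\n" as UTF-8 octets (ï is 0xC3 0xAF).
my $raw = "na\xC3\xAFve\n";

# Option 1: decode the octets explicitly.
my $decoded = decode('UTF-8', $raw);

# Option 2: put a decoding layer on the input stream, so that every
# read already returns characters.
open my $in, '<:encoding(UTF-8)', \$raw or die "cannot open: $!";
my $line = <$in>;
close $in;

chomp($decoded, $line);
my $same = $decoded eq $line ? 1 : 0;
print "characters: ", length($line), ", same result: $same\n";
# prints "characters: 5, same result: 1"
```

For an already open handle such as STDIN, the equivalent of option 2 would be binmode(STDIN, ':encoding(UTF-8)') before the first read.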

Of course you also have to worry about the strings that you type directly into your perl code. Perl has to know their encoding as well. If your editor saves everything in UTF-8 by default, then you can put "use utf8;" into your code. Perl then treats the source file as UTF-8 and gives you all your quoted strings and patterns as characters, much as if you had called Encode::decode on each of them. Or, without "use utf8;", you can call Encode::decode on them yourself.
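A small sketch of the effect of "use utf8;" on a string literal. It only behaves as described if the file containing it is saved as UTF-8, which is an assumption about your editor.

```perl
use strict;
use warnings;
use utf8;   # assumes this source file itself is saved as UTF-8

my $word        = "é";          # one character, thanks to "use utf8"
my $word_length = length $word;

# Encode characters on the way out, so the terminal gets UTF-8 octets.
binmode STDOUT, ':encoding(UTF-8)';
print "length: $word_length, upper case: ", uc($word), "\n";
# prints "length: 1, upper case: É"
```

Without the "use utf8;" line, the same literal would be two octets, and length would report 2.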

The two steps above ensure that perl knows how to split your strings into characters. But if you want to output your character strings to the outside world, you have to do the reverse conversion, from a string of characters to a string of octets. Again, you can either call Encode::encode directly, or configure your output stream so that it does the conversion for you automatically.
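The reverse conversion can be sketched the same way. UTF-8 as the output encoding is an assumption here. The in-memory buffer stands in for a real file or socket.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $chars = "caf\x{E9}";   # 4 characters; \x{E9} is "é"

# Option 1: encode the characters explicitly.
my $octets = encode('UTF-8', $chars);

# Option 2: put an encoding layer on the output stream.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "cannot open: $!";
print {$out} $chars;
close $out;

my $same = $buffer eq $octets ? 1 : 0;
print "characters: ", length($chars),
      ", octets: ", length($octets),
      ", same result: $same\n";
# prints "characters: 4, octets: 5, same result: 1"
```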

If all these steps are handled correctly, then you never have to worry about string concatenation.
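For illustration, here is what typically goes wrong when one of the steps is skipped: a decoded string is joined with raw octets. The sample strings are my own; the concatenation operator itself behaves the same in both cases.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $raw_a = "\xC3\xA9";   # "é" as UTF-8 octets
my $raw_b = "\xC3\xBC";   # "ü" as UTF-8 octets

# Correct: decode both inputs, concatenate characters, encode once.
my $good = encode('UTF-8', decode('UTF-8', $raw_a) . decode('UTF-8', $raw_b));

# Mistake: the second input is still octets, so each of its octets is
# treated as a separate character and gets encoded a second time.
my $bad = encode('UTF-8', decode('UTF-8', $raw_a) . $raw_b);

print "good: ", length($good), " octets, bad: ", length($bad), " octets\n";
# prints "good: 4 octets, bad: 6 octets"
```

The six octets in the second case are the familiar "double encoded" garbage that shows up when such output is viewed as UTF-8.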


In reply to Re: How to concatenate utf8 safely? by andal
in thread How to concatenate utf8 safely? by gregor42
