andal : " ... First of all, you have to worry about representation of characters in the octets that you receive from external applications. That depends on locale settings ... "

OP " ... I assume that the problem is my code and not the data coming in since one can usually depend on people to get their own names right ... "

It would appear that my initial assumption was incorrect. I challenged it, and as it turns out, what I am dealing with is a mixture of localized character sets, taken as input from across Europe and cut and pasted between spreadsheets in an HR department spanning multiple offices.

(ノωノ)

These are conscientious people, mind you, who care about getting the characters just right, potentially editing with multiple programs along the way...

I'm glad that I asked and should have done so sooner.

Now, in the proper mindset and having revised my code, I found that a big thing I was missing was that I was using :utf8 instead of :encoding(utf8). Switching layers allowed me to trust the data again.
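A minimal sketch of what a decoding layer changes, assuming a small in-memory buffer of hypothetical UTF-8 data; the helper line_lengths is illustrative and not from the original code. Read through :raw, each line's length counts octets; read through :encoding(UTF-8), it counts characters. The sketch uses :encoding(UTF-8) rather than :encoding(utf8) because Encode treats "UTF-8" as the strict encoding and "utf8" as Perl's more permissive variant. The perlfunc binmode documentation notes that :utf8 just marks the data as UTF-8 without further checking, which is how malformed octets can slip through it unnoticed.

```perl
use strict;
use warnings;

# Hypothetical sample data: two lines of raw octets, as they might
# arrive from a spreadsheet export.
#   line 1: 'José' encoded as UTF-8 -> 5 octets
#   line 2: 'Zoë'  encoded as UTF-8 -> 4 octets
my $octets = "Jos\xc3\xa9\nZo\xc3\xab\n";

# Hypothetical helper: read the buffer through the given PerlIO layer
# and return the length of each line (newline removed).
sub line_lengths {
    my ($layer) = @_;
    open my $fh, "<$layer", \$octets or die "open: $!";
    my @lengths;
    while (my $line = <$fh>) {
        chomp $line;
        push @lengths, length $line;
    }
    close $fh;
    return @lengths;
}

my @raw     = line_lengths(':raw');              # lengths in octets
my @decoded = line_lengths(':encoding(UTF-8)');  # lengths in characters

print "raw:     @raw\n";      # raw:     5 4
print "decoded: @decoded\n";  # decoded: 4 3
```

Once the layer has decoded the input, length, regexes and concatenation all operate on characters rather than on the bytes of whatever encoding the file happened to use.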

I had all kinds of stupid ideas and bad assumptions that led me to chase phantoms. Now at least I can identify mangled input on the way in.
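One way to identify mangled input on the way in, sketched as a complement to the layer change described above and assuming each record can first be read as raw octets: Encode's decode with the FB_CROAK flag dies on malformed UTF-8, and wrapping the call in eval turns that into a per-line check. The helper decode_strict and the sample lines are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: strictly decode a string of raw octets as UTF-8.
# Returns the character string, or undef if the octets are malformed.
sub decode_strict {
    my ($octets) = @_;
    # FB_CROAK makes decode() die on malformed data; LEAVE_SRC stops it
    # from consuming $octets. eval turns the die into an undef result.
    my $chars = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
    return $chars;
}

# Hypothetical input: line 2 holds a Latin-1 e-acute (0xE9) pasted in
# from another program, which is not well-formed UTF-8.
my @lines = ("Jos\xc3\xa9", "Jos\xe9", "Zo\xc3\xab");

my $n = 0;
for my $octets (@lines) {
    $n++;
    my $chars = decode_strict($octets);
    if (defined $chars) {
        print "line $n: ok\n";
    } else {
        print "line $n: mangled input\n";
    }
}
```

With this sample data, lines 1 and 3 decode cleanly and line 2 is reported as mangled, so it can be logged or sent back for correction instead of being silently passed along.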



Wait! This isn't a Parachute, this is a Backpack!

In reply to Re: How to concatenate utf8 safely? by gregor42
in thread How to concatenate utf8 safely? by gregor42
