I think your problem is that it is unclear which encodings your strings are in.

In the end, everything is octets, but Perl regular expressions treat a string as Unicode only if it has been properly decoded.
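To illustrate the difference, here is a minimal sketch. It assumes the input is the UTF-8 encoding of U+200B between two ASCII words; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "foo", then the three UTF-8 octets of U+200B (ZERO WIDTH SPACE), then "bar".
my $octets = "foo\xE2\x80\x8B" . "bar";

# Undecoded: Perl sees three separate characters (E2, 80, 8B),
# so a pattern for \x{200B} has nothing to match.
(my $undecoded = $octets) =~ s/\x{200B}//g;

# Decoded: Perl sees one character, U+200B, and the substitution removes it.
my $text = decode('UTF-8', $octets);
(my $cleaned = $text) =~ s/\x{200B}//g;

printf "undecoded length: %d\n", length $undecoded;   # 9, nothing removed
printf "cleaned length:   %d\n", length $cleaned;     # 6, "foobar"
```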

The main goal is consistency. Ideally, you Encode::decode the data when you read it (from a file, from the database, ...) and Encode::encode it to UTF-8 when you write it out as HTML.
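A rough sketch of that decode-on-input, encode-on-output pattern follows. It assumes the incoming data is UTF-8; the sample string stands in for whatever your file or database actually returns.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the source delivers UTF-8 octets. Here: "café" followed by U+200B.
my $raw = "caf\xC3\xA9\xE2\x80\x8B";

# Input boundary: octets -> characters.
my $text = decode('UTF-8', $raw);

# Inside the program, work on characters only.
$text =~ s/\x{200B}//g;

# Output boundary: characters -> UTF-8 octets, ready to be written to the HTML.
my $html_bytes = encode('UTF-8', $text);

printf "characters: %d, octets written: %d\n", length $text, length $html_bytes;
```

If the source uses a different encoding, only the name passed to decode changes; the rest of the program stays the same.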

On the way there, inspect the string, for example with Data::Dumper or Data::Dump, to see which octets it holds and what Perl thinks it contains. Ideally, Perl reports a single \x{200b} in the string. If it reports the three bytes \xE2\x80\x8B instead, you have the right data, but Perl does not know that the string should be treated as Unicode. In that case, decode it from UTF-8.
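One way to do that inspection is sketched below. The codepoints helper is hypothetical and only here for illustration; setting $Data::Dumper::Useqq makes Data::Dumper escape non-ASCII characters instead of printing them raw, although the exact escape style it picks may differ from the \xNN notation used above.

```perl
use strict;
use warnings;
use Data::Dumper;
use Encode qw(decode);

$Data::Dumper::Useqq = 1;   # escape non-ASCII and control characters in the dump

# Hypothetical helper: list the code point of every character Perl sees.
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $str;
}

my $octets = "a\xE2\x80\x8B" . "b";       # undecoded UTF-8 octets
my $text   = decode('UTF-8', $octets);    # decoded characters

print Dumper($octets, $text);
print codepoints($octets), "\n";   # U+0061 U+00E2 U+0080 U+008B U+0062
print codepoints($text),   "\n";   # U+0061 U+200B U+0062
```

Three small code points where you expected one U+200B mean the string still needs decoding.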

You should do this inspection for every step of the pipeline.


In reply to Re: Remove u200b unicode From String by Corion
in thread Remove u200b unicode From String by phildeman
