For starters, U+00A0 is not a zero-width space; it's a (normal-width) non-breaking space.

Furthermore, as a normal-width space, it isn't a non-printing character. That is to say, it is a printing character.

On to your question. To remove NBSP and non-printing characters, you can use the following:

s/[\N{NBSP}\P{Print}]//g

(In lieu of \N{NBSP}, one can use \xA0 or \x{A0} or \N{U+A0} or ...)
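
To see the substitution at work on decoded text, here is a minimal sketch. The sub name strip_nbsp_and_nonprinting and the sample string are made up for illustration; \N{U+A0} is used as one of the equivalent spellings of NBSP listed above.

```perl
use strict;
use warnings;

# Hypothetical helper (name is illustrative, not from the original post).
# Expects decoded text (characters), not bytes.
sub strip_nbsp_and_nonprinting {
    my ($text) = @_;
    # Remove NBSP (U+00A0) and every character that is not \p{Print}.
    $text =~ s/[\N{U+A0}\P{Print}]//g;
    return $text;
}

# Sample input: an NBSP, a BEL control character and a trailing newline.
my $in  = "foo\N{U+A0}bar\x{07}baz\n";
my $out = strip_nbsp_and_nonprinting($in);

print "$out\n";   # prints "foobarbaz"
```

Note that line feeds are control characters, so \P{Print} removes them too. If you process input line by line, chomp first and add the newline back afterwards.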

The above expects Unicode characters (decoded text). You are providing encoded text instead (bytes). You need to properly decode your inputs and encode your outputs.
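
If you'd rather decode and encode by hand, a sketch using the core Encode module could look like the following. It assumes the input is UTF-8; the sample bytes and variable names are invented for illustration. It also shows one way things can go wrong when the substitution is applied to the raw bytes instead.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Assumed sample input: the UTF-8 bytes for "café", an NBSP, then "au lait".
my $bytes = "caf\xC3\xA9\xC2\xA0au lait";

# Wrong: running the substitution on the encoded bytes. Only the second
# byte of the NBSP sequence (A0) matches, leaving a stray C2 byte behind.
(my $wrong = $bytes) =~ s/[\N{U+A0}\P{Print}]//g;

# Right: decode to characters, substitute, then encode again for output.
my $text = decode('UTF-8', $bytes);
$text =~ s/[\N{U+A0}\P{Print}]//g;
my $out_bytes = encode('UTF-8', $text);

# Show both results as hex so the stray byte is visible.
printf "wrong: %s\n", join ' ', map { sprintf '%02X', ord } split //, $wrong;
printf "right: %s\n", join ' ', map { sprintf '%02X', ord } split //, $out_bytes;
```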

For example, if your source code is encoded using UTF-8 rather than ASCII, you want:

use utf8;

Likewise, the following causes STDIN, STDOUT and STDERR to be decoded/encoded automatically, and it sets the default encoding for files opened in its scope:

use open ':std', ':encoding(UTF-8)';
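
Putting the pieces together, a small filter might look like this. It is only a sketch, assuming UTF-8 input and output; the sub name clean_stream is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # the source code itself is UTF-8
use open ':std', ':encoding(UTF-8)';   # decode STDIN, encode STDOUT/STDERR

# Hypothetical helper: copies lines from $in to $out, removing NBSP and
# non-printing characters from each line.
sub clean_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;                         # the line feed is itself non-printing
        $line =~ s/[\N{U+A0}\P{Print}]//g;
        print {$out} "$line\n";              # put the line ending back
    }
    return;
}

clean_stream(\*STDIN, \*STDOUT);
```

Assuming you saved it as clean.pl, you could run it as: perl clean.pl <in.txt >out.txt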

Failing to properly decode your inputs and encode your outputs explains the results you are seeing.


In reply to Re: Safely removing Unicode zero-width spaces and other non-printing characters by ikegami
in thread Safely removing Unicode zero-width spaces and other non-printing characters by mldvx4
