Re: Removing multibyte UTF-8 chars from strings

You don't show us where the string is initialized.

If you have the string verbatim in your editor, save the file as UTF-8 and put use utf8; at the top so that Perl reads the source as UTF-8. Personally, I prefer use charnames ':full'; and then writing the characters as \N{...} named escapes.
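To illustrate the named-escape approach, here is a minimal sketch. The username "alice42" is made up for the example, and I am assuming a Perl recent enough to know the Unicode 6.3 isolate names (5.20 or later should do):

```perl
use strict;
use warnings;
use charnames ':full';

# Named escapes keep the source file pure ASCII, so no "use utf8;" is needed.
my $lri = "\N{LEFT-TO-RIGHT ISOLATE}";     # U+2066
my $pdi = "\N{POP DIRECTIONAL ISOLATE}";   # U+2069

# Hypothetical username wrapped in the two isolate characters
my $wrapped = $lri . 'alice42' . $pdi;

printf "U+%04X U+%04X, %d characters\n", ord($lri), ord($pdi), length($wrapped);
# prints "U+2066 U+2069, 9 characters"
```

The upside is that the invisible characters are spelled out in the source, so nobody has to guess what is hiding between the quotes.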

As for the replacement target, you also need to show us where you get it from, and you need to tell Perl what encoding the string is in. Most likely the string is already UTF-8-encoded, but Perl doesn't know that and treats it as raw bytes. In that case, decode it first:

use Encode 'decode';

...

my $string = decode('UTF-8', $input_string);

# Keep only what we want:
# Capture straight from the match; $1 could be stale if the match fails
my ($real_user) = $string =~ m!([a-zA-Z0-9]+)!
    or warn "Invalid/empty username in '$string'";

# Remove stuff we don't want, especially the writing direction isolates:
$string =~ s!\x{2066}|\x{2069}!!g;
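
To show why the decode step matters, here is a small self-contained sketch. The byte string is a made-up stand-in for what you might read from a file without an encoding layer (the UTF-8 bytes of U+2066, "alice42", U+2069); the variable names are mine, not from your code:

```perl
use strict;
use warnings;
use Encode 'decode';

# Hypothetical input: raw UTF-8 bytes for U+2066, "alice42", U+2069
my $bytes = "\xE2\x81\xA6alice42\xE2\x81\xA9";

# Without decoding, Perl sees 13 separate bytes and the regex finds nothing
(my $undecoded = $bytes) =~ s!\x{2066}|\x{2069}!!g;

# After decoding, each isolate is a single character and the regex removes it
my $chars = decode('UTF-8', $bytes);
(my $clean = $chars) =~ s!\x{2066}|\x{2069}!!g;

print length($bytes), " ", length($undecoded), " ", length($chars), " $clean\n";
# prints "13 13 9 alice42"
```

If the substitution seems to do nothing in your program, an undecoded byte string like the one above is the first thing I would check for.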


Re^2: Removing multibyte UTF-8 chars from strings by cormanaz (Deacon) on Jan 10, 2022 at 19:27 UTC
Ya sorry, I was reading from a file and clipped the offending chars from that. The closing regex did the trick. Never heard of a "direction isolate."