Re: How to remove other language character from a string

You need `use utf8;` to tell Perl that your source file is encoded in UTF-8. That way, non-ASCII string literals work the way you want them to.

use strict;
use warnings;
use 5.010;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $str = "ครัวซองเเซนด์วิชไข่ดาว Croissant Egg Sandwich ครัวซองเเซนด์วิชไข่ดาว";
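# Delete every character that is neither Latin script nor Common
# (Common covers whitespace, digits, and most punctuation).
$str =~ s/[^\p{Latin}\p{Common}]//g;
# Trim leading and trailing whitespace.
$str =~ s/^\s+|\s+$//g;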
say $str;
__END__
Croissant Egg Sandwich

See also: Character Encodings in Perl.

Update: restored the square brackets, which had been turned into links, and changed the character class so that \p{Common} rather than \s is excluded from removal.
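
To see what the switch from \s to \p{Common} changes in practice, here is a minimal sketch that runs both character classes over the same input. The sample string and the helper name `keep_only` are made up for illustration and are not part of the code above.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: delete every character not matched by the given
# bracketed-class body, then trim leading and trailing whitespace.
sub keep_only {
    my ($str, $class) = @_;
    $str =~ s/[^${class}]//g;
    $str =~ s/^\s+|\s+$//g;
    return $str;
}

# Made-up input: Thai text around Latin text with punctuation and a digit.
my $input = "ไข่ดาว Egg Sandwich (no. 2)! ไข่ดาว";

# Keeping only Latin letters and whitespace also drops digits and punctuation.
print keep_only($input, '\p{Latin}\s'), "\n";           # Egg Sandwich no

# Keeping Latin plus Common preserves whitespace, digits and punctuation.
print keep_only($input, '\p{Latin}\p{Common}'), "\n";   # Egg Sandwich (no. 2)!
```

Which class is the better fit depends on whether punctuation and digits in the Latin part should survive.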

Perl 6 - the future is here, just unevenly distributed


Re^2: How to remove other language character from a string by Anonymous Monk on Nov 26, 2012 at 05:36 UTC
Thanks moritz, but when I tried this I got output like this: α╕äα╕úα╕▒α╕ºα╕ïα╕¡α╕çα╣Çα╣Çα╕ïα╕Öα╕öα╣îα╕ºα╕┤α╕èα╣äα╕éα╣êα╕öα╕▓α╕º Croissant Egg Sandwich α╕äα╕úα╕▒α╕ºα╕ïα╕¡α╕çα╣Çα╣Çα╕ïα╕Öα╕öα╣îα╕ºα╕┤α╕èα╣äα╕éα╣êα╕öα╕▓α╕º
Re^3: How to remove other language character from a string by frozenwithjoy (Priest) on Nov 26, 2012 at 05:51 UTC
That's because it wasn't formatted correctly due to missing code tags (which were presumably left out so that the input text would be shown properly). When I first ran moritz's code, I just got the original string back, but when I substituted `$str =~ s/[^\p{Latin}\s]//g;` for `$str =~ s/^\p{Latin}\s//g;` it worked.

EDIT: If you have lots of extra spaces in your output, you could also run it through `$str =~ s/ {2,}/ /g;`. Something to keep in mind is that moritz's approach (as is) will remove punctuation.
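
Here is a rough sketch of why the brackets matter, followed by the space cleanup. The input string is made up for illustration; only the three substitutions come from the posts above.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Made-up input with a double space in the Latin part.
my $input = "ไข่ดาว Fried  Egg ไข่ดาว";

# Without brackets, ^ is a start-of-string anchor: the pattern only matches
# one Latin letter followed by one whitespace character at the very start.
# The string starts with a Thai character, so nothing is replaced.
(my $anchored = $input) =~ s/^\p{Latin}\s//g;
print $anchored eq $input ? "unchanged\n" : "changed\n";   # unchanged

# Inside brackets, ^ negates the class: every character that is neither
# Latin nor whitespace is deleted.
(my $negated = $input) =~ s/[^\p{Latin}\s]//g;

# Collapse runs of two or more spaces, then trim both ends.
$negated =~ s/ {2,}/ /g;
$negated =~ s/^\s+|\s+$//g;
print "$negated\n";   # Fried Egg
```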
Re^4: How to remove other language character from a string by Anonymous Monk on Nov 26, 2012 at 09:53 UTC
It worked smoothly. Thanks frozenwithjoy and moritz.