in reply to How to remove other language character from a string

You need to use utf8; to tell Perl that your source file is in UTF-8. That way non-ASCII string literals work the way you want them to.

use strict;
use warnings;
use 5.010;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $str = "ครัวซองเเซนด์วิชไข่ดาว Croissant Egg Sandwich ครัวซองเเซนด์วิชไข่ดาว";
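# Remove every character that is neither Latin script nor script-neutral (spaces, digits, punctuation).
$str =~ s/[^\p{Latin}\p{Common}]//g;
# Trim leading and trailing whitespace.
$str =~ s/^\s+|\s+$//g;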
say $str;
__END__
Croissant Egg Sandwich

See also: Character Encodings in Perl.

Updated to unlinkify the brackets, and to exclude \p{Common} instead of \s from removal.
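To illustrate what use utf8; changes, here is a minimal sketch that compares the same Thai literal read as characters and as raw bytes. It assumes the file is saved as UTF-8; the helper name strip_non_latin is made up for this example and is not part of the original post.

```perl
use strict;
use warnings;
use utf8;   # literals in this file are decoded from UTF-8 into characters

# The same three-letter Thai word, once as characters and once as raw bytes.
my $chars = "ดาว";
my $bytes = do { no utf8; "ดาว" };   # the pragma is lexical, so this literal stays undecoded

# Hypothetical helper wrapping the substitution from the post above.
sub strip_non_latin {
    my ($s) = @_;
    $s =~ s/[^\p{Latin}\p{Common}]//g;
    return $s;
}

printf "length as characters: %d\n", length $chars;
printf "length as bytes:      %d\n", length $bytes;
printf "characters left after stripping the decoded string: %d\n",
    length strip_non_latin($chars);
```

Each Thai letter here takes three bytes in UTF-8, so the undecoded literal is three times as long, and the regex then sees nine unrelated byte values rather than three Thai characters.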

Re^2: How to remove other language character from a string
by Anonymous Monk on Nov 26, 2012 at 05:36 UTC
    Thanks moritz, but when I tried this I got the output like this:
    α╕äα╕úα╕▒α╕ºα╕ïα╕¡α╕çα╣Çα╣Çα╕ïα╕Öα╕öα╣îα╕ºα╕┤α╕èα╣äα╕éα╣êα╕öα╕▓α╕º Croissant Egg Sandwich α╕äα╕úα╕▒α╕ºα╕ïα╕¡α╕çα╣Çα╣Çα╕ïα╕Öα╕öα╣îα╕ºα╕┤α╕èα╣äα╕éα╣êα╕öα╕▓α╕º

      That's because moritz's code wasn't formatted correctly: it was posted without code tags (presumably so that the input text would be shown properly), which cost the regex its square brackets. When I first ran moritz's code, I just got the original string back, but when I substituted:

      $str =~ s/[^\p{Latin}\s]//g;

      for this:

      $str =~ s/^\p{Latin}\s//g;

      it worked.
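A small sketch of why the brackets matter, using a shortened input string chosen for this example (it is not the string from the thread):

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $input = "ดาว Egg Sandwich";

# With brackets: a negated character class, applied at every position.
(my $with_brackets = $input) =~ s/[^\p{Latin}\s]//g;

# Without brackets: ^ anchors at the start of the string, so the pattern
# needs one Latin character followed by one whitespace character there.
(my $without_brackets = $input) =~ s/^\p{Latin}\s//g;

print "with brackets:    [$with_brackets]\n";
print "without brackets: [$without_brackets]\n";
```

The input starts with a Thai character, so the unbracketed pattern never matches and the string comes back unchanged, which is consistent with getting the original string as output.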

      EDIT: If you have lots of extra spaces in your output, you could run it through $str =~ s/ {2,}/ /g;, too. Something to keep in mind is that moritz's approach (as is) will remove punctuation.
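The punctuation remark can be checked with a short comparison. This sketch assumes the remark refers to the \s version of the character class; the sample string is invented for the example.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

my $input = "ดาว Egg & Cheese, toasted! ดาว (hot)";

# Variant with \s: only Latin letters and whitespace survive.
(my $latin_ws = $input) =~ s/[^\p{Latin}\s]//g;

# Variant with \p{Common}: script-neutral characters survive too.
(my $latin_common = $input) =~ s/[^\p{Latin}\p{Common}]//g;

for ($latin_ws, $latin_common) {
    s/ {2,}/ /g;       # collapse runs of spaces left behind by the removal
    s/^\s+|\s+$//g;    # trim both ends
}

print "$latin_ws\n";       # Egg Cheese toasted hot
print "$latin_common\n";   # Egg & Cheese, toasted! (hot)
```

Keeping \p{Common} preserves ASCII punctuation and digits, at the cost of also keeping any script-neutral symbols that appear inside the non-Latin text.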

        It worked smoothly. Thanks Frozenwithjoy and moritz.