in reply to Match full utf-8 characters
echo -n 'a…b' | perl -pe 's@(.).(.)@$1$2@';
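echo may generate a byte stream representing three Unicode characters (assuming a UTF-8 locale), but that stream is five bytes long, because '…' (U+2026 HORIZONTAL ELLIPSIS) is encoded in UTF-8 as the three bytes E2 80 A6. Perl reads its input as a byte stream, not as Unicode characters, so each dot in the regex matches a single byte: the substitution keeps 'a', drops E2, keeps 80, and leaves A6 and 'b' alone. So you are cutting out a byte, not a character, and get back garbage. Also, perl writes out bytes, not Unicode characters.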
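To make the byte/character difference concrete, here is a small sketch (not part of the original reply) that builds the same input in memory instead of reading it from echo. Encode's encode() produces the five UTF-8 bytes a pipe would carry, and the same substitution is then run once on those bytes and once on the characters.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The three characters a, U+2026 HORIZONTAL ELLIPSIS, b
my $chars = "a\x{2026}b";

# What echo -n 'a…b' writes to the pipe in a UTF-8 locale: 5 bytes
my $bytes = encode('UTF-8', $chars);
printf "byte length: %d, character length: %d\n", length($bytes), length($chars);

# Byte semantics: (.) grabs 'a', . eats 0xE2, (.) grabs 0x80; 0xA6 stays behind
(my $on_bytes = $bytes) =~ s@(.).(.)@$1$2@;
printf "bytes after s///: %s\n",
    join ' ', map { sprintf('%02X', ord($_)) } split //, $on_bytes;

# Character semantics: (.) grabs 'a', . eats the ellipsis, (.) grabs 'b'
(my $on_chars = $chars) =~ s@(.).(.)@$1$2@;
print "chars after s///: $on_chars\n";
```

The byte version should leave 61 80 A6 62 behind; the two stray bytes 80 A6 are most likely what the terminal renders as ▒▒.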
Tell perl to treat STDIN and STDOUT as UTF-8 encoded character streams (-CIO) and everything works as expected:
    >echo -n 'a…b' | perl -pe 's@(.).(.)@$1$2@'
    a▒▒b
    >echo -n 'a…b' | perl -CIO -pe 's@(.).(.)@$1$2@'
    ab
    >perl -v

    This is perl 5, version 22, subversion 2 (v5.22.2) built for x86_64-linux-thread-multi

    Copyright 1987-2015, Larry Wall

    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5 source kit.

    Complete documentation for Perl, including FAQ lists, should be found on
    this system using "man perl" or "perldoc perl". If you have access to the
    Internet, point your browser at http://www.perl.org/, the Perl Home Page.

    >
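If you would rather do this inside a script than with the -C switch, a common approach is to put an :encoding(UTF-8) layer on the handles with binmode. This is close to, not identical to, -CIO: -C uses Perl's laxer :utf8 layer. The sketch below assumes a hypothetical helper filter_lines and simulates the pipe with in-memory filehandles so it runs standalone; in a real script you would call binmode on STDIN and STDOUT and pass those handles instead.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the one-liner's substitution to every line.
# All decoding/encoding happens in the handles' PerlIO layers.
sub filter_lines {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        $line =~ s@(.).(.)@$1$2@;
        print {$out} $line;
    }
}

# Real-script equivalent (roughly what -CIO does):
#   binmode STDIN,  ':encoding(UTF-8)';
#   binmode STDOUT, ':encoding(UTF-8)';
#   filter_lines(\*STDIN, \*STDOUT);

# Standalone demo: the 5 bytes echo -n 'a…b' writes, read through a UTF-8 layer
my $piped  = "a\xE2\x80\xA6b";
my $result = '';
open my $in,  '<:encoding(UTF-8)', \$piped  or die "open in: $!";
open my $out, '>:encoding(UTF-8)', \$result or die "open out: $!";
filter_lines($in, $out);
close $out;
close $in;
print "$result\n";   # prints "ab"
```

Keeping the substitution in a function that only sees decoded characters means the regex never has to know about bytes; the encoding decision stays at the I/O boundary.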
See also -C in perlrun, and the thread "any use of 'use locale'?", especially the subthread "Re^3: any use of 'use locale'? (source encoding)".
Alexander
Re^2: Match full utf-8 characters
by Allasso (Monk) on Apr 29, 2019 at 13:24 UTC
by Allasso (Monk) on Apr 29, 2019 at 13:36 UTC
by Anonymous Monk on Apr 29, 2019 at 13:55 UTC
by Allasso (Monk) on Apr 29, 2019 at 15:14 UTC
by Anonymous Monk on Apr 29, 2019 at 16:24 UTC