
The :encoding(UTF-8) layer on STDIN already decodes the input, so forget about the utf8::decode($_); line and it all just works:

$ cat uct.pl 
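#!/usr/bin/perl5.16.3

use strict;
use warnings;
use HTML::Entities;

# Decode UTF-8 on input and encode it on output; no manual
# utf8::decode() call is needed on top of this.
binmode STDIN, ':encoding(UTF-8)';
binmode STDOUT, ':encoding(UTF-8)';

while(<>) {
    chomp;

    $_ = decode_entities($_);   # &Eacute; becomes the character É
    $_ = lc($_);                # lc sees characters, not UTF-8 bytes

    print $_, "\n";
}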
$ echo -e "Édition limitée.\n&Eacute;dition limitée." | perl uct.pl 
édition limitée.
édition limitée.

Update: I forgot to mention that this was run on perl 5.20.3, regardless of your #! line (running it as perl uct.pl does not use the interpreter named there).

Re^4: Unexpected interaction between decode_entities() and lc()
by haukex (Archbishop) on Nov 14, 2017 at 15:53 UTC

    Unfortunately, that only works when piping data into Perl. It does not work when files are specified on the command line, because <> opens those files itself and they are not affected by binmode STDIN (see my post here).
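One option for the command-line case, offered here only as a sketch and not taken from the thread, is the core open pragma: use open qw(:std :encoding(UTF-8)); sets the default layers for opens in its lexical scope, which includes the files that <> opens from @ARGV, and :std applies the same layer to STDIN and STDOUT. The temporary sample file is hypothetical, and decode_entities (from the non-core HTML::Entities) is left out so the sketch needs only core modules.

```perl
#!/usr/bin/perl
# Sketch (assumption, not from the thread): decode UTF-8 input read
# through <>, whether it comes from STDIN or from files in @ARGV.
use strict;
use warnings;
use File::Temp qw(tempfile);

# Default layers for open() in this lexical scope, including the
# implicit open of each @ARGV file by <>. :std also applies the layer
# to STDIN, STDOUT and STDERR.
use open qw(:std :encoding(UTF-8));

# Hypothetical sample file so the sketch runs on its own.
my ($out, $filename) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)' or die "binmode: $!";
print {$out} "\x{C9}dition limit\x{E9}e.\n";
close $out or die "close: $!";

my @lines;
{
    local @ARGV = ($filename);   # as if run as: perl script.pl FILE
    while (my $line = <>) {
        chomp $line;
        push @lines, lc $line;   # lc works on decoded characters
    }
}

print "$_\n" for @lines;
```

The same script reads STDIN correctly too when no file names are given, because of the :std part.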

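      Update: Nonsense, sorry. The binmode ARGV suggestion below does not work: ARGV is not open until <> first reads from it, and <> reopens it for every file, so a binmode ARGV placed before the loop has no effect.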

      You can use

      binmode ARGV, ':encoding(UTF-8)';

      to affect the encoding of the input coming through the diamond operator.
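A quick check of why the binmode ARGV line does not help, as a sketch assuming no -C switch or PERL_UNICODE setting: before <> has read anything, ARGV is not open, so binmode fails, and the file that <> opens afterwards is read as raw UTF-8 bytes. The sample file is hypothetical.

```perl
#!/usr/bin/perl
# Sketch: a binmode ARGV issued before the loop does not reach the
# files that <> opens later. Assumes no -C switch or PERL_UNICODE.
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample file containing UTF-8 encoded text.
my ($out, $filename) = tempfile(UNLINK => 1);
binmode $out, ':encoding(UTF-8)' or die "binmode: $!";
print {$out} "\x{C9}dition limit\x{E9}e.\n";
close $out or die "close: $!";

my ($binmode_ok, @lines);
{
    local @ARGV = ($filename);
    # ARGV is not open yet, so binmode returns false (and warns).
    $binmode_ok = binmode(ARGV, ':encoding(UTF-8)');
    while (my $line = <>) {
        chomp $line;
        push @lines, $line;   # raw bytes: 18 of them, not 16 characters
    }
}

printf "binmode succeeded: %s, length: %d\n",
    ($binmode_ok ? 'yes' : 'no'), length $lines[0];
```

Running it should report that binmode failed and a length of 18 (bytes) instead of 16 (characters), along with a warning about the unopened filehandle.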
