in reply to Re: Encoding horridness
in thread Encoding horridness

Good advice, to be sure. But since Latin-1 is a subset of Unicode, isn't decode('Latin-1', $_) pretty much a no-op?

Re^3: Encoding horridness
by Corion (Patriarch) on Jul 12, 2017 at 14:20 UTC

    No, because octets with the high bit set (0x80-0xFF) in Latin-1 are represented by different, two-octet sequences in UTF-8, and Perl doesn't know how to write high-bit characters unless you tell it which encoding to use.
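    To make the difference concrete, here is a minimal sketch using only the core Encode module. The variable names and the hex_octets helper are mine, purely for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: render a byte string as space-separated hex octets.
sub hex_octets { join ' ', map { sprintf '%02X', ord } split //, shift }

# One Latin-1 octet: 0xE9 is LATIN SMALL LETTER E WITH ACUTE.
my $latin1_octets = "\xE9";

# Octets -> Perl character string (a single character, U+00E9).
my $chars = decode('Latin-1', $latin1_octets);

# Character string -> UTF-8 octets.
my $utf8_octets = encode('UTF-8', $chars);

print 'Latin-1: ', hex_octets($latin1_octets), "\n";   # E9
print 'UTF-8:   ', hex_octets($utf8_octets),   "\n";   # C3 A9
```

    The input is one octet and the output is two, so the round trip through decode and encode clearly changes the data.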

      What I'm wondering, though, is whether there's ever a situation where
      encode('utf8', decode('Latin-1', $_))
      produces different output from
      encode('utf8', $_)?
        Yes, for example:
        $_ = decode('utf-8', "\N{LATIN SMALL LETTER A WITH ACUTE}");
        say encode('utf8', $_);                     # Replacement character EF BF BD.
        say encode('utf8', decode('Latin-1', $_));  # Dies.
Re^3: Encoding horridness
by hippo (Archbishop) on Jul 12, 2017 at 14:16 UTC

    The OP wants to move from Latin-1 to UTF-8. Latin-1 is not a subset of UTF-8.
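    One way to see this, as a rough sketch with the core Encode module (the sample string is my own): a Latin-1 byte string containing a high-bit octet is not even valid UTF-8, so a strict UTF-8 decode rejects it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "cafe" with e-acute, as Latin-1 octets; a lone 0xE9 is not a valid UTF-8 sequence.
my $latin1_octets = "caf\xE9";

# FB_CROAK makes decode die on malformed input; LEAVE_SRC keeps the input intact.
my $decoded = eval {
    decode('UTF-8', $latin1_octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
};

print defined $decoded ? "valid UTF-8\n" : "not valid UTF-8: $@";
```

    Only the ASCII range (0x00-0x7F) has the same octets in both encodings.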

      Yes, and encode('utf8', decode('Latin-1', $_)) isn't a no-op.