in reply to :utf8 I/O layer vs encoding(UTF8), segfault and speed

1. What exactly is the difference between :utf8 and :encoding(UTF8)?

As Juerd says:

What exactly isn't clear about that?

Properly decoding and validating the input (which is what :encoding(UTF8) does) takes time, of course. On the other hand, if you let Perl work with unvalidated 'UTF-8' strings, nasty things can happen (including segfaults), because Perl's Unicode internals were not written to handle malformed data safely in each and every case. Strings that are not properly encoded in UTF-8 should not have the utf8 flag on.
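If you want that validation done explicitly, here is a minimal sketch that decodes strictly with the core Encode module, assuming the raw octets are already in a scalar (e.g. read through a :raw handle). decode_strict is a hypothetical helper name; Encode::decode with the FB_CROAK and LEAVE_SRC check flags is standard Encode usage. The sketch uses the strict 'UTF-8' encoding name rather than Perl's lax 'UTF8'.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode raw octets as strict UTF-8.
# Returns the decoded character string, or undef if the input is malformed.
sub decode_strict {
    my ($octets) = @_;
    my $chars = eval {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC stops Encode from modifying $octets in place.
        Encode::decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $chars;    # undef if decode() died
}

my $good = "caf\xC3\xA9";    # "café" correctly encoded as UTF-8
my $bad  = "caf\xE9";        # "café" in Latin-1: \xE9 is not valid UTF-8 here

for my $octets ($good, $bad) {
    my $chars = decode_strict($octets);
    if (defined $chars) {
        printf "valid: %d characters\n", length $chars;
    }
    else {
        print "malformed input rejected\n";
    }
}
```

This prints "valid: 4 characters" for the correctly encoded string and "malformed input rejected" for the Latin-1 one, so the utf8 flag only ever ends up on data that has actually been checked.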


Re^2: :utf8 I/O layer vs encoding(UTF8), segfault and speed
by mje (Curate) on Apr 01, 2009 at 18:55 UTC

    OK, I get that, but where do the warnings/errors come from?

    What I mean is: are the errors separate from encoding problems?

      What I mean is: are the errors separate from encoding problems?

      I only see two types of errors: 'utf8 "\x.." does not map to Unicode' and 'Malformed UTF-8 character (...details...)'. Both indicate encoding problems caused by malformed input.

        I'm obviously missing something. I know the input is not correctly UTF-8 encoded. If the :utf8 I/O layer only sets the internal utf8 flag and does not check the encoding, then why am I getting encoding errors?