The node A UTF8 round trip with MySQL, and specifically Re: A UTF8 round trip with MySQL, seems to have arrived at a coincidental time for me. With all the Conficker talk, someone here ran an nmap scan and it crashed my Perl server:

utf8 "\x80" does not map to Unicode at Queue.pm line 835, <GEN6234> line 1.
Malformed UTF-8 character (unexpected continuation byte 0x80, with no preceding start byte) in pattern match (m//) at Queue.pm line 836, <GEN6234> line 1.
utf8 "\xD7" does not map to Unicode at Queue.pm line 835, <GEN6238> line 1.
utf8 "\xA4" does not map to Unicode at Queue.pm line 835, <GEN6239> line 1.
Malformed UTF-8 character (overflow at 0xcd0b2000, byte 0x00, after start byte 0xff) in subroutine entry at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Data/Dumper.pm line 179, <GEN6239> line 1.
Malformed UTF-8 character (overflow at 0xcd0b2000, byte 0x00, after start byte 0xff) in subroutine entry at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Data/Dumper.pm line 179, <GEN6239> line 1.
Malformed UTF-8 character (overflow at 0xcd0b2000, byte 0x00, after start byte 0xff) in subroutine entry at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Data/Dumper.pm line 179, <GEN6239> line 1.
Malformed UTF-8 character (overflow at 0xcd0b2000, byte 0x00, after start byte 0xff) in subroutine entry at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Data/Dumper.pm line 179, <GEN6239> line 1.
Malformed UTF-8 character (overflow at 0xcd0b2000, byte 0x00, after start byte 0xff) in subroutine entry at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Data/Dumper.pm line 179, <GEN6239> line 1.
Malformed UTF-8 character (overflow at 0xcd0b2000, byte 0x00, after start byte 0xff) in subroutine entry at /usr/lib/perl5/5.8.8/i386-linux-thread-multi/Data/Dumper.pm line 179, <GEN6239> line 1.
utf8 "\x80" does not map to Unicode at Queue.pm line 835, <GEN6242> line 1.
utf8 "\xE0" does not map to Unicode at Queue.pm line 835, <GEN6245> line 1.
Segmentation fault

The code in question accepts UTF-8 encoded data (well, it is supposed to be encoded) from a socket, and has set the :utf8 I/O layer on that socket. The code generating the warnings is reading from that socket.
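For comparison, here is a minimal sketch of one alternative to putting a layer on the socket: read raw bytes and decode them explicitly, rejecting anything malformed. This is not the code from Queue.pm; the helper name decode_strict is hypothetical, and the sketch assumes the Encode module that ships with Perl.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode a byte string as strict UTF-8.
# Returns the decoded text, or undef if the bytes are malformed.
sub decode_strict {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC stops decode() from modifying $bytes in place.
    my $text = eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $text;
}

# Usage: valid bytes decode, garbage (such as scanner probes) is rejected.
for my $input ("caf\xC3\xA9", "\x80abc", "\xFF\x00") {
    my $text = decode_strict($input);
    print defined $text ? "accepted\n" : "rejected\n";
}
```

With this shape, malformed input never becomes a character string, so later pattern matches and Data::Dumper calls only ever see either valid text or undef.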

There are a few things about this, and about the nodes quoted, that I don't understand.

  1. What exactly is the difference between :utf8 and :encoding(UTF8)? Juerd seems to be suggesting that :utf8 a) should not be used and b) simply sets the internal utf8 flag, and yet I am getting warnings which suggest that slightly more than setting a flag is occurring.
  2. Why is this segfaulting?
  3. Why, when I change to :encoding(UTF8), does it stop segfaulting but slow down a lot?
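To make the first question concrete, here is a small sketch of what "only setting the flag" would mean at the string level. It is an analogy using Encode::_utf8_on, which Encode documents as an internals-level function; it is not a claim about what the :utf8 layer actually does on input.

```perl
use strict;
use warnings;
use Encode ();

# A lone continuation byte: not valid UTF-8 on its own.
my $flag_only = "\x80";
Encode::_utf8_on($flag_only);    # turn the flag on, no validation at all

# Properly decoded text for comparison.
my $decoded = Encode::decode('UTF-8', "caf\xC3\xA9");

for my $pair ([ 'flag only', $flag_only ], [ 'decoded', $decoded ]) {
    my ($label, $string) = @$pair;
    printf "%-10s flagged=%s valid=%s\n",
        $label,
        (utf8::is_utf8($string) ? 'yes' : 'no'),
        (utf8::valid($string)   ? 'yes' : 'no');
}
```

Both strings carry the flag, but only the decoded one passes utf8::valid, which is the distinction the question is asking about.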

As a quick test I got hold of a JPEG file (obviously not UTF-8 encoded) and did:

    use strict;
    use warnings;

    my $fh;
    open ($fh, "<:utf8", "schema.jpg");
    my $img = '';
    while (<$fh>) {
        $img .= $_;
    }

which takes 0.123s to run and outputs a lot of warnings. Changing to use :encoding(UTF8) takes 27s and outputs hundreds of warnings.
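A variation on the quick test above that might help narrow down the slowdown: time both layers on valid UTF-8 held in memory, so that neither run has to emit warnings. This is only a sketch; the helper slurp_with_layer and the sample data are made up for illustration, and the timings will differ from those of the JPEG test.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: read $bytes line by line through the given layer
# and return the decoded text plus the elapsed wall-clock time.
sub slurp_with_layer {
    my ($layer, $bytes) = @_;
    open(my $fh, "<$layer", \$bytes) or die "open failed: $!";
    my $text  = '';
    my $start = time;
    while (my $line = <$fh>) {
        $text .= $line;
    }
    my $elapsed = time - $start;
    close $fh;
    return ($text, $elapsed);
}

# Valid UTF-8 sample data (assumption: nothing like the real socket traffic).
my $bytes = "caf\xC3\xA9 na\xC3\xAFve\n" x 10_000;

for my $layer (':utf8', ':encoding(UTF-8)') {
    my ($text, $elapsed) = slurp_with_layer($layer, $bytes);
    printf "%-18s %d chars in %.4fs\n", $layer, length $text, $elapsed;
}
```

If the two layers are close on valid data but far apart on the JPEG, that would suggest the cost lies in handling malformed input rather than in the layer as such.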


In reply to :utf8 I/O layer vs encoding(UTF8), segfault and speed by mje
