Hi all,

I'm not sure what to make of the following observation. I was tracking down a bug that turned out to be caused by some invalid UTF-8 characters in my data. More precisely, it was due to the files not being opened in the same way (see below) in two distinct modules. I have fixed my bug, but I wonder whether this difference is intended, and where to report it if it is not.

Here is the (simplified) code:

my $file = "simple.txt";
open(FILE, "<:utf8", $file) or die "can not open $file";
my @data1 = <FILE>;
close(FILE);
use open ':encoding(utf8)';
open(FILE, $file) or die "can not open $file";
my @data2 = <FILE>;
close(FILE);
die "different size" if (scalar(@data1) != scalar(@data2));
while (@data1) {
    my $s1 = shift(@data1);
    my $s2 = shift(@data2);
    # print "1: $s1\n2: $s2\n";
    die "different data" if ($s1 ne $s2);
}

and here is the output with my invalid UTF-8 data:

utf8 "\xD0" does not map to Unicode at ./essai.pl line 8, <FILE> line +1.
different data at ./essai.pl line 21.

(I disabled the print because I cannot write the Russian characters here. In any case, the faulty character is not visible.)
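
The difference can probably be reproduced without the original data. The sketch below is only an illustration under assumptions: the sample bytes, the temporary file, and the helper name `slurp_with` are all made up for this example and are not part of the original script. It relies on the documented distinction that the `:utf8` layer marks input as UTF-8 without validating it, whereas `:encoding(utf8)` decodes the input through Encode; the exact warnings and replacement text may vary between Perl versions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data: 0xD0 starts a two-byte UTF-8 sequence, but the
# byte that follows (a newline) is not a continuation byte, so the input
# is not valid UTF-8.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "ab\xD0\n";
close($tmp) or die "cannot close $path: $!";

# Collect warnings in an array instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# Hypothetical helper: read the whole file through the given PerlIO layer.
sub slurp_with {
    my ($layer) = @_;
    open(my $fh, "<$layer", $path) or die "cannot open $path: $!";
    my @lines = <$fh>;
    close($fh);
    return join '', @lines;
}

my $lax     = slurp_with(':utf8');            # flagged as UTF-8, not validated
my $decoded = slurp_with(':encoding(utf8)');  # decoded through Encode

print "the two strings differ\n" if $lax ne $decoded;
print "warning: $_" for @warnings;
```

If the two strings differ here as well, that would suggest the behaviour comes from the layers themselves rather than from anything specific to the two modules.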

Thanks!


In reply to different utf8 method = different behaviour? by erwan
