Maybe now is a good time to show the exact code you are running and the exact input (again, as hexdump) you are giving it, and also to describe what method you are using to inspect the output.
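If no hexdump tool is at hand, a few lines of Perl can produce one. This is only a sketch: `hexdump` is a hypothetical helper, not part of any module, and the file name is assumed to come from the command line. The `:raw` layer matters here. Any decoding layer would hide the bytes that are actually on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: offset / hex / ASCII dump of a byte string, 16 bytes per row.
sub hexdump {
    my ($bytes) = @_;
    my @rows;
    for (my $off = 0; $off < length $bytes; $off += 16) {
        my $chunk = substr($bytes, $off, 16);
        my $hex   = join ' ', map { sprintf '%02x', ord } split //, $chunk;
        (my $ascii = $chunk) =~ tr/\x20-\x7e/./c;   # non-printable bytes shown as '.'
        push @rows, sprintf '%08x  %-47s  %s', $off, $hex, $ascii;
    }
    return join "\n", @rows;
}

# Assumed usage: perl hexdump.pl utf8.txt
# Read raw bytes (no decoding layer) so the dump shows the file exactly as stored.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    print hexdump($data), "\n";
}
```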

For me, on Perl 5.20, on Windows 7, with the Latin-1 codepage, I get the following output from the program I posted with the input file, which shows some more "characters" on output, but that is expected because my terminal is not set to UTF-8:

Ruler : [12345678901234567890]
Input : [YearÔÖÑJEDocSrcP_USE_DATE P_DATE CurLine]
20    : [YearÔÖÑJEDocSrcP_USE_D]
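The three extra "characters" are consistent with the UTF-8 bytes of the heart (U+2665, bytes E2 99 A5) being drawn one byte at a time by the console. A small sketch to check this, assuming that the "Latin-1" console codepage is code page 850 (DOS Latin-1), which maps E2, 99 and A5 to Ô, Ö and Ñ:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+2665 BLACK HEART SUIT, shown as a heart in the UTF-8 console output.
my $heart = "\x{2665}";

# What the :encoding(UTF-8) output layer writes: three bytes for one character.
my $bytes = encode('UTF-8', $heart);

# Assumption: the "Latin-1" console is code page 850; this is what it draws for those bytes.
my $as_cp850 = decode('cp850', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
printf "bytes: %s\n", join ' ', map { sprintf '%02X', ord } split //, $bytes;
printf "cp850: %s\n", $as_cp850;
```

The string in Perl still holds 20 characters. Only the console's rendering of the bytes differs.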

On Perl 5.20, on Windows 7, with the UTF-8 codepage (via chcp 65001), I get the following output from the program I posted with the input file, which has 20 characters (not bytes) on output, as I expect:

Ruler : [12345678901234567890]
Input : [Year♥JEDocSrcP_USE_DATE P_DATE CurLine]
20    : [Year♥JEDocSrcP_USE_D]
---
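The difference between characters and bytes can be made concrete with the sample line as displayed above. This assumes the file holds exactly what the terminal shows, which may not be the case. After decoding, `substr` counts the heart as one character, while its UTF-8 encoding takes three bytes:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample line as shown in the output above (assumed to match the file's content).
my $line    = "Year\x{2665}JEDocSrcP_USE_DATE P_DATE CurLine";
my $first20 = substr($line, 0, 20);        # substr counts characters on a decoded string
my $utf8    = encode('UTF-8', $first20);   # the bytes that end up in the output file

printf "%d characters, %d bytes\n", length $first20, length $utf8;   # prints "20 characters, 22 bytes"
```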

The script I'm running is:

#!/usr/bin/perl -w
use strict;
use Encode qw/encode decode/;

open (INFILE, "<:encoding(UTF-8)", "utf8.txt") || die "blah blah blah";
open (OUTFILE, ">:encoding(UTF-8)", "oututf8.txt") || die "blah blah";
binmode STDOUT, ':encoding(UTF-8)';

print "Ruler : [12345678901234567890]\n";
while (my $line = <INFILE>) {
    chomp ($line);
    $line =~ s!^\N{BYTE ORDER MARK}!!;    # strip a leading BOM, if present
    print "Input : [$line]\n";
    my $linestart = substr($line,0,20);
    my $outline = $linestart;
    print "20    : [$outline]\n";
    print "---\n";
    print OUTFILE "$outline\n";
}
close (INFILE);
close (OUTFILE);    # flush and close the output file as well
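For comparison, here is a sketch of the same truncation written with lexical filehandles and `or die ... $!`, which reports why an `open` failed. The name `truncate_lines` is hypothetical, and taking the input and output file names as the two command-line arguments is an assumption. The 20-character width and the BOM stripping follow the script above. Passing filehandles into the sub also lets the logic be checked against an in-memory string instead of a file on disk.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the first $width characters of each line from $in to $out.
# Both handles are expected to carry an :encoding(UTF-8) layer, so substr
# operates on characters rather than bytes.
sub truncate_lines {
    my ($in, $out, $width) = @_;
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/^\x{FEFF}//;                     # drop a leading byte order mark, if any
        print {$out} substr($line, 0, $width), "\n";
    }
}

# Assumed usage: perl truncate.pl utf8.txt oututf8.txt
if (@ARGV == 2) {
    open my $in,  '<:encoding(UTF-8)', $ARGV[0] or die "Cannot read $ARGV[0]: $!";
    open my $out, '>:encoding(UTF-8)', $ARGV[1] or die "Cannot write $ARGV[1]: $!";
    truncate_lines($in, $out, 20);
    close $in;
    close $out or die "Cannot close $ARGV[1]: $!";
}
```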

In reply to Re^5: Read and write UTF-8 by Corion
in thread Read and write UTF-8 by Norah
