Greetings,

I am having trouble getting text (CRLF specifically) to encode correctly as UTF-16 little endian. I am expecting the output shown below:

~~~ Human readable output of what is being generated ~~~~~~~~~~~~
Line1
Line2

Line4
~~~~~ Actual Results ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4C 00 69 00 6E 00 65 00 31 00 0D 0A 00
4C 00 69 00 6E 00 65 00 32 00 0D 0A 00
0D 0A 00
4C 00 69 00 6E 00 65 00 34 00 0D 0A 00
~~What was expected and is required for valid UTF-16LE encoding ~~~
4C 00 69 00 6E 00 65 00 31 00 0D 00 0A 00
                                 ^ byte missing from actual results
4C 00 69 00 6E 00 65 00 32 00 0D 00 0A 00
                                 ^ byte missing from actual results
0D 00 0A 00
   ^ byte missing from actual results
4C 00 69 00 6E 00 65 00 34 00 0D 00 0A 00
                                 ^ byte missing from actual results
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I suspect this issue (or bug in Encode.pm?) may be due to \n being mapped to CRLF on Windows, whereas on *nix it is just LF, and Encode.pm and its dependencies aren't handling that correctly.
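To illustrate that suspicion, here is a minimal sketch that reproduces the byte pattern from the hex dumps above without touching any file. It assumes the newline translation is applied to the bytes after they have been encoded; that ordering is only a hypothesis about the cause, and the helper name hex_dump is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: format a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# Encode first: every character, including "\n", becomes two bytes.
my $encoded = encode('UTF-16LE', "Line1\n");

# Assumption: simulate a newline translation that runs on the already
# encoded bytes. The lone 0x0A byte becomes 0x0D 0x0A, so the 0x0D is
# never followed by its own 0x00.
(my $translated = $encoded) =~ s/\x0A/\x0D\x0A/g;

print hex_dump($encoded), "\n";     # 4C 00 69 00 6E 00 65 00 31 00 0A 00
print hex_dump($translated), "\n";  # 4C 00 69 00 6E 00 65 00 31 00 0D 0A 00
```

The second line printed matches the first row of the actual results, which is consistent with the translation happening below the encoding step rather than with a fault in the encoder itself.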

I have tried numerous things, e.g. using BE, using UCS-2LE/BE, and using \015\012 instead of \n. All of them seem to have the same issue.

  1. Is this a bug, or am I doing something wrong?
  2. Assuming I'm not doing something wrong, is there any way to code around this issue in Perl 5.8.8 (Encode.pm v2.23)?
I'm retesting this on Perl 5.10.1 currently and will update with results. Any assistance or advice would be much appreciated.
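As one possible way to code around it, the sketch below does the line-ending expansion and the encoding by hand and writes the resulting bytes through a :raw handle, so no I/O layer gets a chance to alter them. The helper name write_utf16le_crlf is hypothetical, and I have not verified this on Perl 5.8.8 with Encode 2.23 specifically.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: write $text to $path as UTF-16LE with CRLF line
# endings, returning the number of bytes written.
sub write_utf16le_crlf {
    my ($path, $text) = @_;

    # Expand every LF (or existing CRLF) to CRLF while the data is
    # still a character string.
    (my $crlf = $text) =~ s/\015?\012/\015\012/g;

    # Encode explicitly, so CR and LF each become two bytes.
    my $bytes = encode('UTF-16LE', $crlf);

    # :raw disables newline translation on the handle.
    open(my $fh, '>:raw', $path) or die "Unable to create $path: $!";
    print {$fh} $bytes;
    close($fh) or die "Unable to close $path: $!";

    return length $bytes;
}
```

Calling write_utf16le_crlf('Test_reg.reg', "Line1\nLine2\n\nLine4\n") should produce the 46 bytes listed in the expected output above.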

Update:
Issue is also reproducible on Perl 5.10.1. Am I correct in thinking this is a bug in Encode::Unicode?
Can anyone think of any alternatives to what Anonymous Monk suggested? I appreciate any and all feedback. Thanks.

Update2:

Issue resolved. Key points from this experience:

(Code to reproduce is in the Readmore)
use strict;
use warnings;
use Encode qw(encode decode);

### Actual Results
my $string = "Line1\nLine2\n\nLine4\n";
open (my $output_fh, ">:encoding(utf-16le)", 'Test_reg.reg') || die "Unable to create reg output file. $!";
print {$output_fh} $string ;

### something else I tried, also doesn't work correctly.
my $string2 = "Line1\015\012Line2\015\012\015\012Line4\015\012";
open (my $output_fh2, ">:encoding(utf-16le)", 'Test_reg2.reg') || die "Unable to create reg output file. $!";
print {$output_fh2} $string2 ;
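For comparison, here is a sketch of a variant of the reproduction script above that stacks the PerlIO layers explicitly, so that the CRLF translation runs on characters before they are encoded. This is a commonly suggested layer order, not necessarily the resolution reached in this thread, and I have not confirmed it on ActivePerl 5.8.8.

```perl
use strict;
use warnings;

my $string = "Line1\nLine2\n\nLine4\n";

# Layers are listed bottom to top:
#   :raw                 removes the platform's default translation layer
#   :encoding(UTF-16LE)  turns characters into bytes
#   :crlf                sits on top, so on output it expands "\n" to
#                        "\r\n" while the data is still characters
open(my $output_fh, '>:raw:encoding(UTF-16LE):crlf', 'Test_reg.reg')
    or die "Unable to create reg output file. $!";
print {$output_fh} $string;
close($output_fh) or die "Unable to close reg output file. $!";
```

With this order, each CR and LF should reach the encoder as a separate character and come out as 0D 00 and 0A 00 respectively.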

In reply to CRLF not encoding into UTF-16LE correctly on ActivePerl 5.8.8 by desemondo
