Since you have "binmode" where it belongs on both file handles, and the script works on Mac OS X, I have to assume the problem on the WinXP system is one of three things: a brain-damaged configuration around the perl installation on that box, something else affecting the output file after perl has finished writing it, or an input file that already has the same set of null bytes you find in the output file.

Have you checked the input file? What exactly happens to the output file after perl writes it? What is the first thing used to open and inspect the file's content? As mentioned in a previous reply, a null byte next to each character byte is a strong sign of UTF-16 encoding: if the initial byte is null (or the first two bytes are the byte-order mark "\xFE\xFF") it's UTF-16BE; if the second byte is null (or the first two bytes are "\xFF\xFE") it's UTF-16LE.
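To check the input and output files for those byte patterns, something like the following might do. It is only a rough sketch: the sub name guess_utf16 is made up for this example, and it looks at nothing beyond the first four bytes, so it can be fooled by binary data that merely starts with a null.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Guess whether a byte string looks like the start of a UTF-16 file.
# (guess_utf16 is a hypothetical helper, not part of any module.)
sub guess_utf16 {
    my ($bytes) = @_;
    return "too short to tell" if length($bytes) < 2;
    return "UTF-16BE (BOM FE FF)" if $bytes =~ /^\xFE\xFF/;
    return "UTF-16LE (BOM FF FE)" if $bytes =~ /^\xFF\xFE/;
    return "probably UTF-16BE (null byte first)"  if $bytes =~ /^\x00[^\x00]/;
    return "probably UTF-16LE (null byte second)" if $bytes =~ /^[^\x00]\x00/;
    return "no sign of UTF-16";
}

# Report on every file named on the command line.
for my $file (@ARGV) {
    open( my $fh, "<", $file ) or die "$file: $!\n";
    binmode $fh;                  # raw bytes, no CRLF translation
    read( $fh, my $head, 4 );     # only the first four bytes are needed
    close $fh;
    my $hex = join " ", map { sprintf( "%02X", ord($_) ) } split //, $head;
    printf "%s: %s (first bytes: %s)\n", $file, guess_utf16($head), $hex;
}
```

Run it on both the input and the output file on the WinXP box; if the input already reports UTF-16, the script is not the one inserting the nulls.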

You can do a simple test on the victim's WinXP box to see if perl is brain-damaged -- e.g.:

#!/usr/bin/perl
# Write a known 10-byte string in binary mode, then compare
# the string length with the size of the file on disk.
$out = "test\xb0 test";
open(O, ">test.txt") or die "$!";
binmode O;
print O $out;
close O;
$s = -s "test.txt";
print "wrote ".length($out)." bytes to test.txt; file size is $s bytes\n";

The report should show the same number of bytes for the string length and the file size (both should be 10). Next, check the file by other means, and see whether its contents change at some point when you open it with some particular Windows tool.
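One way to check the file "by other means" without trusting any Windows editor is to dump its bytes as hex from perl itself. This is just a sketch, and hex_dump is a name invented here:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the bytes of a string as space-separated two-digit hex values.
# (hex_dump is a hypothetical helper name used only in this example.)
sub hex_dump {
    my ($bytes) = @_;
    return join " ", map { sprintf( "%02x", ord($_) ) } split //, $bytes;
}

# Dump every file named on the command line, e.g.: perl dump.pl test.txt
for my $file (@ARGV) {
    open( my $fh, "<", $file ) or die "$file: $!\n";
    binmode $fh;
    my $data = do { local $/; <$fh> };    # slurp the whole file as raw bytes
    close $fh;
    $data = "" unless defined $data;
    print "$file (", length($data), " bytes): ", hex_dump($data), "\n";
}
```

For an untouched test.txt this should list ten bytes, "74 65 73 74 b0 20 74 65 73 74". If a "00" shows up after each of them once some tool has opened and saved the file, that tool is the culprit.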

As for your script, I don't understand why you want to have three copies of the file data in memory (single slurped scalar, array of lines, hash of lines). Why not do it like this?

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

our ( $opt_i, $opt_o );
my ( $ifh, $ofh );

getopts( 'i:o:' ) and $opt_i and $opt_o
    or die "Usage: $0 -i infile -o outfile\n";

warn "reading \"$opt_i\" and writing to \"$opt_o\"\n";

open( $ifh, "<", $opt_i ) or die "$opt_i: $!\n";
open( $ofh, ">", $opt_o ) or die "$opt_o: $!\n";
binmode $ifh;
binmode $ofh;

# One hash holds each distinct line and how many times it was seen.
my %lines;
$lines{$_}++ while (<$ifh>);

my $line_count = 0;
for (sort keys %lines) {
    print $ofh $_;
    $line_count += $lines{$_};
}
close $ofh or die "error on closing output file: $!\n";

warn "read $line_count lines from $opt_i, wrote ".scalar(keys %lines).
     " lines to $opt_o\n";

(Note the extra conditions after getopts: it will return true if no option flags are given at all -- that's why they are called "option" flags...)
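To see why the extra conditions matter, here is a small demo of what getopts returns for a few argument lists. It is a sketch only: args_ok is a made-up helper, and it uses the hash-reference form of getopts rather than the $opt_i / $opt_o globals.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Run getopts on the given argument list and return two flags:
# whether getopts itself succeeded, and whether both -i and -o were set.
# (args_ok is a hypothetical helper name used only for this demo.)
sub args_ok {
    local @ARGV = @_;
    my %opt;
    my $parsed   = getopts( 'i:o:', \%opt );
    my $complete = ( $parsed and $opt{i} and $opt{o} ) ? 1 : 0;
    return ( $parsed ? 1 : 0, $complete );
}

for my $case ( [], [qw(-i in.txt)], [qw(-i in.txt -o out.txt)] ) {
    my ( $parsed, $complete ) = args_ok(@$case);
    print "args (@$case): getopts returned $parsed, usable = $complete\n";
}
```

With no arguments at all, getopts still reports success, so testing its return value alone would let the script run on with undefined file names.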

Update: after posting, I noticed that the OP code reported input and output line counts, so I added stuff to my version of the script accordingly.


In reply to Re: script inserts \x00 bytes on WinXP by graff
in thread script inserts \x00 bytes on WinXP by dwhite20899
