in reply to script inserts \x00 bytes on WinXP
Have you checked the input file? What exactly happens to the output file after perl writes it? Which tool is used first to open and inspect the file's content? As mentioned in a previous reply, a null byte next to each character byte (for ASCII-range text) is a sure sign of UTF-16 encoding: if the first byte is null (or the first two bytes are "\xFE\xFF"), it's UTF-16BE; if the second byte is null (or the first two bytes are "\xFF\xFE"), it's UTF-16LE.
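For what it's worth, the byte-order check described above can be sketched in a few lines of perl. The names guess_utf16 and sniff_file are made up for this illustration, and the null-byte rule is only a heuristic that assumes the text starts with an ASCII-range character:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the OP's script): classify a byte string
# by its first two bytes, using the BOM / null-byte rules described above.
sub guess_utf16 {
    my ($bytes) = @_;
    return 'too short' if length($bytes) < 2;
    my ( $b0, $b1 ) = unpack( 'C2', $bytes );
    return 'UTF-16BE (BOM)'   if $b0 == 0xFE && $b1 == 0xFF;
    return 'UTF-16LE (BOM)'   if $b0 == 0xFF && $b1 == 0xFE;
    return 'UTF-16BE (guess)' if $b0 == 0    && $b1 != 0;
    return 'UTF-16LE (guess)' if $b0 != 0    && $b1 == 0;
    return 'no sign of UTF-16';
}

# Read the first two bytes of a file in raw mode and classify them.
sub sniff_file {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "$path: $!\n";
    binmode $fh;
    my $n = read( $fh, my $head, 2 );
    defined $n or die "$path: $!\n";
    close $fh;
    return guess_utf16($head);
}

# Classify every file named on the command line.
print "$_: ", sniff_file($_), "\n" for @ARGV;
```

Run it against both the input file and the output file; if only the output shows signs of UTF-16, something after the read is doing the conversion.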
You can do a simple test on the victim's WinXP box to see if perl is brain-damaged -- e.g.:

```perl
#!/usr/bin/perl
$out = "test\xb0 test";
open(O, ">test.txt") or die "$!";
binmode O;
print O $out;
close O;
$s = -s "test.txt";
print "wrote ".length($out)." bytes to test.txt; file size is $s bytes\n";
```

The report should show the same number of bytes for the string length and the file size (and that should be 10); next you check the file by other means, and see whether, at some point, its contents change when you open it with some particular windows tool.
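One possible "other means" that doesn't depend on any windows tool is a raw hex dump done by perl itself. This is only a sketch; hex_dump and slurp_raw are names invented here, and the output layout is arbitrary:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format a byte string as offset, hex bytes and
# printable text, 16 bytes per row.
sub hex_dump {
    my ($bytes) = @_;
    my $out = '';
    for ( my $off = 0 ; $off < length $bytes ; $off += 16 ) {
        my $chunk = substr( $bytes, $off, 16 );
        my $hex = join ' ', map { sprintf '%02x', $_ } unpack( 'C*', $chunk );
        # Replace every byte outside printable ASCII with a dot.
        ( my $text = $chunk ) =~ tr/\x20-\x7e/./c;
        $out .= sprintf "%08x  %-47s  %s\n", $off, $hex, $text;
    }
    return $out;
}

# Hypothetical helper: read a whole file as raw bytes (no layers applied).
sub slurp_raw {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "$path: $!\n";
    binmode $fh;
    local $/;
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}

# Dump every file named on the command line.
print hex_dump( slurp_raw($_) ) for @ARGV;
```

If test.txt is intact, the dump shows exactly ten bytes with no 00 between the letters. Dump it again after each windows tool has touched it to see which one changes the contents.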
As for your script, I don't understand why you want to have three copies of the file data in memory (single slurped scalar, array of lines, hash of lines). Why not do it like this?

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

our ( $opt_i, $opt_o );
my ( $ifh, $ofh );

getopts( 'i:o:' ) and $opt_i and $opt_o
    or die "Usage: $0 -i infile -o outfile\n";

warn "reading \"$opt_i\" and writing to \"$opt_o\"\n";

open( $ifh, "<", $opt_i ) or die "$opt_i: $!\n";
open( $ofh, ">", $opt_o ) or die "$opt_o: $!\n";
binmode $ifh;
binmode $ofh;

# one hash entry per distinct line; the value counts how often it occurred
my %lines;
$lines{$_}++ while (<$ifh>);

# write each distinct line once, in sorted order, and total the input lines
my $line_count = 0;
for (sort keys %lines) {
    print $ofh $_;
    $line_count += $lines{$_};
}
close $ofh or die "error on closing output file: $!\n";

warn "read $line_count lines from $opt_i, wrote ".scalar(keys %lines).
     " lines to $opt_o\n";
```

(Note the extra conditions after getopts: it returns true even if no option flags are given at all -- that's why they are called "option" flags...)
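To see that getopts behavior for yourself, here is a small sketch. It uses the hash form of getopts instead of the $opt_i / $opt_o globals, just so each case starts clean, and check_args is a name made up for the demo:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical helper: parse a given argument list and report
# (did getopts succeed?, were both -i and -o actually supplied?).
sub check_args {
    local @ARGV = @_;    # getopts always reads @ARGV
    my %opt;
    my $parsed = getopts( 'i:o:', \%opt ) ? 1 : 0;
    my $usable = ( $parsed && $opt{i} && $opt{o} ) ? 1 : 0;
    return ( $parsed, $usable );
}

for my $case ( [], [ '-i', 'in.txt' ], [ '-i', 'in.txt', '-o', 'out.txt' ] ) {
    my ( $parsed, $usable ) = check_args(@$case);
    print "args=(@$case) getopts_ok=$parsed usable=$usable\n";
}
```

The first two cases pass getopts but are not usable, which is why the usage check has to test the option values as well.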
Update: after posting, I noticed that the OP's code reported input and output line counts, so I added that to my version of the script accordingly.
Re^2: script inserts \x00 bytes on WinXP
by dwhite20899 (Friar) on Sep 06, 2008 at 01:59 UTC
by Anonymous Monk on Sep 06, 2008 at 07:47 UTC
by dwhite20899 (Friar) on Sep 06, 2008 at 13:30 UTC