in reply to Problems Handling UCS-2LE
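The bytes you are skipping form the character U+FEFF, the byte-order mark (BOM). Use UTF-16 instead of UCS-2LE and the BOM will be handled for you: UTF-16 reads the BOM, uses it to determine the byte order, and removes it. (In Encode, a bare "UCS-2" is an alias for UCS-2BE, which does not look for a BOM at all.)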
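Here is a minimal sketch comparing how Encode decodes the same BOM-prefixed bytes under different encoding names. The sample bytes (a UTF-16LE BOM followed by "Hi") are made up for illustration, and it assumes a reasonably current Encode.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Sample bytes: a UTF-16LE byte-order mark (FF FE) followed by "Hi".
my $octets = "\xFF\xFE" . "H\x00i\x00";

# UCS-2LE has a fixed byte order, so the BOM is decoded as a character, U+FEFF.
my $as_ucs2le = decode('UCS-2LE', $octets);

# UTF-16 (no endianness suffix) reads the BOM, picks the byte order, and drops it.
my $as_utf16 = decode('UTF-16', $octets);

printf "UCS-2LE: %d chars, first U+%04X\n", length($as_ucs2le), ord($as_ucs2le);
printf "UTF-16:  %d chars, first U+%04X\n", length($as_utf16),  ord($as_utf16);

# A bare "UCS-2" is only an alias; Encode resolves it to the big-endian variant.
print "UCS-2 resolves to ", find_encoding('UCS-2')->name, "\n";
```

Note that Encode's UTF-16 insists on a BOM, so it suits files like these logs that always start with one.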
The "Wide character" warning is issued because you are outputting decoded characters without encoding them (loosely speaking). Fix:
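```perl
# Encode output.
# Use the encoding that's appropriate for you.
binmode STDOUT, ':encoding(UTF-8)';

my $lines;
{
    # Decode input. UTF-16 reads the BOM, picks the byte order from it,
    # and removes it.
    open my $log_fh, '<:encoding(UTF-16)', $file
        or die "Can't open $file: $!\n";
    local $/;              # slurp mode: read the whole file at once
    $lines = <$log_fh>;    # read from the handle we just opened
}
print "...\n", $lines, "...\n";
```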
On Unix, you can use `use open ':std', ':locale';` to set the "correct" encoding for STDOUT, but it doesn't work on Windows :(
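To see where the warning comes from, here is a small sketch that prints the same decoded string to two in-memory filehandles, one without and one with an :encoding layer. The in-memory handles and the sample string are stand-ins for STDOUT and your log text; warnings are collected through $SIG{__WARN__} so they can be inspected.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Decoded text; U+263A (a smiley) has no single-byte representation.
my $text = "caf\x{E9} \x{263A}";

# No encoding layer: Perl warns "Wide character in print" and writes its
# internal representation of the string.
open my $raw_fh, '>', \my $raw_buf or die "open: $!";
print {$raw_fh} $text;
close $raw_fh;

# An :encoding layer encodes the characters explicitly, so no warning.
open my $enc_fh, '>:encoding(UTF-8)', \my $enc_buf or die "open: $!";
print {$enc_fh} $text;
close $enc_fh;

printf "%d warning(s): %s", scalar(@warnings), join('', @warnings);
```

The binmode STDOUT line in the fix does the same thing for the real output handle.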
> If I did not skip the first two bytes, [...] "$lines = <LOGFILE>" would only capture a few characters out of a 1088-character file.
You are mistaken.
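If you want to check this on your own machine, here is a minimal round-trip sketch. It assumes a made-up 40-line log, written as a BOM plus UTF-16LE text (roughly what a Windows tool produces) to a temporary file created with File::Temp. The :raw layers are an addition of mine to keep Windows CRLF translation away from the UTF-16 bytes. Reading the file back with UTF-16 in slurp mode should return every character, BOM excluded.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical log contents: several short lines.
my $original = join '', map { "line $_: backup completed\n" } 1 .. 40;

# Write it as a BOM followed by UTF-16LE text.
my ($out_fh, $file) = tempfile(UNLINK => 1);
binmode $out_fh, ':raw:encoding(UTF-16LE)';
print {$out_fh} "\x{FEFF}", $original;
close $out_fh or die "close: $!";

# Read it back: UTF-16 consumes the BOM, and an undefined $/ slurps the file.
my $lines = do {
    open my $log_fh, '<:raw:encoding(UTF-16)', $file or die "open $file: $!";
    local $/;
    <$log_fh>;
};

printf "read %d of %d characters\n", length($lines), length($original);
```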
> If Perl should have handled the UCS-2LE file without needing to include the encoding or the skipping of bytes
Perl has no way of knowing the encoding of a file, or even if it's a text file for that matter.
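The closest you can get is a guess. Below is a sketch of a hypothetical helper, sniff_bom, that looks at the first few bytes for a byte-order mark and otherwise falls back to a default you supply (cp1252 here is an arbitrary choice). It is only a heuristic: a file without a BOM tells you nothing, and a UTF-16LE BOM followed by U+0000 is indistinguishable from a UTF-32LE BOM.

```perl
use strict;
use warnings;

# Hypothetical helper: guess an Encode name from a leading byte-order mark.
# Files without a BOM get the caller's default, because the bytes alone
# cannot tell you the encoding.
sub sniff_bom {
    my ($head, $default) = @_;
    # UTF-32LE's BOM begins with UTF-16LE's, so test the longer one first.
    return 'UTF-32' if $head =~ /\A(?:\x00\x00\xFE\xFF|\xFF\xFE\x00\x00)/;
    return 'UTF-16' if $head =~ /\A(?:\xFE\xFF|\xFF\xFE)/;
    # Encode's UTF-8 keeps the BOM as U+FEFF; the caller must strip it.
    return 'UTF-8'  if $head =~ /\A\xEF\xBB\xBF/;
    return $default;
}

for my $sample ("\xFF\xFEH\x00i\x00", "\xEF\xBB\xBFHi", "Hi") {
    print sniff_bom($sample, 'cp1252'), "\n";
}
```

In a real script you would read the first four bytes from a handle opened with '<:raw' and pass them in.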
> If the IDrive log files might be a non-standard or corrupted UCS-2LE
Why do you ask that?
There are some byte combinations that aren't allowed in UCS-2* (UCS-2LE or UCS-2BE): code units in the surrogate range 0xD800 to 0xDFFF. Encountering one is fatal:
```
$ perl -e'open $fh, "<:encoding(UCS-2le)", \"\x00\xD8"; <$fh>'
UCS-2LE:no surrogates allowed d800 at -e line 1.
```
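The one-liner above goes through a PerlIO layer; the same check can be seen by calling Encode's decode directly. This is a minimal sketch, assuming a reasonably current Encode: FB_CROAK makes the surrogate fatal, so the call is wrapped in eval. It also shows the same code unit being accepted by UTF-16LE when it is half of a valid surrogate pair, which is one way UTF-16 is more permissive than UCS-2.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# 0xD800 is a lone high surrogate: legal as half of a UTF-16 pair, never in UCS-2.
my $bad = "\x00\xD8";

# FB_CROAK makes decode() die on the bad code unit; LEAVE_SRC keeps $bad intact.
my $decoded = eval { decode('UCS-2LE', $bad, FB_CROAK | LEAVE_SRC) };
my $error   = $@;
print defined $decoded ? "decoded OK\n" : "decode failed: $error";

# A high surrogate followed by a low surrogate is valid UTF-16LE (U+1F600).
my $pair = "\x3D\xD8\x00\xDE";
my $char = decode('UTF-16LE', $pair);
printf "UTF-16LE pair decodes to U+%X\n", ord($char);
```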