Re: Problems Handling UCS-2LE

The bytes you are skipping form the character U+FEFF, the byte order mark (BOM). Use UCS-2 instead of UCS-2le and it will skip the character for you.
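As a minimal sketch of where that character comes from: the two bytes FF FE at the start of a little-endian file decode to the single character U+FEFF. The sample bytes here are made up for illustration (a BOM followed by the word "log"); they are not taken from the actual log file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Illustrative input: a little-endian BOM (FF FE) followed by "log".
my $bytes = "\xFF\xFE" . encode('UCS-2LE', 'log');

# UCS-2LE has a fixed byte order, so the BOM is decoded as an ordinary character.
my $text = decode('UCS-2LE', $bytes);

# ord() looks at the first character of the decoded string.
printf "U+%04X, %d characters\n", ord($text), length $text;   # U+FEFF, 4 characters
```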

The wide character warning is issued because you are outputting decoded characters without encoding them (loosely speaking). Fix:

# Encode output.
# Use the encoding that's appropriate for you.
binmode STDOUT, ':encoding(UTF-8)';

my $lines;
{
   # Decode input.
   open my $log_fh, "<:encoding(UCS-2)", $file
      or die($!);
   local $/ = undef;
   $lines = <$log_fh>;
}

print "...\n", $lines, "...\n";
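To make "encoding the output" a little more concrete, here is a small sketch, assuming UTF-8 is the output encoding you want. It shows that an `:encoding(UTF-8)` layer does the same job as calling `Encode::encode` by hand. The sample character and the in-memory buffer are only there for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One decoded character (U+263A) that does not fit in a single byte.
my $char = "\x{263A}";

# Encoding by hand turns it into the three bytes E2 98 BA.
my $bytes = encode('UTF-8', $char);

# Printing through an encoding layer produces the same bytes.
# An in-memory file stands in for STDOUT here.
my $buffer = '';
open my $out_fh, '>:encoding(UTF-8)', \$buffer
   or die($!);
print {$out_fh} $char;
close $out_fh;

printf "%d character, %d bytes\n", length $char, length $bytes;
```

Without the layer (or the explicit `encode`), Perl has to guess how to turn the character into bytes, and that is when it warns.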

On Unix, you can use `use open ':std', ':locale';` to set the "correct" encoding for STDOUT, but it doesn't work on Windows :(

If I did not skip the first two bytes, [...] "$lines = <LOGFILE>" would only capture a few characters out of a 1088 character file.

You are mistaken.

If Perl should have handled the UCS-2LE file without needing to include the encoding or the skipping of bytes

Perl has no way of knowing the encoding of a file, or even if it's a text file for that matter.
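The most a program can do is guess. One common heuristic is to look for a BOM in the first few bytes. The sketch below is only that, a heuristic: `guess_encoding_from_bom` is a hypothetical helper written for this example, not something Perl or Encode provides, and it returns undef for the many files that carry no BOM at all.

```perl
use strict;
use warnings;

# Hypothetical helper: guess an encoding name from the leading bytes of a file.
# It does not try to tell UTF-32LE (FF FE 00 00) apart from UTF-16LE.
sub guess_encoding_from_bom {
    my ($head) = @_;
    return 'UTF-8'    if $head =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $head =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $head =~ /\A\xFE\xFF/;
    return undef;   # no BOM: the encoding is still unknown
}

# Illustrative inputs, not real log data.
for my $head ("\xFF\xFEh\x00", "\xFE\xFF\x00h", "hello") {
    my $guess = guess_encoding_from_bom($head);
    print defined $guess ? "$guess\n" : "unknown\n";
}
```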

If the IDrive log files might be a non-standard or corrupted UCS-2LE

Why do you ask that?

There are some byte combinations that aren't allowed in UCS-2*. Encountering them is fatal.

$ perl -e'open $fh, "<:encoding(UCS-2le)", \"\x00\xD8"; <$fh>'
UCS-2LE:no surrogates allowed d800 at -e line 1.
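If you would rather handle such an error than die, you can decode the bytes yourself and trap the exception. This is a sketch assuming you read the raw bytes first and call `Encode::decode` with `FB_CROAK`. The two sample bytes are the same lone surrogate used in the one-liner above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# U+D800 in little-endian byte order: a lone high surrogate, invalid in UCS-2.
my $bad_bytes = "\x00\xD8";

# FB_CROAK makes decode() die on malformed input; LEAVE_SRC keeps $bad_bytes intact.
my $text = eval {
    decode('UCS-2LE', $bad_bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $error = $@;

if (defined $text) {
    print "decoded ", length($text), " character(s)\n";
}
else {
    print "decode failed: $error";
}
```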

Re^2: Problems Handling UCS-2LE by Ecurb (Initiate) on May 23, 2009 at 19:18 UTC
I tried:

my $logopen = open LOGFILE,"<:encoding(UCS-2)", $file;
local $/ = undef;
my $lines = <LOGFILE>;

The last line generates the error message: "UCS-2BE:Unicode character fffe is illegal"

Thanks for any other thoughts on what I might be doing wrong.

Bruce
Re^3: Problems Handling UCS-2LE by ikegami (Patriarch) on May 25, 2009 at 17:24 UTC
Indeed! That shouldn't happen. Or at the very least, it's inconsistent with UTF-16.

$ perl -le'open $fh, "<:encoding(UTF-16le)", \"\xFF\xFE"; print length <$fh>'
1
$ perl -le'open $fh, "<:encoding(UTF-16)", \"\xFF\xFE"; print length <$fh>'
0
$ perl -le'open $fh, "<:encoding(UCS-2le)", \"\xFF\xFE"; print length <$fh>'
1
$ perl -le'open $fh, "<:encoding(UCS-2)", \"\xFF\xFE"; print length <$fh>'
UCS-2BE:Unicode character fffe is illegal at -e line 1.

Using File::BOM or the following would be a better solution than skipping the first two bytes.

$lines =~ s/\x{FEFF}//g;
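One caveat with the global substitution: it removes every U+FEFF in the text, not just the BOM at the start. If that matters for your data, a variant anchored at the start of the string is a possible alternative. The sketch below compares the two on a made-up string that contains U+FEFF both at the start and in the middle.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Illustrative input: BOM, then "a", an embedded U+FEFF, and "b".
my $bytes = "\xFF\xFE" . encode('UCS-2LE', "a\x{FEFF}b");
my $text  = decode('UCS-2LE', $bytes);

# Remove only a BOM at the very start of the string.
(my $leading_only = $text) =~ s/\A\x{FEFF}//;

# Remove every U+FEFF, as the substitution above does.
(my $all_removed = $text) =~ s/\x{FEFF}//g;

printf "%d %d %d\n", length $text, length $leading_only, length $all_removed;   # 4 3 2
```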
Re^3: Problems Handling UCS-2LE by Ecurb (Initiate) on May 24, 2009 at 02:18 UTC
BTW, here is the log file I am trying to read (1291 characters). Only IDrive.com log files had this issue, all the other log files work fine.

IDrive UCS2 Log File

Thanks

Bruce
Re^4: Problems Handling UCS-2LE by ikegami (Patriarch) on May 25, 2009 at 17:38 UTC
#!/usr/bin/perl

use strict;
use warnings;

# List assignment from @ARGV yields 0 when no argument is given,
# so the usage check actually fires.
my ($log_qfn) = @ARGV
   or die("usage");

# Encode output.
# Use the encoding that's appropriate for you.
binmode STDOUT, ':encoding(UTF-8)';

my $lines;
{
   # Decode input.
   open my $log_fh, "<:encoding(UCS-2le)", $log_qfn
      or die($!);
   local $/ = undef;
   $lines = <$log_fh>;
   $lines =~ s/\x{FEFF}//g;
}

print(length($lines), "\n");
print($lines);

1291
-------------------- BackupSet Number :21 --------------------
User name bruce_benson
Backup Operation : Scheduled
Version No: 3.2.6
Total backup set size 42.46 MB
[ 05-22-2009 04:01:08 ] [Incremental Backup]Backed up file T:\Quickenw\BENSONS.QPH, 3.26 MB
[ 05-22-2009 04:01:25 ] [Incremental Backup]Backed up file T:\Quickenw\BENSONS.IDX, 4.17 MB
[ 05-22-2009 04:01:33 ] [Incremental Backup]Backed up file T:\Quickenw\BENSONS.QEL, 4.43 MB
[ 05-22-2009 04:01:36 ] [Incremental Backup]Backed up file T:\PasswordSafe\passwordsafe.dat, 26.46 KB
[ 05-22-2009 04:01:40 ] [Incremental Backup]Backed up file T:\Quickenw\BENSONSOFXLOG.DAT, 203.61 KB
[ 05-22-2009 04:02:59 ] [Incremental Backup]Backed up file T:\Quickenw\BENSONS.QDF, 30.37 MB
BACKUP START TIME : 05-22-2009 04:01:04
BACKUP END TIME : 05-22-2009 04:03:00
Total Time : 00:01:56
Number of files considered for Backup: 13 file(s)
Number of files found to be in sync : 7 file(s)
Backup Completed [Backed up Files : 6 of 6]
Please note, from your selected backupset, the below listed file(s)/folder(s) have been excluded from backup in the 'Backup' pane under Tools-->preferences:
C:\Documents and Settings\Bruce\My Documents\
C:\Documents and Settings\Bruce\Desktop\
Re^5: Problems Handling UCS-2LE by Ecurb (Initiate) on Jun 05, 2009 at 17:01 UTC