in reply to Problems Handling UCS-2LE

The bytes you are skipping form the character U+FEFF, the byte order mark (BOM). Use UCS-2 instead of UCS-2le and it will skip the character for you.

The wide character warning is issued because you are outputting decoded characters without encoding them (loosely speaking). Fix:

# Encode output.
# Use the encoding that's appropriate for you.
binmode STDOUT, ':encoding(UTF-8)';

my $lines;
{
    # Decode input.
    open my $log_fh, "<:encoding(UCS-2)", $file
        or die($!);
    local $/ = undef;
    # Read from the lexical handle opened above, not a bareword LOGFILE.
    $lines = <$log_fh>;
}

print "...\n", $lines, "...\n";

On Unix, the pragma use open ':std', ':locale'; sets the "correct" encoding for STDOUT, but it doesn't work on Windows :(
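
To make the difference between decoded characters and encoded bytes concrete, here is a minimal sketch using the core Encode module. The sample bytes are made up for illustration and are not taken from the log file in question.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Two UCS-2LE code units: U+263A (WHITE SMILING FACE) and "!".
my $raw = "\x3A\x26\x21\x00";

my $text  = decode('UCS-2LE', $raw);   # characters, not bytes
my $bytes = encode('UTF-8', $text);    # bytes again, safe to print as-is

printf "%d characters, %d UTF-8 bytes\n", length($text), length($bytes);
```

Printing $text to a handle with no encoding layer is what triggers the warning, because U+263A does not fit in a single byte. Printing $bytes, or printing $text after the binmode call, does not.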

If I did not skip the first two bytes, [...] "$lines = <LOGFILE>" would only capture a few characters out of a 1088 character file.

You are mistaken.

If Perl should have handled the UCS-2LE file without needing to include the encoding or the skipping of bytes

Perl has no way of knowing the encoding of a file, or even if it's a text file for that matter.
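
Since the encoding has to come from you, one option is to look for a BOM yourself before choosing a layer. The following is only a sketch: layer_from_bom is a hypothetical helper, it recognizes just the UTF-8 and UTF-16 BOMs, and it returns whatever fallback the caller supplies when no BOM is found. It returns UTF-16 layers, which read everything UCS-2 can represent. A UTF-32LE BOM also starts with FF FE, so this is not safe for files that might be UTF-32.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a PerlIO layer from a leading BOM.
# $file may be a file name or a reference to an in-memory string.
sub layer_from_bom {
    my ($file, $fallback) = @_;

    open my $fh, '<:raw', $file
        or die "Cannot open $file: $!";
    my $got = read($fh, my $head, 3);
    die "Cannot read $file: $!" unless defined $got;
    close $fh;

    return ':encoding(UTF-8)'    if $head =~ /\A\xEF\xBB\xBF/;
    return ':encoding(UTF-16LE)' if $head =~ /\A\xFF\xFE/;
    return ':encoding(UTF-16BE)' if $head =~ /\A\xFE\xFF/;
    return $fallback;            # no BOM: the caller has to decide
}
```

The returned layer can then be passed to a second open of the same file. Note that the UTF-16LE and UTF-16BE layers do not remove the BOM, so the decoded text will still start with U+FEFF.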

If the IDrive log files might be a non-standard or corrupted UCS-2LE

Why do you ask that?

There are some byte combinations that aren't allowed in UCS-2*. Encountering them is fatal.

$ perl -e'open $fh, "<:encoding(UCS-2le)", \"\x00\xD8"; <$fh>'
UCS-2LE:no surrogates allowed d800 at -e line 1.
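
For comparison, here is a small sketch of how the same kind of input behaves under UTF-16LE, where surrogate pairs are legal. It uses Encode::decode directly rather than a PerlIO layer, and it passes Encode::FB_CROAK so that errors are fatal, which is an assumption made to mirror the behavior of the layer. The sample bytes are invented for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# U+1F600 written as the UTF-16LE surrogate pair D83D DE00.
my $pair = "\x3D\xD8\x00\xDE";

# decode() may modify its input when a check value is given, so use copies.
my $utf16 = decode('UTF-16LE', my $copy1 = $pair, Encode::FB_CROAK());
my $ucs2  = eval { decode('UCS-2LE', my $copy2 = $pair, Encode::FB_CROAK()) };
my $ucs2_error = $@;

printf "UTF-16LE: %d character, U+%X\n", length($utf16), ord($utf16);
print  "UCS-2LE failed: $ucs2_error" unless defined $ucs2;
```

If a file can contain characters outside the Basic Multilingual Plane, UTF-16LE is therefore the safer choice of the two.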

Re^2: Problems Handling UCS-2LE
by Ecurb (Initiate) on May 23, 2009 at 19:18 UTC
    I tried:
    my $logopen = open LOGFILE, "<:encoding(UCS-2)", $file;
    local $/ = undef;
    my $lines = <LOGFILE>;
    The last line generates the error message: "UCS-2BE:Unicode character fffe is illegal"
    Thanks for any other thoughts on what I might be doing wrong.
    Bruce

      Indeed! That shouldn't happen. Or at the very least, it's inconsistent with UTF-16.

      $ perl -le'open $fh, "<:encoding(UTF-16le)", \"\xFF\xFE"; print length <$fh>'
      1
      $ perl -le'open $fh, "<:encoding(UTF-16)", \"\xFF\xFE"; print length <$fh>'
      0
      $ perl -le'open $fh, "<:encoding(UCS-2le)", \"\xFF\xFE"; print length <$fh>'
      1
      $ perl -le'open $fh, "<:encoding(UCS-2)", \"\xFF\xFE"; print length <$fh>'
      UCS-2BE:Unicode character fffe is illegal at -e line 1.
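
      The UTF-16 half of that comparison can also be seen with Encode::decode, outside of any file handle. This is a minimal sketch with made-up input: a little-endian BOM followed by "hi".

      ```perl
      use strict;
      use warnings;
      use Encode qw(decode);

      my $raw = "\xFF\xFE" . "h\x00i\x00";    # little-endian BOM, then "hi"

      my $auto  = decode('UTF-16',   $raw);   # BOM is read, used, and dropped
      my $fixed = decode('UTF-16LE', $raw);   # byte order is fixed, BOM is kept

      printf "UTF-16:   %d characters\n", length($auto);
      printf "UTF-16LE: %d characters, first is U+%04X\n",
          length($fixed), ord($fixed);
      ```

      So UTF-16 without a byte-order suffix takes care of the BOM, while the suffixed variants hand it back as an ordinary U+FEFF character.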

      Using File::BOM or the following would be a better solution than skipping the first two bytes.

      $lines =~ s/\x{FEFF}//g;
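
      With /g that substitution removes every U+FEFF in the text. If only the BOM at the very start should go, an anchored variant is one alternative. This is a sketch, and strip_leading_bom is a hypothetical helper name, not something from File::BOM.

      ```perl
      use strict;
      use warnings;

      # Hypothetical helper: drop one leading BOM, leave any later U+FEFF alone.
      sub strip_leading_bom {
          my ($text) = @_;
          $text =~ s/\A\x{FEFF}//;
          return $text;
      }

      my $decoded = "\x{FEFF}line one\nline\x{FEFF}two\n";
      my $clean   = strip_leading_bom($decoded);
      printf "%d -> %d characters\n", length($decoded), length($clean);
      ```

      Which variant fits depends on whether a U+FEFF in the middle of the data is meaningful to you.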
      BTW, here is the log file I am trying to read (1291 characters). Only IDrive.com log files had this issue, all the other log files work fine.

      IDrive UCS2 Log File

      Thanks
      Bruce
        #!/usr/bin/perl
        use strict;
        use warnings;

        my ($log_qfn) = $ARGV[0]
            or die("usage");

        # Encode output.
        # Use the encoding that's appropriate for you.
        binmode STDOUT, ':encoding(UTF-8)';

        my $lines;
        {
            # Decode input.
            open my $log_fh, "<:encoding(UCS-2le)", $log_qfn
                or die($!);
            local $/ = undef;
            $lines = <$log_fh>;
            $lines =~ s/\x{FEFF}//g;
        }

        print(length($lines), "\n");
        print($lines);
        1291
        --------------------
        BackupSet Number :21
        --------------------
        User name bruce_benson
        Backup Operation : Scheduled
        Version No: 3.2.6
        Total backup set size 42.46 MB
        [ 05-22-2009 04:01:08 ] [Incremental Backup]Backed up file T:\Quickenw\BENSONS.QPH, 3.26 MB
        [ 05-22-2009 04:01:25 ] [Incremental Backup]Backed up file T:\Quickenw\BENSONS.IDX, 4.17 MB
        [ 05-22-2009 04:01:33 ] [Incremental Backup]Backed up file T:\Quickenw\BENSONS.QEL, 4.43 MB
        [ 05-22-2009 04:01:36 ] [Incremental Backup]Backed up file T:\PasswordSafe\passwordsafe.dat, 26.46 KB
        [ 05-22-2009 04:01:40 ] [Incremental Backup]Backed up file T:\Quickenw\BENSONSOFXLOG.DAT, 203.61 KB
        [ 05-22-2009 04:02:59 ] [Incremental Backup]Backed up file T:\Quickenw\BENSONS.QDF, 30.37 MB
        BACKUP START TIME : 05-22-2009 04:01:04
        BACKUP END TIME : 05-22-2009 04:03:00
        Total Time : 00:01:56
        Number of files considered for Backup: 13 file(s)
        Number of files found to be in sync : 7 file(s)
        Backup Completed [Backed up Files : 6 of 6]
        Please note, from your selected backupset, the below listed file(s)/folder(s) have been excluded from backup in the 'Backup' pane under Tools-->preferences:
        C:\Documents and Settings\Bruce\My Documents\
        C:\Documents and Settings\Bruce\Desktop\