cheerful has asked for the wisdom of the Perl Monks concerning the following question:

The files start with the byte order mark FFFE or FEFF. I tried something like this:

my $fh = new FileHandle("< $file");
if (! $fh) {
    die "failed to open list file '$file': $!";
}
my $marker;
if (2 != read($fh, $marker, 2)) {
    die "Failed to read the first 2 bytes from $file";
}
if ($marker eq $UNICODE_FFFE) {
    binmode($fh, ":encoding(utf8)");
} else {
    $fh->seek(0, 0);
}

But the following read

$line = <$fh>;

still generates a lot of errors.

print $line produces letters alternating with spaces.

The script deals with just ASCII text.

1. What's the proper way to detect Unicode in a file?

2. How do I deal with Unicode strings in regular expression matching?

3. Do I need to convert Unicode strings to non-Unicode strings to do string operations, including matching? If so, how do I do that?

Replies are listed 'Best First'.
Re: How to handle unicode txt file on Windows
by almut (Canon) on Nov 03, 2008 at 17:13 UTC

    FFFE is the marker for UTF-16LE, not UTF-8... (so, ":encoding(UTF-16LE)" might work better)

      And :encoding(UTF-16) will work even better since it absorbs the BOM.
        What would happen if it's called on a non-Unicode file?
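One way to sidestep the question of non-Unicode input is to look at the first bytes before choosing a layer. The following is only a minimal sketch: open_text_file is a hypothetical helper name, files without a BOM are assumed to be plain bytes (ASCII), and UTF-32 BOMs are not considered. Line endings are not translated, so a "\r" may remain at the end of each line on Windows files.

```perl
use strict;
use warnings;

# Hypothetical helper: pick a PerlIO layer from the byte order mark, if any.
sub open_text_file {
    my ($file) = @_;
    open(my $fh, '<:raw', $file)
        or die "failed to open '$file': $!";

    # Read up to 3 bytes; a short or empty file simply yields fewer.
    my $head = '';
    my $got = read($fh, $head, 3);
    die "failed to read from '$file': $!" unless defined $got;

    # Assumption: no BOM means plain bytes, so the raw layer is kept.
    my ($layer, $skip) = (':raw', 0);
    if    ($head =~ /^\xFF\xFE/)     { ($layer, $skip) = (':encoding(UTF-16LE)', 2) }
    elsif ($head =~ /^\xFE\xFF/)     { ($layer, $skip) = (':encoding(UTF-16BE)', 2) }
    elsif ($head =~ /^\xEF\xBB\xBF/) { ($layer, $skip) = (':encoding(UTF-8)',    3) }

    # Position just past the BOM (or at the start), then apply the layer.
    seek($fh, $skip, 0) or die "seek failed on '$file': $!";
    binmode($fh, $layer);
    return $fh;
}
```

A caller would then read lines as usual, for example `my $fh = open_text_file($file); my $line = <$fh>;`.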
Re: How to handle unicode txt file on Windows
by ig (Vicar) on Nov 03, 2008 at 17:47 UTC
Re: How to handle unicode txt file on Windows
by jethro (Monsignor) on Nov 03, 2008 at 17:26 UTC
    To answer 2. and 3.: You don't. Just use the strings, no matter where they came from or what encoding they were in. The only time you have to do something special is when reading or writing files (and when the script itself is written in a UTF encoding). As soon as a string is "inside" Perl, you can forget about its encoding.
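A small illustration of that point, assuming the data is UTF-16LE as discussed above. The byte string here is made up for the example and stands in for what a raw read would return. Undecoded, the NUL bytes between the letters defeat the pattern (and show up as the "spaces" when printed); once decoded, an ordinary regular expression matches.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Bytes as they would sit on disk: "Hello" in UTF-16LE, without the BOM.
my $bytes = "H\x00e\x00l\x00l\x00o\x00";

# Undecoded: every other byte is a NUL, so the pattern does not match.
my $raw_match = ($bytes =~ /Hello/) ? 1 : 0;

# Decoded: a normal Perl character string, matched like any other.
my $text = decode('UTF-16LE', $bytes);
my $decoded_match = ($text =~ /^Hello$/) ? 1 : 0;

print "raw: $raw_match, decoded: $decoded_match\n";   # raw: 0, decoded: 1
```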
Re: How to handle unicode txt file on Windows
by jplindstrom (Monsignor) on Nov 04, 2008 at 17:49 UTC