How to handle unicode txt file on Windows

by cheerful (Initiate)
on Nov 03, 2008 at 17:18 UTC ( #721156=perlquestion )

cheerful has asked for the wisdom of the Perl Monks concerning the following question:

The files start with the byte order mark FFFE or FEFF. I tried something like this:

    my $fh = new FileHandle("< $file");
    if (! $fh) {
        die "failed to open list file '$file': $!";
    }
    my $marker;
    if (2 != read($fh, $marker, 2)) {
        die "Failed to read the first 2 bytes from $file";
    }
    if ($marker eq $UNICODE_FFFE) {
        binmode($fh, ":encoding(utf8)");
    } else {
        $fh->seek(0, 0);
    }

But the following read

$line = <$fh>;

still generates a lot of errors.

print $line produces letters alternating with spaces.

The script deals with just ASCII text.

1. What's the proper way to detect Unicode in a file?

2. How do I deal with Unicode strings in regular expression matching?

3. Do I need to convert Unicode strings to non-Unicode strings to do string operations, including matching? If so, what's the way to do that?

Replies are listed 'Best First'.
Re: How to handle unicode txt file on Windows
by almut (Canon) on Nov 03, 2008 at 17:13 UTC

    FFFE is the marker for UTF-16LE, not UTF-8... (so, ":encoding(UTF-16LE)" might work better)

      And :encoding(UTF-16) will work even better since it absorbs the BOM.
        What would happen if it's called on a non-unicode file?
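
One way to sidestep that question is to check for the BOM yourself and only then pick a decoding layer. What follows is a minimal sketch, not code from this thread: the helper name `open_text_file` and the `cp1252` fallback for files without a BOM are assumptions for illustration, and UTF-32 BOMs are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: opens $file for reading and pushes a decoding
# layer chosen from the file's byte order mark (BOM), if it has one.
# $fallback is the encoding assumed for BOM-less files (an assumption
# of this sketch; adjust it to whatever your plain files really use).
sub open_text_file {
    my ($file, $fallback) = @_;
    $fallback = 'cp1252' unless defined $fallback;

    # Open as raw bytes so nothing is translated before we inspect them.
    open(my $fh, '<:raw', $file) or die "Failed to open '$file': $!";

    # Read up to 3 bytes. A short or empty file simply has no BOM.
    my $got = read($fh, my $head, 3);
    die "Failed to read from '$file': $!" unless defined $got;

    my ($encoding, $bom_len) = ($fallback, 0);
    if    ($head =~ /^\xFF\xFE/)     { ($encoding, $bom_len) = ('UTF-16LE', 2) }
    elsif ($head =~ /^\xFE\xFF/)     { ($encoding, $bom_len) = ('UTF-16BE', 2) }
    elsif ($head =~ /^\xEF\xBB\xBF/) { ($encoding, $bom_len) = ('UTF-8',    3) }

    # Position just past the BOM (or back at the start), then decode
    # everything read from here on.
    seek($fh, $bom_len, 0) or die "Failed to seek in '$file': $!";
    binmode($fh, ":encoding($encoding)")
        or die "Failed to set encoding layer on '$file': $!";

    return $fh;
}

# Usage:
#   my $fh = open_text_file($file);
#   while (my $line = <$fh>) {
#       $line =~ s/\r?\n\z//;    # the handle is :raw, so CRLF is not translated
#       ...
#   }
```

Because the handle is opened with `:raw`, Windows line endings arrive as `\r\n`, which is why the usage comment strips them by hand.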
Re: How to handle unicode txt file on Windows
by ig (Vicar) on Nov 03, 2008 at 17:47 UTC
Re: How to handle unicode txt file on Windows
by jethro (Monsignor) on Nov 03, 2008 at 17:26 UTC
    To answer 2. and 3.: You don't. Just use the strings, no matter where they came from or what encoding they were in. The only time you have to do something special is when reading or writing files (and when the script itself is written in a UTF encoding). As soon as a string is "inside" perl, you can forget about its encoding.
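
To illustrate the "decode at the boundary, then forget about it" idea, here is a minimal sketch using the core Encode module. The sample data (`name=value`) and the variable names are made up for the example. In a real script the bytes would come from a file, and an `:encoding(...)` layer on the handle would do the decoding for you.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might sit on disk: a UTF-16LE BOM followed by the
# UTF-16LE encoding of "name=value\r\n" (made-up sample data).
my $bytes = "\xFF\xFE" . encode('UTF-16LE', "name=value\r\n");

# Decode once, on input. The BOM-aware 'UTF-16' decoder consumes the BOM.
my $text = decode('UTF-16', $bytes);

# $text is now an ordinary Perl character string, so regular expressions
# and other string operations need no special treatment.
my ($key, $val) = $text =~ /^(\w+)=(\w+)\r?\n\z/;

# Encode again only on output, in whatever encoding the destination needs.
my $out = encode('UTF-8', "$key -> $val\n");
```

The regex never mentions an encoding: it sees characters, not the interleaved zero bytes that made the original output look like letters alternating with spaces.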
Re: How to handle unicode txt file on Windows
by jplindstrom (Monsignor) on Nov 04, 2008 at 17:49 UTC

Node Type: perlquestion [id://721156]
Approved by jettero