Re: New line in Unicode again

Re: Re: New line in Unicode again by Thelonius (Priest) on Apr 21, 2003 at 14:37 UTC
The default encoding for Unicode on Windows is UTF-16LE, i.e. little-endian, so it would actually be "\xD\0\xA\0". Strictly speaking, Unicode does assign the control characters CR (U+000D) and LF (U+000A), but that still doesn't guarantee that the logical end-of-line will be the same between systems. You should start your file with a byte order mark (BOM), which is the same code point as the zero-width no-break space. It is U+FEFF, which in UTF-16LE is "\xFF\xFE". By the way, what you could have done is create a Unicode file in Notepad, then use Perl to look at the file and see what it contains. Also note that Perl 5.8 has support for Unicode. See perlunicode and Encode::Unicode.
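To make the byte layout concrete, here is a minimal sketch, assuming Perl 5.8 or later with the core Encode module. The helper name `utf16le_with_bom` and the file name `example_utf16.txt` are made up for illustration. Encode's "UTF-16LE" does not write a BOM by itself, so the sketch prepends the two bytes by hand.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Build the raw bytes of a UTF-16LE text file: BOM first, then the text.
# (utf16le_with_bom is a hypothetical helper, not part of Encode.)
sub utf16le_with_bom {
    my ($text) = @_;
    return "\xFF\xFE" . encode('UTF-16LE', $text);
}

my $bytes = utf16le_with_bom("abc\r\n");

# Show every byte in hex so the little-endian pairing is visible.
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";

# Hypothetical output file; binmode keeps Windows from turning "\n" into CRLF again.
open my $fh, '>', 'example_utf16.txt' or die "Cannot open file: $!";
binmode $fh;
print {$fh} $bytes;
close $fh;
```

The hex dump should show the BOM, then each character followed by a zero byte, ending with the 0D 00 0A 00 line terminator.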
Re: Re: Re: New line in Unicode again by donno20 (Sexton) on Apr 22, 2003 at 02:26 UTC
>it would actually be "\xD\0\xA\0".
Correct! I was wondering why it was paired as "a\0b\0c\0" instead of "\0a\0b\0c". But I don't have "\x".
>You should start your file with a byte order marker (BOM), which is the same as a zero-width no-break spaces. It is U+FEFF, which in UTF-16LE is "\xFF\xFE".
I don't know what the actual code is. But I did a little trick by reading the first two bytes of an existing Unicode file into a buffer. It worked!
>By the way, what you could have done is create a Unicode file in Notepad, then use Perl to look at the file and see what it has in it
The first two bytes cannot be shown by print(), and they are hidden in Notepad. How do I know they were "\xFF\xFE"? Cheers, ^_^
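One way to see those two bytes is to read them in binary mode and print them as hex rather than as characters. This is a minimal sketch, not a definitive tool. `leading_bytes_hex` is a made-up helper name, and the file to inspect is taken from the command line.

```perl
use strict;
use warnings;

# Return the first $n bytes of a file as a hex string, e.g. "FF FE".
# (leading_bytes_hex is a hypothetical helper written for this example.)
sub leading_bytes_hex {
    my ($path, $n) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;                          # raw bytes, no CRLF translation
    my $got = read($fh, my $buf, $n);
    die "Cannot read $path: $!" unless defined $got;
    close $fh;
    return join ' ', map { sprintf '%02X', ord } split //, $buf;
}

# Print the first two bytes of the file named on the command line, if any.
print leading_bytes_hex($ARGV[0], 2), "\n" if @ARGV;
```

For a Unicode file saved by Notepad, the post above suggests this should print FF FE.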