in reply to Re: Re: Unicode source code problem in 5.6.1
in thread Unicode source code problem in 5.6.1

This is also kind of a reply to tye's (valid) point. The file doesn't have any HTML entities. Here are the top lines of what od says about the file (I have cygwin on the Win 2K machine):

0000000 u s e s t r i c t ; \r \n u s e
0000020 w a r n i n g s ; \r \n u s e
0000040 u t f 8 ; \r \n \r \n m y $ 316 261 =
0000060 5 ; \r \n m y $ 316 246 = 4 ; \r
Note that each variable name shows up as two bytes, printed in octal (316 261 and 316 246). So I suspect tye's right: I still don't have exactly what John entered, but what I did have worked as expected.

Also, I tried it with and without strict. With strict I get the expected:

> perl -w ca21hp4a.pl
Global symbol "$╬▒" requires explicit package name at ca21hp4a.pl line 5.
Execution of ca21hp4a.pl aborted due to compilation errors.
I had to use <pre> tags instead of <code> tags in the above snippet to make those characters show up, although they still got turned into HTML entities.

Waah, this encoding stuff is too confusing.

Re: Re (3): Unicode source code problem in 5.6.1
by John M. Dlugosz (Monsignor) on Nov 18, 2002 at 21:04 UTC
    That's right: the UTF-8 encoding of each of the Greek symbols is two bytes long. The first byte will be (in binary) 110xxxxx and the second 10xxxxxx, where the x's hold the actual code point value, up to 11 bits. In octal, that means a leading 3 and a leading 2, respectively.
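
To make the bit layout concrete, here is a minimal sketch that encodes two Greek letters and prints their bytes in binary and octal. It assumes Perl 5.8 or later, since the Encode module is not available in 5.6.1, and the helper name utf8_bytes is made up for illustration. U+03B1 and U+03A6 are the code points whose encodings match the bytes in the od dump.

```perl
use strict;
use warnings;
use Encode qw(encode);   # assumption: Perl 5.8 or later; Encode is not in 5.6.1

# Hypothetical helper: return the UTF-8 bytes of one character as integers.
sub utf8_bytes {
    my ($char) = @_;
    return unpack 'C*', encode('UTF-8', $char);
}

# U+03B1 (alpha) and U+03A6 (Phi) correspond to the bytes in the od dump.
for my $cp (0x3B1, 0x3A6) {
    my @bytes = utf8_bytes(chr $cp);
    printf "U+%04X  binary: %s  octal: %s\n", $cp,
        join(' ', map { sprintf('%08b', $_) } @bytes),
        join(' ', map { sprintf('%03o', $_) } @bytes);
}
```

This should print 11001110 10110001 (octal 316 261) for U+03B1 and 11001110 10100110 (octal 316 246) for U+03A6, so the lead byte starts with 110 and the continuation byte with 10 in both cases.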

    By "as expected", you mean the same results I got, not the previously expected results as defined in the docs, right?

    The funny chars in the error message are due to the Console window using a different code page. It's using a DOS-compatible OEM code page, probably 437. You can change that via "MODE CON CP SELECT=1251" to match the GUI, but that won't help here since there is no UTF-8 setting. Redirect it to a file and view with a UTF-8 editor.
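
As a rough illustration of the code page mismatch, the sketch below decodes the same two bytes once as UTF-8 and once as code page 437. It assumes Perl 5.8 or later with the Encode module (which knows the name cp437), and the helper name codepoints is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);   # assumption: Perl 5.8 or later; Encode is not in 5.6.1

# Hypothetical helper: list the code points of a decoded string, e.g. "U+03B1".
sub codepoints {
    my ($str) = @_;
    return join ' ', map { sprintf('U+%04X', ord($_)) } split //, $str;
}

my $raw = "\xCE\xB1";   # octal 316 261, the first variable name in the od dump

print 'read as UTF-8:  ', codepoints(decode('UTF-8', $raw)), "\n";
print 'read as CP437:  ', codepoints(decode('cp437', $raw)), "\n";
```

Read as UTF-8 the bytes give the single code point U+03B1. Read as CP437 they give U+256C and U+2592, a box-drawing character and a shade block, which is the pair of funny characters that shows up in the error message.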