Re: Unicode source code problem in 5.6.1
by VSarkiss (Monsignor) on Nov 18, 2002 at 17:36 UTC
|
Hm, something else must be up. I downloaded the code from the node you mentioned, and it ran OK. (It printed "4"). Here's my Perl version, which is running on Windows 2000:
This is perl, v5.6.1 built for MSWin32-x86-multi-thread
(with 1 registered patch, see perl -V for more detail)
Copyright 1987-2001, Larry Wall
Binary build 633 provided by ActiveState Corp. http://www.ActiveState.com
Built 21:33:05 Jun 17 2002
To make sure I didn't clobber any characters, I used the "D/L code" link rather than copy and paste from the browser window. I did have to remove a stray my at the top of the file, but I don't think that's related.
Lemme know if you need more details on this installation.
|
Hmm. I guess that might work much of the time. Of course, the code is displayed incorrectly.
When you download the code, you should get the correct byte stream but tagged as Latin-1. If the code is saved in a UTF-8-aware file system (since you are trying to write code in UTF-8), the bytes would be converted from Latin-1 to UTF-8 which would give you different bytes. Even if you save the code using only one-byte characters, translation could happen because the browser knows the operating system expects results in something besides Latin-1, like an OEM encoding (such as "code page 437" in Windows).
I'd think that most current "save as" operations would just save bytes and ignore encodings so you'd get the desired byte values. But I wouldn't bet on that.
- tye
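To illustrate the Latin-1 to UTF-8 conversion described above, here is a minimal sketch. It assumes the saved file contained the two bytes CE B1 (the UTF-8 encoding of a Greek alpha) and that some layer treated each byte as a Latin-1 character and re-encoded it. The helper name latin1_bytes_to_utf8_bytes is made up for this example, and the arithmetic is done by hand so it does not depend on any particular perl version or module.

```perl
use strict;
use warnings;

# Hypothetical helper: treat every input byte as a Latin-1 character and
# return the bytes of its UTF-8 encoding.
sub latin1_bytes_to_utf8_bytes {
    my @in = @_;
    my @out;
    for my $byte (@in) {
        if ($byte < 0x80) {
            push @out, $byte;                  # ASCII passes through unchanged
        }
        else {
            push @out, 0xC0 | ($byte >> 6),    # lead byte
                       0x80 | ($byte & 0x3F);  # continuation byte
        }
    }
    return @out;
}

# Assumed example input: the UTF-8 bytes for a Greek alpha.
my @original  = (0xCE, 0xB1);
my @converted = latin1_bytes_to_utf8_bytes(@original);

printf "before: %s\n", join ' ', map { sprintf '%02X', $_ } @original;
printf "after:  %s\n", join ' ', map { sprintf '%02X', $_ } @converted;
# before: CE B1
# after:  C3 8E C2 B1
```

Two bytes become four, so a file that went through such a conversion would no longer contain the variable name as originally typed.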
|
0000000 u s e s t r i c t ; \r \n u s e
0000020 w a r n i n g s ; \r \n u s e
0000040 u t f 8 ; \r \n \r \n m y $ 316 261 =
0000060 5 ; \r \n m y $ 316 246 = 4 ; \r
Note that each variable name shows up as two bytes, printed in octal. So I suspect tye's right: I still don't have exactly what John entered, but what I did have worked as expected.
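As a quick check on what those octal pairs are, here is a small sketch that decodes them by hand as two-byte UTF-8 sequences. The function name decode_two_byte_utf8 is made up for this example. It only handles the two-byte form, which is all that appears in the dump.

```perl
use strict;
use warnings;

# Decode one two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a code point.
sub decode_two_byte_utf8 {
    my ($lead, $cont) = @_;
    die "not a two-byte UTF-8 sequence\n"
        unless ($lead & 0xE0) == 0xC0 && ($cont & 0xC0) == 0x80;
    return (($lead & 0x1F) << 6) | ($cont & 0x3F);
}

# The two variable names, as the octal byte pairs shown in the dump.
for my $pair ([qw(316 261)], [qw(316 246)]) {
    my ($lead, $cont) = map { oct($_) } @$pair;
    printf "%s %s -> U+%04X\n", @$pair, decode_two_byte_utf8($lead, $cont);
}
# 316 261 -> U+03B1
# 316 246 -> U+03A6
```

U+03B1 is GREEK SMALL LETTER ALPHA and U+03A6 is GREEK CAPITAL LETTER PHI, so the downloaded file does hold Greek variable names encoded as UTF-8.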
Also, I tried it with and without strict. With strict I get the expected:
> perl -w ca21hp4a.pl
Global symbol "$╬▒" requires explicit package name at ca21hp4a.pl line 5.
Execution of ca21hp4a.pl aborted due to compilation errors.
I had to use <pre> tags instead of <code> tags in the above snippet to make those characters show up, although they still got turned into HTML entities.
Waah, this encoding stuff is too confusing.
Re: Unicode source code problem in 5.6.1
by Thelonius (Priest) on Nov 18, 2002 at 17:47 UTC
|
Here's a twist. Under perl 5.8.0, compiled for cygwin, your program works fine if and only if use strict; is present.
On the other hand, if use strict is commented out, I get this bizarre error:
"my" variable $strict::VERSION can't be in a package at lib/strict.pm line 93, near "$strict::VERSION "
Compilation failed in require at lib/utf8_heavy.pl line 2.
BEGIN failed--compilation aborted at lib/utf8_heavy.pl line 2.
Compilation failed in require at lib/utf8.pm line 17.
|
So the polarity reversed in the newer one: it works with strict but fails without it? My code was the exact opposite. But the reason for yours is even more bizarre. I suppose they only tested it with strict enabled?
|
I'm told this has been fixed in patch 17928.
(tye)Re: Unicode source code problem in 5.6.1
by tye (Sage) on Nov 18, 2002 at 17:41 UTC
|
Variables whose names begin with control characters are forced into main:: no matter what package you are in. This is how things like ${^TAINT} work (which is a variable named "\ctAINT" -- note that "\ct" is CTRL-T). This sounds like a simple bug where "control character" has been implemented as something like "not ' '..'~'" or "not /^[a-z_]/i".
Note that this bug does not require 'use utf8' as this code:
use strict;
my $ì= 10;
just uses plain 8-bit Latin1 and results in:
Can't use global $^= in "my", near "my $ì"
which also hints that I'm correct about the source of the bug since it reports the variable name as "$^=".
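To make that guess concrete, here is a sketch of how an overly broad "control character" test would misclassify a Latin-1 letter. Both functions are hypothetical and are not taken from the perl source. They only contrast a "not a letter or underscore" check with a check on the actual control range.

```perl
use strict;
use warnings;

# Hypothetical over-broad test: anything that is not an ASCII letter or
# underscore counts as a "control character".
sub naive_is_control {
    my ($char) = @_;
    return $char !~ /^[a-z_]/i ? 1 : 0;
}

# Hypothetical narrow test: only the ASCII control range and DEL.
sub strict_is_control {
    my ($char) = @_;
    my $ord = ord $char;
    return ($ord < 0x20 || $ord == 0x7F) ? 1 : 0;
}

# CTRL-T (as in ${^TAINT}), a plain letter, and Latin-1 i-grave (0xEC).
for my $char ("\cT", "a", "\xEC") {
    printf "0x%02X naive=%d strict=%d\n",
        ord($char), naive_is_control($char), strict_is_control($char);
}
# 0x14 naive=1 strict=1
# 0x61 naive=0 strict=0
# 0xEC naive=1 strict=0
```

Under the over-broad test, 0xEC lands in the same bucket as CTRL-T, which is the kind of mix-up the error message hints at.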
Update: Ah, a different bug with "unusual" variable names.
- tye
|
I suspected a control-character bug, and tried having the variable begin with a regular letter. Same problem. And it's not "can't use a global in my", but a totally different error, which implies that it thinks the variable is being referenced, not defined!
I just tried another test, and it's not being forced into package main; it can co-exist in other packages.
Since it goes away when I'm not strict, it seems like the bug is in recognising a usage before a definition; once past that, it actually works OK.