The problem is that the pathname "öa" is not UTF-8. It is extended (8-bit) ASCII, presumably created via the command line. To demonstrate this, try the following one-liner.
P:\test>perl58 -le"$f=qq[test-\xf6\xf6\xf6-test]; print $f; open F,'>',$f; print F 'this is the file'; close F;"
test-÷÷÷-test

P:\test>dir test-*
 Volume in drive P is Winnt
 Volume Serial Number is D822-5AE5

 Directory of P:\test

02/09/03  14:26                18 test-ööö-test
               1 File(s)             18 bytes
                          1,098,700,800 bytes free

P:\test>type test-*

test-ööö-test

this is the file

P:\test>
The first character in "öa" is (extended) ASCII 246 decimal (0xf6). On its own, this byte is not valid UTF-8. When perl attempts to treat a string containing it as UTF-8, it sees that byte as the start byte of a multi-byte UTF-8 character and inspects the next byte to continue forming the complete character. However, the next byte, 'a' (ASCII 97, 0x61), is not a valid continuation byte for UTF-8, hence the error message.
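The same check can be reproduced outside the parser. This is only a minimal sketch, assuming the core Encode module is available; the helper name is_valid_utf8 is made up for illustration and is not part of any module. It asks Encode to decode the bytes strictly and reports whether that succeeded.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if $bytes decodes as strict UTF-8, else 0.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        # FB_CROAK makes decode() die on malformed input;
        # LEAVE_SRC stops it from modifying our copy of the bytes.
        Encode::decode('UTF-8', $bytes,
            Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

my $eight_bit_name = "\xf6a";       # 0xF6 followed by 'a', as in the pathname
my $utf8_name      = "\xc3\xb6a";   # the same text encoded as UTF-8

printf "f6 61    valid UTF-8? %s\n", is_valid_utf8($eight_bit_name) ? 'yes' : 'no';
printf "c3 b6 61 valid UTF-8? %s\n", is_valid_utf8($utf8_name)      ? 'yes' : 'no';
```

A check along these lines could be run over filenames before deciding whether to treat them as UTF-8 or as 8-bit data.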
P:\test>perl -le"use utf8; $x = qq[öa]; print $x;"
Malformed UTF-8 character (unexpected non-continuation byte 0x61, immediately after start byte 0xf6) at -e line 1.
a
Here you can see that with use utf8 in force, the error occurs. That is unsurprising, as the data is not well-formed UTF-8. The solution is to disable utf8 when dealing with extended ASCII, as can be seen here.
P:\test>perl -le"no utf8; $x = qq[öa]; print $x;"
÷a
In other words, you will probably bypass the problem by placing no utf8; at the top of your program.
The reason that the output (from perl) displays differently to the input, as '÷a' rather than 'öa', is something to do with ASCII code-to-glyph mapping, i.e. codepage settings (I think). If anyone has a better or fuller explanation of this, I'd like to hear it also.
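One way to probe the codepage guess is to decode the single byte 0xf6 under a few Windows code pages and compare the resulting code points. This is a sketch under assumptions: that the console uses an OEM code page such as 437 or 850 while the directory listing goes through ANSI code page 1252 (none of which is confirmed by the session above), and the helper name codepoint_for is hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode one byte under the named code page and
# return the Unicode code point it maps to.
sub codepoint_for {
    my ($codepage, $byte) = @_;
    return ord( Encode::decode($codepage, $byte) );
}

my $byte = "\xf6";

# cp1252 is the assumed ANSI code page; cp437 and cp850 are assumed OEM ones.
for my $codepage (qw(cp1252 cp437 cp850)) {
    printf "%-6s 0x%02X -> U+%04X\n",
        $codepage, ord($byte), codepoint_for($codepage, $byte);
}
```

Under cp1252 the byte maps to U+00F6 (ö), while under cp437 and cp850 it maps to U+00F7 (÷), which would be consistent with the same byte showing up as two different glyphs.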
NOTE: Whether the characters in this post show up correctly in your browser will depend upon your browser and its handling of character encoding. It looks fine in Opera 6.1 with the encoding set to "automatic", but I have seen cases before where stuff looks fine for me but shows up as "weird characters" in other browsers or with different settings.
In reply to Re: Unicode (ä, ö, ü in German) Problem with File::Find under Windows2000
by BrowserUk
in thread Unicode (ä, ö, ü in German) Problem with File::Find under Windows2000
by TeddyC