The problem is that the pathname "öa" is not UTF-8. It is extended (8-bit) ASCII, presumably created via the command line. To demonstrate this, try the following one-liner.

P:\test>perl58 -le"$f=qq[test-\xf6\xf6\xf6-test]; print $f; open F,'>',$f; print F 'this is the file'; close F;"
test-÷÷÷-test

P:\test>dir test-*
 Volume in drive P is Winnt
 Volume Serial Number is D822-5AE5

 Directory of P:\test

02/09/03  14:26                    18 test-ööö-test
               1 File(s)             18 bytes
                          1,098,700,800 bytes free

P:\test>type test-*

test-ööö-test

this is the file

P:\test>

The first character in "öa" is (extended) ASCII 246 decimal (0xf6). On its own, this byte is not valid UTF-8. When perl attempts to treat a string containing it as UTF-8, it sees 0xf6 as the start byte of a multi-byte character (going by its bit pattern, 11110110, a four-byte one) and inspects the next byte to continue the character. However, the next byte, 'a', ASCII 97 (0x61), is not a valid continuation byte, hence the error message.
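To make the start-byte and continuation-byte distinction concrete, here is a minimal sketch that classifies each byte of "öa" purely by its UTF-8 bit pattern. classify_byte is a hypothetical helper written for this illustration; it is not part of perl or any module, and it looks only at the leading bits, so it does not reproduce everything perl checks.

```perl
use strict;
use warnings;

# Classify one byte by the UTF-8 leading-bit patterns.
# (classify_byte is a hypothetical helper for illustration only.)
sub classify_byte {
    my ($byte) = @_;
    return 'ascii'            if ($byte & 0x80) == 0x00;   # 0xxxxxxx
    return 'continuation'     if ($byte & 0xC0) == 0x80;   # 10xxxxxx
    return 'start of 2 bytes' if ($byte & 0xE0) == 0xC0;   # 110xxxxx
    return 'start of 3 bytes' if ($byte & 0xF0) == 0xE0;   # 1110xxxx
    return 'start of 4 bytes' if ($byte & 0xF8) == 0xF0;   # 11110xxx
    return 'invalid';
}

my $bytes = "\xf6a";    # the raw bytes of the pathname "öa"

# Print each byte in hex and binary together with its classification.
for my $byte (unpack 'C*', $bytes) {
    printf "0x%02x  %08b  %s\n", $byte, $byte, classify_byte($byte);
}
```

Going by the bit pattern alone, 0xf6 announces a multi-byte character and so must be followed by continuation bytes (10xxxxxx). The 'a' that actually follows is plain ASCII, which is the mismatch the error message reports.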

P:\test>perl -le"use utf8; $x = qq[öa]; print $x;"
Malformed UTF-8 character (unexpected non-continuation byte 0x61, immediately after start byte 0xf6) at -e line 1.
a

Here you can see that with use utf8 in force, the error occurs. That is unsurprising, as the data is not correctly formed UTF-8. The solution is to disable utf8 when dealing with extended ASCII, as can be seen here.

P:\test>perl -le"no utf8; $x = qq[öa]; print $x;"
÷a

In other words, you will probably bypass the problem by placing no utf8; at the top of your program.
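If you want to check for yourself that such a name really is not UTF-8, something like the following might help. It is only a sketch: it uses the core Encode module rather than the utf8 pragma discussed above, is_valid_utf8 is a hypothetical helper name, and treating the bytes as ISO-8859-1 is an assumption about how the name was created.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Return 1 if the byte string decodes as strict UTF-8, 0 otherwise.
# (is_valid_utf8 is a hypothetical helper for illustration only.)
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

my $name = "\xf6a";       # raw bytes of the pathname "öa"
print "valid UTF-8: ", (is_valid_utf8($name) ? "yes" : "no"), "\n";

# Assumption: the bytes are ISO-8859-1, where every byte is one character.
my $chars = decode('ISO-8859-1', $name);
printf "U+%04X\n", ord $_ for split //, $chars;
```

For "\xf6a" the strict decode is rejected, while the ISO-8859-1 decode yields the two characters U+00F6 and U+0061.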

The reason that the output (from perl) displays differently from the input, as '÷a' rather than 'öa', is the mapping from byte values to glyphs, i.e. codepage settings (I think). Byte 0xf6 is 'ö' in the Windows ANSI code page 1252, but '÷' in the OEM code pages 437 and 850 that a console typically uses. If anyone has a better or fuller explanation of this, I'd like to hear it also.
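The byte-to-glyph mapping can be inspected with the core Encode module. This is a rough sketch: the three code pages are my guess at the ones likely to be involved on a Western European Windows box, and codepoint_for is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return the Unicode code point a single byte maps to in a given code page.
# (codepoint_for is a hypothetical helper for illustration only.)
sub codepoint_for {
    my ($codepage, $byte) = @_;
    return ord decode($codepage, $byte);
}

# Assumed code pages: cp1252 (Windows ANSI), cp437 and cp850 (OEM/console).
for my $codepage (qw(cp1252 cp437 cp850)) {
    printf "%-6s 0xf6 -> U+%04X\n", $codepage, codepoint_for($codepage, "\xf6");
}
```

Under cp1252 the byte comes out as U+00F6 (ö), and under both OEM code pages as U+00F7 (÷), which matches what the console shows.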

NOTE: Whether the characters in this post show up correctly in your browser will depend upon your browser and its handling of character encoding. It looks fine in Opera 6.1 with the encoding set to "automatic", but I have seen cases before where stuff looks fine for me but shows up as "weird characters" in other browsers or with different settings.


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.


In reply to Re: Unicode (ä, ö, ü in German) Problem with File::Find under Windows2000 by BrowserUk
in thread Unicode (ä, ö, ü in German) Problem with File::Find under Windows2000 by TeddyC
