Update: the decode-as-utf8 fix works on Linux and OS X, but not on win32 (surprise, surprise).

The problem is the same as before: a file path containing extended characters is not matched by either the -f or the -d test. Interestingly, the console debug print complains about "wide" characters, which suggests to me that decoding the file path HAS done its job. Printing the state of the utf8 flag before and after the decode with Encode::is_utf8() seems to confirm this.
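A minimal sketch of the flag check described above, using a hypothetical file name "rôle.txt" supplied as raw UTF-8 bytes (as a system with UTF-8 file names would return it). It shows the utf8 flag going from off to on, and the length dropping by one because the two bytes C3 B4 become a single character. Note that is_utf8() only reports Perl's internal flag; it says nothing about whether the string will match the name on disk.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Hypothetical file name "rôle.txt" as raw UTF-8 bytes: C3 B4 is the
# UTF-8 encoding of U+00F4 (small letter o with circumflex).
my $raw     = "r\xC3\xB4le.txt";
my $decoded = decode('UTF-8', $raw);

# Report the internal utf8 flag and the character count before/after.
printf "before: is_utf8=%d length=%d\n", (is_utf8($raw)     ? 1 : 0), length $raw;
printf "after:  is_utf8=%d length=%d\n", (is_utf8($decoded) ? 1 : 0), length $decoded;
```

Run as-is, it should report is_utf8=0 length=9 before and is_utf8=1 length=8 after.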

Adding the decode CHECK option causes a runtime error under win32 (utf8 "\xF4" does not map to Unicode at...). Snooping the string shows that for the "small letter o with circumflex" character it contains the two bytes C3 B4 (the UTF-8 encoding) under Linux, but the single byte F4 under win32 (the Latin-1/cp1252 byte, whose value is also the character's Unicode code point, U+00F4). Using the strict encoding name "UTF-8" instead of "utf8" makes no difference (Linux == good; win32 == BAD).
$entry = decode('utf8', $entry, 1);
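The error is consistent with the bytes shown above: a lone F4 byte (one not followed by UTF-8 continuation bytes) is not valid UTF-8, so a decode with CHECK set to 1 (Encode::FB_CROAK) must die, whereas the C3 B4 pair decodes cleanly. One possible workaround, sketched here under the assumption that win32 returns names in the cp1252 ANSI code page (the actual code page depends on the system locale), is to try a strict UTF-8 decode first and fall back to cp1252 when it fails. The helpers hex_bytes() and decode_name() are hypothetical names for this illustration, not part of Encode.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: dump a byte string as hex pairs, e.g. "c3 b4".
sub hex_bytes { return join ' ', unpack '(H2)*', $_[0] }

# Hypothetical helper: strict UTF-8 decode with a single-byte fallback.
# ASSUMPTION: the fallback code page is cp1252; a given Windows system
# may use a different ANSI code page.
sub decode_name {
    my ($bytes) = @_;
    my $copy  = $bytes;    # decode() with a CHECK value may modify its input
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    return defined $chars ? $chars : decode('cp1252', $bytes);
}

# Linux-style (UTF-8) and win32-style (single byte) forms of "rôle.txt".
for my $raw ("r\xC3\xB4le.txt", "r\xF4le.txt") {
    my $name = decode_name($raw);
    printf "%s -> U+%04X\n", hex_bytes($raw), ord substr($name, 1, 1);
}
```

Both lines should end in U+00F4. This is aimed at getting the right characters (for example, for display); whether the decoded name then satisfies -f/-d on win32 is a separate question.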

I'm using ActiveState Perl 5.8.9 so that I can generate binaries for all platforms; Encode is at version 2.42 in all cases.

Followup! The "fix" for win32, if you can call it that, is MONSTROUS. Before each and every use of a file test (-f, -d, -e, etc.) and each use of open, move, copy, etc., the following is required:

$filename = pack 'U0C*', unpack 'C*', $filename;

This makes things work under win32 and does no great harm under Linux/OS X, unless the name is going to be displayed. In that case the original form must be used, or the extended character will be shown as an undisplayable character.
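Until a better way turns up, the repetition can at least be confined to one place. The sketch below wraps the workaround from the post, unchanged, in a hypothetical fs_name() helper that is applied only when $^O is 'MSWin32' (a design choice, since the decode fix already works on Linux and OS X), plus thin hypothetical wrappers for the file tests and operations. The caller keeps the original name for display, as the post requires.

```perl
use strict;
use warnings;
use File::Copy qw(copy move);

# HYPOTHETICAL helper: returns the name to hand to file tests and file
# operations. The caller keeps the original name for display.
sub fs_name {
    my ($name) = @_;
    return $name unless $^O eq 'MSWin32';     # Linux/OS X: unchanged
    return pack 'U0C*', unpack 'C*', $name;   # the post's win32 workaround
}

# HYPOTHETICAL wrappers so call sites never repeat the conversion.
sub is_file { return -f fs_name($_[0]) }
sub is_dir  { return -d fs_name($_[0]) }

sub open_read {
    my ($name) = @_;
    # The error message uses the original (displayable) name.
    open my $fh, '<', fs_name($name) or die "Cannot open '$name': $!";
    return $fh;
}

sub copy_file { return copy(fs_name($_[0]), fs_name($_[1])) }
sub move_file { return move(fs_name($_[0]), fs_name($_[1])) }
```

Call sites then read like print "found $name\n" if is_file($name); with $name itself still used for display.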

There HAS to be a better way!


In reply to Re^4: File::List, UTF-8, and Tk by ron7
in thread File::List, UTF-8, and Tk by ron7
