
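The broken handling of Unicode in file system operations is not just a Windows issue. It is not done correctly on Unix/Linux either.

The following re.pl session shows that the UTF8 flag is completely ignored in scalars passed as arguments to the built-ins that perform file system operations. $a and $b hold the same string ($a eq $b is true), yet each open creates a different file, because the built-in hands the scalar's internal byte buffer to the OS as-is: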

salva@atun:/tmp/unicode$ re.pl
$ $a="a\xf1o"
a&#65533;o
$ $b = $a
a&#65533;o
$ use Devel::Peek

$ utf8::upgrade($b)
4
$ Dump $a
SV = PV(0x557576d4e2d0) at 0x557576812a80
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x557576d516c0 "a\361o"\0
  CUR = 3
  LEN = 10
  COW_REFCNT = 0

$ Dump $b
SV = PV(0x557576d4e2a0) at 0x557576d4b7d0
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x557576d6e6e0 "a\303\261o"\0 [UTF8 "a\x{f1}o"]
  CUR = 4
  LEN = 10

$ open A, ">$a";
1
$ open B, ">$b";
1
$ system "ls"
año  a?o
0
$ $a eq $b
1
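
One possible way around this, as a minimal sketch: encode every file name explicitly before it reaches a built-in, so that the internal representation of the scalar stops mattering. The helper name fs_bytes and the variable names are made up for this example, and it assumes a Unix-like system whose file names are meant to be UTF-8; other setups would need a different encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempdir);

# Hypothetical helper (not part of the session above): turn a character
# string into an explicit UTF-8 byte string before it reaches the OS.
# Assumption: file names on this system are meant to be UTF-8.
sub fs_bytes {
    my ($name) = @_;
    return encode('UTF-8', $name);
}

my $dir = tempdir(CLEANUP => 1);

my $latin1   = "a\xf1o";    # stored internally as Latin-1 bytes
my $upgraded = $latin1;
utf8::upgrade($upgraded);   # same characters, stored internally as UTF-8

# Both scalars are eq, and after explicit encoding both yield the
# same bytes, so both opens refer to the same file.
for my $name ($latin1, $upgraded) {
    my $path = "$dir/" . fs_bytes($name);
    open my $fh, '>', $path or die "open failed: $!";
    close $fh;
}

# List what was actually created, skipping . and ..
opendir my $dh, $dir or die "opendir failed: $!";
my @files = grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;

print scalar(@files), " file(s) created\n";
```

Names read back with readdir are still raw bytes, so the same approach would need a matching decode step on the way in.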

In reply to Re: RFC: system calls on Unicode filesystem by salva
in thread RFC: system calls on Unicode filesystem by daxim



There's more than one way to do things
	PerlMonks