in reply to Re: unicode version of readdir
in thread unicode version of readdir

but it is far from trivial to implement that without introducing evil hacks ... I guess there would be opposition from non-windows folks,

You are correct. And in common with many other caveats of using Perl on Win32, if you choose to follow through on this, you are in for a very tough time.

It is unfortunately the case that the innate, knee-jerk, anti-MS reaction to anything that might improve the lot of the win32-based Perl user will be negative. Unless you can demonstrate that there is absolutely no negative impact of your code upon *any other OS user*, anywhere, anytime...your patch is likely to be rejected.

Of course, there will likely be a few attempts by *nix users to refute this allegation. They will say that patches from win32 users are treated exactly the same as those from *nix users, and only rejected if they are not complete and thorough. They will, if pushed, explain the ridiculously high rejection rate as a symptom that win32 users and programmers are simply too stupid to produce high quality patches.

You may even get a rejection of this thesis from the 2 tame win32 developers that have been accepted into the fold. You know. Like the token black man in US films from the 1960s through the 1980s. Do not be fooled or appeased.

Or else, they will simply stay silent and hope that nobody notices.

Good luck.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^3: unicode version of readdir
by demerphq (Chancellor) on Sep 17, 2007 at 17:35 UTC

    The problem with this particular issue (as I've said elsewhere) is that the interface for the routines (which ARE completely pluggable) does not include a way to pass the fact that the strings are unicode back to the calling code. They are all based around crude UNIX-style char * interfaces.
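
    To make the limitation concrete, here is a minimal sketch (plain C, not Perl internals; the helper names count_as_bytes and count_as_utf8 are made up for illustration). The same char * buffer gives a different character count depending on whether the caller treats it as bytes or as UTF-8, and nothing in the pointer itself says which reading is intended:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length if every byte is one character
 * (bytes / Latin-1 semantics). */
static size_t count_as_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Hypothetical helper: length if the bytes are UTF-8. Counts every byte
 * that is not a continuation byte (10xxxxxx). Assumes well-formed input. */
static size_t count_as_utf8(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

    For the file name "\xC3\xA9.txt" the first helper counts six characters and the second counts five, which is exactly the information a bare char * cannot carry back to the caller.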

    So in this case I wouldn't expect the kind of thing you are referring to to come up; it will just be such a big job with such huge ramifications that it won't happen until 5.12 at least. :-(

    I guess I'm one of the tame win32 users. Although anybody that knows me well knows that 'tame' is not the best description. ;-)

    ---
    $world=~s/war/peace/g

      Rejoice! I've found an easy way to pass that info. I'm working on the implementation and will post to p5p when ready, but the idea is to add an int PL_dir_unicode that will contain a set of unicode flags. Syscalls that are unaware of these hints will keep the default behavior, and new code will differentiate between bytes and unicode semantics.

      Along with this, I propose new unicode semantics which, when compiled in, have the following consequences: a) filename functions (such as open, stat, etc.) have a chance to behave differently when passed unicode scalars; b) after binmode(DIRHANDLE, ':utf8'), readdir will return unicode scalars, if appropriate. An interesting consequence is that even on unicode-unaware OSes, readdir() will also return unicode scalars, without touching any system-specific code -- which I think is really cool.
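
      As a rough illustration of the hint mechanism only (this is not the patch; dir_unicode_hints, legacy_stat, aware_stat and call_with_hints are hypothetical stand-ins), the caller sets a global flag before calling through the pluggable interface and clears it afterwards. A backend that never looks at the flag behaves exactly as before:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag value, mirroring the role of DIRf_HINT_PARAM1_IS_UTF8. */
enum { HINT_PARAM1_IS_UTF8 = 2 };

/* Hypothetical global playing the role of the proposed PL_dir_unicode. */
static int dir_unicode_hints = 0;

/* A legacy backend: it never reads the hints, so its behaviour is unchanged.
 * Returns 0 to mean "path was treated as bytes". */
static int legacy_stat(const char *path)
{
    assert(path != NULL);
    return 0;
}

/* A hint-aware backend: returns 1 ("path treated as UTF-8") only when the
 * caller flagged the first parameter as UTF-8. */
static int aware_stat(const char *path)
{
    assert(path != NULL);
    return (dir_unicode_hints & HINT_PARAM1_IS_UTF8) ? 1 : 0;
}

typedef int (*stat_fn)(const char *);

/* Caller side: set the hint, call through the function pointer,
 * then always clear the hint so it cannot leak into later calls. */
static int call_with_hints(stat_fn backend, const char *path, int path_is_utf8)
{
    int how;
    dir_unicode_hints = path_is_utf8 ? HINT_PARAM1_IS_UTF8 : 0;
    how = backend(path);
    dir_unicode_hints = 0;
    return how;
}
```

      The point of the design is that the char * signature of the backend does not change, so existing platform code compiles and runs untouched.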

      The prototype is:

          #define DIRf_HINT_WANT_UTF8_RESULT   1
          #define DIRf_HINT_PARAM1_IS_UTF8     2
          #define DIRf_HINT_PARAM2_IS_UTF8     4
          #define DIRf_RESULT_IS_UTF8          8
          #define DIRf_RESULT_IS_BYTES        16

          #ifdef UTF8_FILENAME_SEMANTICS
          #define SET_DIR_UTF8_HINTS(flags)    PL_dir_unicode = (flags)
          #define isDIR_RESULT_WANTED_AS_UTF8  ((PL_dir_unicode) & DIRf_HINT_WANT_UTF8_RESULT)
          #define isDIR_PARAM_UTF8             ((PL_dir_unicode) & DIRf_HINT_PARAM1_IS_UTF8)
          #define isDIR_PARAM2_UTF8            ((PL_dir_unicode) & DIRf_HINT_PARAM2_IS_UTF8)
          #define PERLIO_UTF8_CHECK_RESULT(sv) \
              if ( PL_dir_unicode & DIRf_RESULT_IS_UTF8) { \
                  SvUTF8_on((sv)); \
              } else if ( !(PL_dir_unicode & DIRf_RESULT_IS_BYTES)) { \
                  STRLEN len; \
                  const char * const s = SvPV(sv,len); \
                  if (is_utf8_string((const U8*)s,len)) { \
                      SvUTF8_on((sv)); \
                  } \
              }
          #else
          #define SET_DIR_UTF8_HINTS(flags)
          #define isDIR_RESULT_WANTED_AS_UTF8  0
          #define isDIR_PARAM_UTF8             0
          #define isDIR_PARAM2_UTF8            0
          #define PERLIO_UTF8_CHECK_RESULT(sv)
          #endif

          #define PERLIO_UTF8_CONTEXT(u1) \
              SET_DIR_UTF8_HINTS((u1) ? DIRf_HINT_PARAM1_IS_UTF8 : 0)
          #define PERLIO_UTF8_CONTEXT2(u1,u2) \
              SET_DIR_UTF8_HINTS( \
                  ((u1) ? DIRf_HINT_PARAM1_IS_UTF8 : 0) | \
                  ((u2) ? DIRf_HINT_PARAM2_IS_UTF8 : 0))
          #define PERLIO_UTF8_CONTEXT_FROM_SV(sv) \
              PERLIO_UTF8_CONTEXT(SvUTF8(sv))
          #define PERLIO_UTF8_CONTEXT_FROM_SV2(sv1,sv2) \
              PERLIO_UTF8_CONTEXT2(SvUTF8(sv1),SvUTF8(sv2))
          #define PERLIO_UTF8_CONTEXT_RETURN(ret) \
              SET_DIR_UTF8_HINTS((ret) ? DIRf_RESULT_IS_UTF8 : 0)
          #define PERLIO_UTF8_CLEAR_CONTEXT \
              SET_DIR_UTF8_HINTS(0)
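
      The three-way decision in PERLIO_UTF8_CHECK_RESULT can be sketched outside of Perl like this. It is only an approximation: should_mark_utf8 and looks_like_utf8 are hypothetical names, and the validator here is a strict stand-in for is_utf8_string, which accepts more than this does.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as DIRf_RESULT_IS_UTF8 and DIRf_RESULT_IS_BYTES above. */
enum { RESULT_IS_UTF8 = 8, RESULT_IS_BYTES = 16 };

/* Hypothetical stand-in for is_utf8_string: returns 1 if s[0..len) is
 * well-formed UTF-8 (no overlong forms, no surrogates, max U+10FFFF). */
static int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t need, k;
        unsigned long cp, min;
        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;                      /* stray continuation or bad lead */
        if (len - i - 1 < need) return 0;   /* sequence truncated */
        for (k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += need + 1;
    }
    return 1;
}

/* The backend's explicit answer wins; only when it said nothing
 * do we fall back to guessing from the bytes. */
static int should_mark_utf8(int hints, const char *s, size_t len)
{
    assert(s != NULL);
    if (hints & RESULT_IS_UTF8)  return 1;
    if (hints & RESULT_IS_BYTES) return 0;
    return looks_like_utf8((const unsigned char *)s, len);
}
```

      The guessing branch is what lets a unicode-unaware backend still produce unicode scalars, at the cost of occasionally misreading a byte string that happens to be valid UTF-8.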

      caller code: (pp_stat, for example):

          PERLIO_UTF8_CONTEXT_FROM_SV(sv);
          if (PL_op->op_type == OP_LSTAT)
              PL_laststatval = PerlLIO_lstat(SvPV_nolen_const(PL_statname), &PL_statcache);
          else
              PL_laststatval = PerlLIO_stat(SvPV_nolen_const(PL_statname), &PL_statcache);
          PERLIO_UTF8_CLEAR_CONTEXT;

      and implementation of win32_stat in win32.c:

          BOOL do_utf8 = isDIR_PARAM_UTF8 && IsWin2000();
          ...
          if ( do_utf8) {
              WCHAR buf[MAX_PATH+1];
              l = MultiByteToWideChar(CP_UTF8, 0, path, -1, buf, MAX_PATH+1);
              path = (char*) PerlDir_mapW(buf);
          } else {
              path = PerlDir_mapA(path);
              l = strlen(path);
          }
          ...
          res = do_utf8 ? wstat(( WCHAR*) path, sbuf) : stat(path, sbuf);

      This way the new functionality won't have any effect on platforms without utf8 filenames. Early criticism is welcome.
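
      For readers without a Windows box, the UTF-8 to wide-character step in win32_stat can be approximated portably. This is a sketch under assumptions, not what MultiByteToWideChar does internally: utf8_to_utf16 is a hypothetical helper, it does not reject overlong encodings, and it only copies the convention of returning 0 on failure, including when the fixed-size buffer is too small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 into UTF-16 code units.
 * Returns the number of units written, including the terminating 0,
 * or 0 on malformed input or when out (capacity cap) is too small. */
static size_t utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;
    assert(in != NULL && out != NULL);
    while (*s) {
        uint32_t cp;
        int need;
        if (*s < 0x80)                { cp = *s;        need = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; need = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; need = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; need = 3; }
        else return 0;
        s++;
        while (need-- > 0) {
            /* A NUL here fails the test too, so we never read past the end. */
            if ((*s & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
        if (cp >= 0x10000) {
            /* Needs a surrogate pair; keep one slot free for the terminator. */
            if (n + 2 >= cap) return 0;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        } else {
            if (n + 1 >= cap) return 0;
            out[n++] = (uint16_t)cp;
        }
    }
    if (n >= cap) return 0;
    out[n++] = 0;
    return n;
}
```

      The failure return matters for the win32_stat fragment above: a path that does not fit into MAX_PATH+1 wide characters, or that is not valid UTF-8, makes the conversion report 0, so the result is worth checking before the buffer is used.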
Re^3: unicode version of readdir
by Anonymous Monk on Sep 16, 2007 at 03:21 UTC
    geez man. what is your problem?
    do you have an inferiority complex, or suffer from paranoid delusions