[5.30] What counts as a Turkic locale?

daxim has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: [5.30] What counts as a Turkic locale? by Discipulus (Canon) on May 14, 2019 at 15:10 UTC
Hello daxim, probably the locales dedicated to Turkic peoples, for example `sah_RU` for Yakut. Search other places too for new locales. Probably everything in this list: Bashkir, Chuvash (`cv_RU`), Kalmyk, Komi, Mari-El, Ossetian (`os_RU`), Udmurt and Yakut (`sah_RU`). You can now parse İ (`U+0130`) and ı (`U+0131`) safely. L* There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re^2: [5.30] What counts as a Turkic locale? by RMGir (Prior) on May 14, 2019 at 16:36 UTC
Shouldn't Turkish (`tr_TR`) be in your "all in" list? It seems like it would be the canonical Turkic language, and it does have the various 'i' rules under discussion. Mike
Re^2: [5.30] What counts as a Turkic locale? by daxim (Curate) on May 15, 2019 at 06:31 UTC
I want to know exactly. Where in Perl is the relevant code? I imagine it's a list of hard-coded locale names.
Re^3: [5.30] What counts as a Turkic locale? by RMGir (Prior) on May 15, 2019 at 12:27 UTC
I don't think it uses an explicit list of locale names. It looks like it's detected via toupper/tolower misbehaviour. Clone the git repo (git://perl5.git.perl.org/perl.git), then look at the diff for the commit in question:

    ~/git/perl$ git diff 30d8090de81085bd3dff00c83a7ab6d3ff8dfc8d^!
    diff --git a/locale.c b/locale.c
    index 383b2137c0..07e5525c10 100644
    --- a/locale.c
    +++ b/locale.c
    @@ -1507,6 +1507,7 @@ S_new_ctype(pTHX_ const char *newctype)
         /* Don't check for problems if we are suppressing the warnings */
         bool check_for_problems = ckWARN_d(WARN_LOCALE) || UNLIKELY(DEBUG_L_TEST);
    +    bool maybe_utf8_turkic = FALSE;
     
         PERL_ARGS_ASSERT_NEW_CTYPE;
    @@ -1523,6 +1524,14 @@ S_new_ctype(pTHX_ const char *newctype)
          * handle this specially because of the three problematic code points */
         if (PL_in_utf8_CTYPE_locale) {
             Copy(PL_fold_latin1, PL_fold_locale, 256, U8);
    +
    +        /* UTF-8 locales can have special handling for 'I' and 'i' if they are
    +         * Turkic.  Make sure these two are the only anomalies.  (We don't use
    +         * towupper and towlower because they aren't in C89.) */
    +        if (toupper('i') == 'i' && tolower('I') == 'I') {
    +            check_for_problems = TRUE;
    +            maybe_utf8_turkic = TRUE;
    +        }
         }
     
         /* We don't populate the other lists if a UTF-8 locale, but do check that
    @@ -1668,7 +1677,18 @@ S_new_ctype(pTHX_ const char *newctype)
                 }
             }
     
    +        if (bad_count == 2 && maybe_utf8_turkic) {
    +            bad_count = 0;
    +            *bad_chars_list = '\0';
    +            PL_fold_locale['I'] = 'I';
    +            PL_fold_locale['i'] = 'i';
    +            PL_in_utf8_turkic_locale = TRUE;
    +            DEBUG_L(PerlIO_printf(Perl_debug_log, "%s:%d: %s is turkic\n",
    +                                  __FILE__, __LINE__, newctype));

Mike
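To try the same heuristic outside of Perl, here is a minimal C sketch of the probe the diff adds. The helper names `casing_looks_turkic` and `probe_locale` are made up for illustration and are not part of Perl's source; only the `toupper`/`tolower` comparison is taken from the commit. Whether a given locale name such as `tr_TR.UTF-8` is installed, and how the C library cases 'i' under it, depends on the system, so treat the result as something to check on your own machine rather than a guarantee.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Same test as in the commit: under the current LC_CTYPE locale, do the
 * single-byte case mappings decline to pair 'i' with 'I'?
 * Returns 1 if so, 0 otherwise. */
static int casing_looks_turkic(void)
{
    return toupper('i') == 'i' && tolower('I') == 'I';
}

/* Hypothetical helper (not from Perl): switch LC_CTYPE to `name`, run the
 * probe, then restore the previous locale.
 * Returns 1 or 0 from the probe, or -1 if the locale could not be set. */
static int probe_locale(const char *name)
{
    char saved[256];
    const char *current = setlocale(LC_CTYPE, NULL);
    int result;

    /* Copy the current name: a later setlocale() call may overwrite it. */
    if (current == NULL || strlen(current) >= sizeof saved)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_CTYPE, name) == NULL)
        return -1;  /* locale rejected, e.g. not installed */

    result = casing_looks_turkic();
    setlocale(LC_CTYPE, saved);
    return result;
}
```

Calling `probe_locale("tr_TR.UTF-8")` or `probe_locale("sah_RU.UTF-8")` from a small `main` shows what your C library reports for each name. Note that Perl does more than this: it runs the probe only for UTF-8 locales and then also requires that 'I' and 'i' be the only two anomalies (`bad_count == 2`) before it sets `PL_in_utf8_turkic_locale`.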
Re^4: [5.30] What counts as a Turkic locale? by daxim (Curate) on May 15, 2019 at 15:28 UTC