I tried to use a UTF-8 non-breaking space (between the day and the name of the month) in the format argument of POSIX::strftime. With Perl v5.32.0 and a UTF-8-encoded script file, without any non-default encoding settings, I hit upon two oddities.
These behaviours can be demonstrated with the following script (the comments refer to the invisible space character in the format; the innocent-looking inner, i.e. non-syntactic, quotes in the print statements that produce output lines 3 and 4 are Unicode LEFT and RIGHT SINGLE QUOTATION MARK, the same as in $string):
    use POSIX qw(strftime);
    $string = 'hailed an über ‘cab’ on ';
    @t = (0, 0, 0, 23, 5, 2020 - 1900, 4); # strftime expects the year minus 1900
    print $string . strftime( '%d/%b', @t), "\n";
    print $string . strftime( '%d %b', @t), "\n"; # UTF-8 nbsp
    print $string . strftime('‘%d %b’', @t), "\n"; # UTF-8 nbsp
    print $string . strftime('‘%d %b’', @t), "\n"; # ASCII space
This outputs (line numbers added):
    1 hailed an über ‘cab’ on 23/Jun
    2 hailed an über ‘cab’ on 23�Jun
    3 hailed an über âcabâ on ‘23 Jun’
    4 hailed an über âcabâ on ‘23 Jun’
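Note that the non-breaking space comes out as an invalid character in line 2, whereas in lines 3 and 4 the date itself is formatted correctly but the quotation marks from $string are garbled, even in line 4, where the format contains only an ASCII space.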
(I have deleted complaints about the wide characters in print for line 3 and 4 for brevity.)
I am guessing, rather vaguely, that this is down to strftime essentially being the C function, which is not Unicode-aware, and maybe also to the way Perl identifies how strings are encoded and then "upgrades" some of them to harmonise their encodings (in this case under a wrong assumption), but ... :
The behaviour with a non-breaking space alone vs. (also) other non-ASCII characters seems definitely inconsistent. Why is the behaviour different between the non-breaking space and typographical quotation marks, which are all outside the ASCII block?
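To probe the guess about "upgrading", here is a minimal sketch that does not call strftime at all. It assumes (and this is only an assumption about what happens in my script) that the formatted date comes back as a character string and is then joined with a byte string. The helper printed_bytes is hypothetical; it just shows which bytes print writes to a handle that has no encoding layer.

```perl
use strict;
use warnings;

# Hypothetical helper: return, as hex, the bytes that print writes
# to a handle with no encoding layer.
sub printed_bytes {
    my ($text) = @_;
    no warnings 'utf8';    # silence "Wide character in print"
    open my $fh, '>', \my $buf or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $buf;
}

my $bytes = "\xC3\xBC";    # "ü" as UTF-8 bytes, as in a script without "use utf8"
my $nbsp  = "\x{A0}";      # NO-BREAK SPACE as a single character (below 256)
my $quote = "\x{2018}";    # LEFT SINGLE QUOTATION MARK, a "wide" character

# All characters fit into one byte: print writes them as they are,
# so the nbsp becomes a lone A0 byte, which is not valid UTF-8.
print printed_bytes($bytes . $nbsp),  "\n";    # C3 BC A0

# One wide character is present: print encodes every character as UTF-8,
# so the two bytes of "ü" are each encoded again.
print printed_bytes($bytes . $quote), "\n";    # C3 83 C2 BC E2 80 98
```

If that assumption holds, it would match the output above: a lone A0 byte shows up as the replacement character on a UTF-8 terminal, while the presence of a wide character makes the bytes of $string get encoded a second time.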
Also, can anything be done about it? That is, is it possible to use non-breaking spaces in a format for strftime such that they come out correctly (without having to resort to inserting extra, likely unwanted, non-ASCII characters), and is it possible to use any non-ASCII character in that format argument without confusing Perl? (Actually, I can think only of non-breaking spaces as useful, but other cultures may very plausibly have other use cases.)
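For what it is worth, one possible workaround would be to keep everything as characters inside the script and to convert to and from UTF-8 bytes only around the strftime call and on output. The following is a sketch under that assumption, not a confirmed fix; strftime_chars is a hypothetical helper name, and the month name produced by %b still depends on the locale.

```perl
use strict;
use warnings;
use utf8;                        # literals in this file are decoded to characters
use POSIX qw(strftime);
use Encode qw(encode);

# Hypothetical helper: like strftime, but takes and returns character strings.
sub strftime_chars {
    my ($format, @time) = @_;
    # Hand the C-level function plain UTF-8 bytes.
    my $out = strftime(encode('UTF-8', $format), @time);
    # Turn UTF-8 bytes back into characters; if the result already is a
    # character string, this call fails quietly and leaves its content alone.
    utf8::decode($out);
    return $out;
}

binmode STDOUT, ':encoding(UTF-8)';    # encode characters once, on output

my @t = (0, 0, 0, 23, 5, 2020 - 1900); # 23 June 2020; year is counted from 1900
print 'hailed an über ‘cab’ on ',
      strftime_chars("‘%d\x{A0}%b’", @t), "\n";
```

The idea is that $string and the formatted date are then both character strings, so nothing needs to be upgraded when they are joined, and the output layer does the only encoding step.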
In reply to strftime does not handle Unicode characters in format argument properly (at least, not consistently) by Bruder Savigny