Re^3: Global substitution of non-base-plane Unicode characters

Re^4: Global substitution of non-base-plane Unicode characters by Jim (Curate) on Feb 24, 2014 at 04:00 UTC
In this case, using `printf` instead of `print` is justified and, in fact, smart. The default value of the predefined variable `$\` (`$OUTPUT_RECORD_SEPARATOR`) is `undef`, which is what Peter wants and expects here. But a very surprising and potentially elusive bug can be introduced into Peter's program when the value of `$\` is later changed. Using `printf` in this admittedly unusual way ensures that the Unicode byte order mark is never followed by any unexpected character such as newline (`\n`). Jim
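To make the difference concrete, here is a minimal sketch (not from any poster in this thread) that writes the BOM to an in-memory UTF-8 filehandle with `$\` set to `"\n"`, once with `print` and once with `printf`. The helper name `bom_bytes_via` is hypothetical, and the in-memory handle is only an assumption made so the bytes can be inspected.

```perl
use strict;
use warnings;

# Write a BOM to an in-memory UTF-8 filehandle with either print or printf
# while $\ is set to "\n", and return the raw bytes that were written.
# (bom_bytes_via is a hypothetical helper name used for illustration.)
sub bom_bytes_via {
    my ($how) = @_;    # 'print' or 'printf'
    my $buffer = '';
    open my $fh, '>:encoding(UTF-8)', \$buffer
        or die "Cannot open in-memory filehandle: $!";
    local $\ = "\n";   # simulate a later change to the output record separator
    if ($how eq 'print') {
        print {$fh} "\x{FEFF}";          # print appends the current $\
    }
    else {
        printf {$fh} '%s', "\x{FEFF}";   # printf does not append $\
    }
    close $fh or die "Cannot close in-memory filehandle: $!";
    return $buffer;
}

# Report how many bytes each variant produced.
for my $how (qw(print printf)) {
    printf "%-6s wrote %d byte(s)\n", $how, length(bom_bytes_via($how));
}
```

The `print` variant should come out longer than the `printf` variant, because only `print` appends the separator after the three UTF-8 bytes of the BOM.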
Re^5: Global substitution of non-base-plane Unicode characters by kcott (Archbishop) on Feb 24, 2014 at 04:29 UTC
If "the value of `$\` is later changed" was a genuine concern, a better way would be to explicitly code the following rather than expecting a subsequent maintainer to automatically realise why `printf` was used here: `... { local $\; print "\x{FEFF}"; } ...` And, of course, a much better way to change `$\` in the middle of the program would be along these lines: `... code as it is now ... # later changes: ... { local $\ = "\n"; ... code using changed $\ ... } ...` -- Ken
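Ken's two snippets can be combined into one runnable sketch. The sub names `print_bom` and `demo_output` and the in-memory filehandle are assumptions made for illustration; only the two `local $\` idioms come from the post above.

```perl
use strict;
use warnings;

# Print the BOM with $\ forced to undef for just this one statement.
# (print_bom and demo_output are hypothetical names.)
sub print_bom {
    my ($fh) = @_;
    {
        local $\;                  # undef inside this block only
        print {$fh} "\x{FEFF}";
    }
    return;
}

# Build some output while $\ is "\n", as a later change to the program might do.
sub demo_output {
    my $buffer = '';
    open my $fh, '>:encoding(UTF-8)', \$buffer
        or die "Cannot open in-memory filehandle: $!";
    local $\ = "\n";               # the "later change", confined to this sub
    print_bom($fh);                # BOM only, no separator after it
    print {$fh} 'first line';      # separator appended as usual
    close $fh or die "Cannot close in-memory filehandle: $!";
    return $buffer;
}

# Show the bytes in hex so the position of the newline is visible.
my $bytes = demo_output();
print join(' ', map { sprintf '%02X', ord($_) } split //, $bytes), "\n";
```

Because `local` is dynamically scoped, the `"\n"` set in `demo_output` is visible inside `print_bom`, and the inner `local $\;` is what keeps the separator away from the BOM.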
Re^6: Global substitution of non-base-plane Unicode characters by pjfarley3 (Initiate) on Feb 24, 2014 at 04:44 UTC
I like your solution making the printing of the BOM a local block better than my use of printf, thank you. I will use that. Peter
Re^6: Global substitution of non-base-plane Unicode characters by Jim (Curate) on Feb 24, 2014 at 17:43 UTC
TIMTOWTDI. In Perl, printing exactly one character (a Unicode byte order mark) and nothing else is a special case of formatted printing, vis-à-vis generalized printing of lines of text with built-in programming conveniences (e.g., automatic newline handling). Would you find this troublesome? `printf '%s', "\N{U+FEFF}";` Or this? `printf '%c', 0xfeff;` Jim
Re^7: Global substitution of non-base-plane Unicode characters by kcott (Archbishop) on Feb 25, 2014 at 00:30 UTC
Re^4: Global substitution of non-base-plane Unicode characters by pjfarley3 (Initiate) on Feb 24, 2014 at 02:56 UTC
Thanks for reminding me of that. My purpose there was to avoid the automatic "\n" appended by print, so that the UTF-8 BOM is just the first thing written to the output. Peter
Re^5: Global substitution of non-base-plane Unicode characters by kcott (Archbishop) on Feb 24, 2014 at 03:34 UTC
"My purpose there was to avoid the automatic "\n" appended by print, ..." print does not do this. From its documentation: "The current value of `$\` (if any) is printed after the entire LIST has been printed." `$\` is the output record separator. From the "perlvar: Variables related to filehandles" documentation: "The output record separator for the print operator. If defined, this value is printed after the last of print's arguments. Default is `undef`." I can't see anywhere in the code you posted that you have explicited defined `$\` (e.g. `$\ = "\n"`). Check your shebang line (not shown in any of your posted code) for a "`-l`" switch. This is probably the most likely cause of "the automatic "\n" appended by print". See "perlrun: Command Switches" for details of the "`-l`" switch. -- Ken	[reply] [d/l] [select]
Re^6: Global substitution of non-base-plane Unicode characters by pjfarley3 (Initiate) on Feb 24, 2014 at 04:37 UTC
Aha! An earlier version of my test program did include setting `$\` to "\n", so that was the source of my mistaken impression that print was adding a newline on its own. Another notch in my understanding of Perl, thank you. Peter