hishii2001 has asked for the wisdom of the Perl Monks concerning the following question:
Hi, I am trying to find a way around some unexpected behavior in the output of my Perl script.
I have a set of files in a non-English language. This language often uses the non-breaking space (U+00A0) instead of the regular space. The non-breaking spaces are intentional, and it is very important that they stay non-breaking spaces rather than being turned into normal spaces.
My Perl script changes the language code "en" (English) to the code appropriate for this language. I simply use a regular expression to search for a particular sequence of letters and replace it with something else; that's all it does. The script then saves the text to a text file, and it does this for thousands of files.
The problem is that when the original file contains a non-breaking space (U+00A0), Perl processes the text but saves the non-breaking space as "\_".
I read the original files as UTF-8, because they are saved in UTF-8 to handle non-ASCII characters. All the other non-ASCII characters of the language come out correctly, with the right accents; only the non-breaking space is converted to something else in the output file.
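A minimal sketch of the kind of script described above, assuming it reads each file, substitutes the language code, and writes the result. The actual script is not shown, so the sub name convert_file and the word-boundary pattern \ben\b are assumptions. The point it illustrates is that both the input handle and the output handle carry an explicit :encoding(UTF-8) layer, so a U+00A0 that is decoded on the way in is encoded back to the two bytes C2 A0 on the way out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: convert_file and the \ben\b pattern are assumptions,
# not the original script.
sub convert_file {
    my ($in_path, $out_path, $new_code) = @_;

    # Decode the input as UTF-8 so U+00A0 becomes one character, not two bytes.
    open my $in, '<:encoding(UTF-8)', $in_path
        or die "Cannot read $in_path: $!";
    local $/;                      # slurp mode: read the whole file at once
    my $text = <$in>;
    close $in;

    # Replace the standalone language code "en" with the new code.
    $text =~ s/\ben\b/$new_code/g;

    # Encode the output as UTF-8 as well; U+00A0 is written back as C2 A0.
    open my $out, '>:encoding(UTF-8)', $out_path
        or die "Cannot write $out_path: $!";
    print {$out} $text;
    close $out or die "Cannot close $out_path: $!";
    return $text;
}
```

Usage, with hypothetical file names and target code: convert_file('in.txt', 'out.txt', 'cs');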
For example, if the input text is:
issue: "Problém s odesláním"
the output text becomes:
issue: "Problém s\_odesláním"
In the input, the space before the "s" is a normal space (U+0020), but the space after the "s" is a non-breaking space (U+00A0).
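Because U+0020 and U+00A0 look identical on screen, it can help to list the code points of the space-like characters in the decoded text. The helper below is a hypothetical sketch (the name space_code_points is not from the post) and only looks at those two characters.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the code points of every U+0020 or U+00A0
# in a decoded (character) string, formatted as "U+XXXX".
sub space_code_points {
    my ($text) = @_;
    return map  { sprintf 'U+%04X', $_ }
           grep { $_ == 0x20 || $_ == 0xA0 }
           map  { ord } split //, $text;
}

# The example line, written with escapes so the NBSP is unambiguous.
my $sample = "issue: \"Probl\x{e9}m s\x{a0}odesl\x{e1}n\x{ed}m\"";
print join(' ', space_code_points($sample)), "\n";   # U+0020 U+0020 U+00A0
```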
Does anybody have an idea how to save the non-breaking space as a non-breaking space, without it being converted to "\_"? I see this issue only on the Mac; if I run the same Perl code on Windows, I don't have this issue. I appreciate any input. Thank you in advance.