On several machines at work, we run Perl 5.8.3 (yes, I know it's 10 years old; not my choice). We recently noticed a strange behaviour: we used
tr/ / /s;
to process some HTML files. If the files contained non-Latin characters (e.g. Chinese), the output was garbled on some machines. We tried replacing tr with a substitution
s/ +/ /g;
and suddenly, the output was correct.
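For reference, both statements are meant to do the same thing: squeeze every run of spaces into a single space. The following is a minimal sketch that compares them on an in-memory character string. The sample string is made up, and the Chinese characters are written as `\x{...}` escapes so the script does not depend on the encoding of the source file. On a current perl both variants should agree; this does not reproduce the 5.8.3 behaviour, it only shows what the two statements are expected to produce.

```perl
use strict;
use warnings;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical HTML fragment with runs of spaces and Chinese characters
# (U+4E2D U+6587 and U+6587 U+672C), written as escapes so the script
# does not depend on the encoding of this source file.
my $text = "<p>\x{4E2D}\x{6587}   \x{6587}\x{672C}  and   more</p>";

# tr/ / /s: map space to space; /s squeezes each run of spaces to one.
(my $via_tr = $text) =~ tr/ / /s;

# s/ +/ /g: replace every run of one or more spaces with a single space.
(my $via_s = $text) =~ s/ +/ /g;

print "tr:  $via_tr\n";
print "s/// $via_s\n";
print $via_tr eq $via_s ? "identical\n" : "different\n";
```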
Both input and output use the :encoding(utf-8) layer. The files must be slurped in to trigger the bug; line-by-line processing produces the correct output.
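To compare the two reading modes side by side, here is a minimal sketch that writes a small UTF-8 file, reads it back through the `:encoding(UTF-8)` layer once slurped and once line by line, and applies the same `tr` to both. The file contents and the use of File::Temp are assumptions for illustration only. On an unaffected perl the two results should be identical; a difference on one of the affected machines would narrow the problem down to the slurp path.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample input: Chinese text with runs of spaces, written as UTF-8.
my ($fh, $file) = tempfile(UNLINK => 1);
binmode $fh, ':encoding(UTF-8)';
print {$fh} "<p>\x{4E2D}\x{6587}   \x{6587}\x{672C}</p>\n<p>a   b</p>\n";
close $fh or die "close $file: $!";

# Mode 1: slurp the whole file into one string, then squeeze spaces once.
open my $in, '<:encoding(UTF-8)', $file or die "open $file: $!";
my $slurped = do { local $/; <$in> };
close $in;
$slurped =~ tr/ / /s;

# Mode 2: read the file line by line, squeezing spaces in each line.
open $in, '<:encoding(UTF-8)', $file or die "open $file: $!";
my $by_line = '';
while (my $line = <$in>) {
    $line =~ tr/ / /s;
    $by_line .= $line;
}
close $in;

print $slurped eq $by_line ? "same output\n" : "outputs differ\n";
```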
Could this be one of the manifestations of The "Unicode Bug"? I have a gut feeling that the substitution might fix the problem for this particular file, but the bug could reappear with the next one. I also don't understand why the bug only appeared on some machines: the version of Perl is the same on all of them (only their Linux versions differ). Is any external library involved in transliteration, substitution, or Unicode handling?
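For context, perlunicode uses "The Unicode Bug" for characters in the range 128-255 behaving differently depending on whether perl stores the string internally as UTF-8. The sketch below illustrates that general effect with `lc` rather than `tr`, assuming a modern perl with neither `use feature 'unicode_strings'` nor `use locale` in effect. It only shows what the bug looks like; it does not demonstrate that it is the cause of the garbled output here.

```perl
use strict;
use warnings;

# "\xC9" is LATIN CAPITAL LETTER E WITH ACUTE (U+00C9).
my $native   = "\xC9";       # stored internally as a single byte
my $upgraded = "\xC9";
utf8::upgrade($upgraded);    # same character, stored internally as UTF-8

# Both strings hold the same character, so they compare equal...
print $native eq $upgraded ? "equal\n" : "not equal\n";

# ...but without 'unicode_strings', lc() uses ASCII rules for the
# byte-stored copy and Unicode rules only for the UTF-8-stored copy.
my $lc_native   = lc $native;
my $lc_upgraded = lc $upgraded;
printf "native:   U+%04X (utf8 flag: %d)\n", ord $lc_native,   utf8::is_utf8($native)   ? 1 : 0;
printf "upgraded: U+%04X (utf8 flag: %d)\n", ord $lc_upgraded, utf8::is_utf8($upgraded) ? 1 : 0;
```

Printing `utf8::is_utf8` for the slurped data on an affected and an unaffected machine would be one way to check whether the internal representation differs between them.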
BTW: I wasn't able to install 5.8.3 at home (errors during make) to test further. Update: I was able to install it with the help of Devel::PatchPerl. I wasn't able to replicate the problem, though.
In reply to The Unicode Bug with Transliteration or Substitution by choroba