in reply to Re: The Unicode Bug with Transliteration or Substitution
in thread The Unicode Bug with Transliteration or Substitution
Update: I ran the process under strace on both machines. One of the many differences I noticed was the size of the read buffer: on the 32-bit machine, read(3, ...) is called with a buffer size of 32768, while on the 64-bit machine the size is 65536. There might be a problem if a multibyte character is split between two consecutive buffers. That would also explain why the output does not differ when the input is processed line by line (no line is longer than 32768 bytes). It still doesn't explain why substitution fixes the problem, though.
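To make the split-character hypothesis concrete, here is a minimal sketch in Python (not Perl, and not a claim about what perl does internally). It assumes UTF-8 input and uses toy buffer sizes of 4 and 8 bytes as stand-ins for 32768 and 65536. The function names are hypothetical. It only shows that decoding each buffer on its own corrupts a character that straddles a buffer boundary, while a decoder that carries state across buffers does not.

```python
import codecs


def decode_chunks_naively(data: bytes, buffer_size: int) -> str:
    """Decode every fixed-size buffer independently (the suspected failure mode)."""
    chunks = [data[i:i + buffer_size] for i in range(0, len(data), buffer_size)]
    # errors="replace" turns the orphaned bytes into U+FFFD instead of raising.
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)


def decode_chunks_incrementally(data: bytes, buffer_size: int) -> str:
    """Decode buffer by buffer, keeping incomplete sequences for the next buffer."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = []
    for i in range(0, len(data), buffer_size):
        parts.append(decoder.decode(data[i:i + buffer_size]))
    # final=True flushes the decoder and would raise on a truncated tail.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# "é" is two bytes in UTF-8, so the encoded data is 8 bytes long and
# a 4-byte buffer boundary falls between those two bytes.
text = "abc\u00e9def"
data = text.encode("utf-8")

small_naive = decode_chunks_naively(data, 4)        # boundary splits the character
large_naive = decode_chunks_naively(data, 8)        # whole input fits in one buffer
small_stateful = decode_chunks_incrementally(data, 4)

print(small_naive == text, large_naive == text, small_stateful == text)
```

With the smaller buffer the naive decode no longer matches the original text, while the larger buffer and the stateful decoder both round-trip it. That mirrors the observation that only one of the two buffer sizes triggers the difference, though whether perl's I/O layer behaves like the naive version is exactly what remains unverified.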
Re^3: The Unicode Bug with Transliteration or Substitution
by graff (Chancellor) on May 05, 2014 at 02:26 UTC
by choroba (Cardinal) on May 14, 2014 at 20:40 UTC