About the text written to the debug file: did it always come from the "$term->get_preferred()" and "$term->get_synonyms()" method calls, or did it just come directly from the input Japanese utf8 file? What is the nature of the "term" object that holds the Japanese data? (If the debug output didn't come from the object methods, you need to try it that way, but I assume you've already covered this.)
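One way to check what those method calls actually hand back is to dump each string's internals before it is printed. This is only a sketch: `describe` is a hypothetical helper (not part of your script), and the sample string stands in for whatever the term object returns.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report whether perl treats the string as characters
# (utf8 flag on) or raw bytes, how long it is, and what it contains.
sub describe {
    my ($label, $str) = @_;
    return sprintf "%s: utf8_flag=%d length=%d chars=[%s]",
        $label,
        ( utf8::is_utf8($str) ? 1 : 0 ),
        length($str),
        join ' ', map { sprintf 'U+%04X', ord } split //, $str;
}

# Stand-in data: the UTF-8 bytes for two kanji, and the decoded version.
my $bytes = "\xE6\x97\xA5\xE6\x9C\xAC";
my $chars = decode('UTF-8', $bytes);

# Keep the debug handle ASCII-only so the dump itself can't be mangled.
print STDERR describe('raw bytes', $bytes), "\n";
print STDERR describe('decoded',   $chars), "\n";
```

In the real script you would pass in the return value of "$term->get_preferred()" and each element from "$term->get_synonyms()" instead of the stand-in strings. If the same term sometimes comes back as bytes and sometimes as characters, that would be worth knowing.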
Even odder, the script reads an XML file that contains entries from other languages and writes those to the output XML file along with entries created from the Japanese text, and if I remove any entries from the input XML file, the corruption in the output file doesn't occur where it used to.
I guess that could make it hard to demonstrate the problem with minimal snippets of code and data. Still, if the problem really does depend on the input XML data in some way, you should study the initial, known-buggy output, and create a sample from the XML input file consisting of the entries adjacent to the problem, and limit the Japanese input to the entry or entries that are the problem. See if you can replicate the error with a minimal amount of data. (While you're at it, see if you can create a stripped-down version of the script, too -- just enough to produce the error. If the object pointed to by $term is big and hairy, that might be the place to start clipping.)
If that doesn't clarify the problem for you, at least you'll have a specific example that can be posted. BTW, I don't see any problems with the code in the original post, except that the object method calls leave a lot to the imagination.
(update: thinking about it a little more, the only sort of "typical" problem I can imagine that would create the symptom you describe would be any sort of improper mixing of output writing methods -- e.g. using both "syswrite" and "print" on the same output file handle. "syswrite" bypasses the buffering that "print" uses, so the two can reach the file out of order, and this could be producing other corruptions in the output that might not be as noticeable as the ones you're seeing.)
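To illustrate the kind of mixing I mean, here is a minimal sketch, assuming a raw (byte-oriented) output handle and a temp file. None of this comes from your script; it only shows how a buffered "print" and an unbuffered "syswrite" on the same handle can end up in the file in a different order than the code suggests.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Make a scratch file, then reopen it as an ordinary buffered raw handle.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp or die "close: $!";
open my $fh, '>:raw', $path or die "open $path: $!";

my $first  = encode('UTF-8', "\x{65E5}\x{672C}");   # two kanji, 6 bytes
my $second = encode('UTF-8', "\x{8A9E}");           # one kanji, 3 bytes

print {$fh} $first;                  # sits in perl's output buffer for now
defined syswrite($fh, $second)       # bypasses the buffer, hits the file now
    or die "syswrite: $!";
close $fh or die "close: $!";        # only here does the buffer get flushed

# Read the file back as bytes and compare with the order the code implies.
open my $in, '<:raw', $path or die "open $path: $!";
my $got = do { local $/; <$in> };
close $in;

my $expected = $first . $second;
print "bytes written: ", length($got), "\n";
print "in the order the code implies: ", ($got eq $expected ? "yes" : "no"), "\n";
```

Nothing is lost in this toy case, just reordered, but if the pieces being written are partial multi-byte characters, or the writes land in the middle of other buffered output, the result can look like dropped or garbled utf-8.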
In reply to Re: utf-8 bytes dropped when printing to a file
by graff
in thread utf-8 bytes dropped when printing to a file
by moleary