You talk about characters. When using UTF-8, length and substr count characters, not octets, so a string of 21 characters can take up more than 21 octets in the output. If you were already talking about characters, not octets, can you please show a short example input that exhibits the problem, preferably together with a hexdump of the relevant portion of the file?
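To illustrate the difference, here is a minimal sketch that assumes the text has been decoded into a Perl character string. The sample word and the variable names are made up for the illustration and are not taken from your data:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: "Gruesse" spelled with u-umlaut and sharp s,
# written with escapes so the source file's own encoding does not matter.
my $chars  = "Gr\x{FC}\x{DF}e";           # character string
my $octets = encode('UTF-8', $chars);     # the same text as UTF-8 octets

printf "characters: %d\n", length $chars;    # counts characters
printf "octets:     %d\n", length $octets;   # counts bytes

# substr also works on characters, so a 3-character prefix
# can be longer than 3 octets once it is encoded.
my $prefix = substr $chars, 0, 3;
printf "prefix octets: %d\n", length encode('UTF-8', $prefix);

# A quick hexdump of the encoded form, one octet per pair.
print join(' ', unpack '(H2)*', $octets), "\n";
```

The two non-ASCII characters each take two octets in UTF-8, which is why the octet count comes out higher than the character count. A dump like the last line, taken from the part of your file that misbehaves, would make it much easier to see what is actually being written.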
In reply to Re^2: Read and write UTF-8
by Corion
in thread Read and write UTF-8
by Norah