in reply to Format eating too few characters with utf-8?

Did you try outputting the strings by some other means (e.g. print), rather than format? If other indicators say that the input data is hosed, then there could be something wrong with the command line interface. What shell are you doing this under? Is it really necessary to pass the string on the command line? Could you feed it in via stdin?

A word spoken in Mind will reach its own level, in the objective world, by its own weight

Re^2: Format eating too few characters with utf-8?
by adsb (Initiate) on Apr 02, 2008 at 06:05 UTC
    Yes, the data really has to be passed on the command line. This is an established, several-year-old script whose semantics I can't change. ikegami's reply populated @ARGV inside the script rather than passing the text on the command line, which should avoid any issues there.

    print()ing the string before passing it to decode_utf8 gives "l'été sera chaud". Decoding it gives "l'�t� sera chaud".
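
    One possible reading of that output, offered as an assumption rather than a diagnosis: a decoded string printed to a handle with no encoding layer goes out as one byte per character for code points below 0x100, which a UTF-8 terminal cannot display. The sketch below compares the two cases using in-memory filehandles. The variable names and the hard-coded byte string (a stand-in for $ARGV[0]) are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Stand-in for $ARGV[0]: the UTF-8 bytes of "l'été sera chaud".
my $bytes = "l'\xC3\xA9t\xC3\xA9 sera chaud";
my $chars = decode_utf8($bytes);    # character string, 16 characters

# Print the decoded string to a handle with no encoding layer.
open(my $plain, '>', \my $plain_out) or die "open failed: $!";
print {$plain} $chars;
close $plain;

# Print the same string through a UTF-8 encoding layer.
open(my $layered, '>:encoding(UTF-8)', \my $layered_out) or die "open failed: $!";
print {$layered} $chars;
close $layered;

# Compare how many bytes each handle actually received.
printf "no layer: %d bytes, UTF-8 layer: %d bytes\n",
    length($plain_out), length($layered_out);
```

    If the byte counts differ, the handle used for output may need a layer, for example via binmode(STDOUT, ':encoding(UTF-8)'). Whether that applies to the script in question is not something this thread confirms.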

    I've found a rather hacky workaround that seems to fix the problem: append n spaces to the string before write()ing it, where n is the number of non-ASCII characters in the string. This doesn't pollute the eventual output.
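
    The padding step described above could be sketched roughly as follows. This is only an illustration of the workaround, not the actual script: pad_for_format is a hypothetical helper name, and the hard-coded byte string stands in for the command-line argument.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical helper: append one space per non-ASCII character.
sub pad_for_format {
    my ($text) = @_;
    # Count the characters outside the ASCII range.
    my $extra = () = $text =~ /[^\x00-\x7F]/g;
    return $text . (' ' x $extra);
}

# Stand-in for $ARGV[0]: the UTF-8 bytes of "l'été sera chaud".
my $bytes  = "l'\xC3\xA9t\xC3\xA9 sera chaud";
my $text   = decode_utf8($bytes);
my $padded = pad_for_format($text);

# $padded is what would be handed to the format before write().
printf "original: %d characters, padded: %d characters\n",
    length($text), length($padded);
```

    Counting non-ASCII characters matches the description in this post. It gives the right amount of padding only if each such character costs exactly one column, which holds for two-byte sequences like "é" but is an assumption for anything longer.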