in reply to Text file to UTF-8 encoding

Depending on your input text, converting to UTF-8 may not increase the file size by much. I believe a two-byte indicator must be added at the beginning of the file; but the common ASCII characters 0-127 are encoded in UTF-8 as-is, one byte each.
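A small Python sketch of the size argument, assuming the original file is in a single-byte encoding such as Latin-1 (ISO-8859-1); the helper name byte_counts is made up for illustration. Pure ASCII text comes out byte-for-byte identical, and each character above 127 grows to two or more bytes.

```python
# Hypothetical helper: compare byte counts of the same text in a legacy
# single-byte encoding (assumed Latin-1 here) and in UTF-8.

def byte_counts(text: str, legacy: str = "latin-1") -> tuple[int, int]:
    """Return (legacy_size, utf8_size) in bytes for `text`."""
    return len(text.encode(legacy)), len(text.encode("utf-8"))

ascii_text = "Hello, world!"        # only code points 0-127
accented = "caf\u00e9 na\u00efve"   # "cafe naive" with two accented letters

# ASCII characters are single bytes in UTF-8, identical to their ASCII values.
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

print(byte_counts(ascii_text))  # (13, 13): no growth at all
print(byte_counts(accented))    # (10, 12): each accented letter takes 2 bytes in UTF-8
```

So the growth depends only on how many non-ASCII characters the data contains.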

Of course, you do have to actually modify your data to get any difference (see previous replies).


sas

Re^2: Text file to UTF-8 encoding
by massa (Hermit) on Jul 31, 2008 at 15:04 UTC
    believe a two-byte indicator must be added at the beginning of the file;
    s/two-byte/three-byte/; s/ must /, the codepoint 0xFEFF (in UTF-8, "\xEF\xBB\xBF"), can /;
    []s, HTH, Massa (κς,πμ,πλ)
Re^2: Text file to UTF-8 encoding
by moritz (Cardinal) on Jul 31, 2008 at 15:09 UTC
    much. I believe a two-byte indicator must be added at the beginning of the file

    I believe you are referring to the Byte Order Mark, which is by no means mandatory. It is used for UTF-16 and UTF-32 because in those encodings the byte order (endianness) matters.

    And the byte order mark in UTF-8 is three bytes (EF BB BF), not two.
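To make the point concrete, here is a short Python sketch (not from the thread) that encodes U+FEFF in UTF-8 and in both UTF-16 byte orders, and strips an optional UTF-8 BOM when reading. The helper strip_utf8_bom is hypothetical; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the Python standard library.

```python
import codecs

BOM_CHAR = "\ufeff"  # U+FEFF, the code point used as a byte order mark

# In UTF-8 the BOM is three bytes and says nothing about byte order.
print(BOM_CHAR.encode("utf-8").hex(" "))      # ef bb bf
# In UTF-16 the two byte orders produce different BOM bytes.
print(BOM_CHAR.encode("utf-16-be").hex(" "))  # fe ff
print(BOM_CHAR.encode("utf-16-le").hex(" "))  # ff fe

def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: drop a leading UTF-8 BOM if present (it is optional)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# The "utf-8-sig" codec writes a BOM on encode and skips one on decode.
with_bom = "abc".encode("utf-8-sig")
print(with_bom)                       # b'\xef\xbb\xbfabc'
print(with_bom.decode("utf-8-sig"))   # abc
print(b"abc".decode("utf-8-sig"))     # abc (no BOM required)
```

A UTF-8 file is equally valid with or without the three-byte prefix; whether to write one is a choice, not a requirement.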