in reply to Re^3: What is the proper way to read non-ANSI data
in thread What is the proper way to read non-ANSI data
In the original program output, every character (including the 'centered dot', chr(0xb7)) is encoded as a single byte, except the specific hyphen-like character you ask about, which is encoded as 3 bytes: e2 80 93.
To me, that suggests the output is utf-8. Update: Corion points out that text containing both single bytes > 0x7f and 3-byte chars isn't utf-anything, but rather a mixed(-up) encoding.
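For what it's worth, Corion's point can be checked at the byte level. The following is only a rough sketch, written in Python purely as a byte inspector; the `sample` string and the `is_valid_utf8` helper are made up for illustration and are not the OP's actual data or code.

```python
# Hypothetical sample: a centered dot stored as the single byte 0xb7,
# followed by the three bytes e2 80 93 described above.
sample = b"a \xb7 b \xe2\x80\x93 c"


def is_valid_utf8(data: bytes) -> bool:
    """Return True only if the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# On their own, e2 80 93 are valid UTF-8 for U+2013 (EN DASH).
en_dash_alone = b"\xe2\x80\x93".decode("utf-8")

# The mixed sample is not valid UTF-8: a lone 0xb7 is a continuation byte.
mixed_ok = is_valid_utf8(sample)

# In real UTF-8 the centered dot (U+00B7) takes two bytes, c2 b7.
dot_in_utf8 = "\u00b7".encode("utf-8")

print(en_dash_alone == "\u2013", mixed_ok, dot_in_utf8.hex())
```

So the 3-byte sequence on its own is fine, but a stream that also carries a bare 0xb7 cannot be decoded as utf-8 in one pass.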
I suspect that the 'wrongness' the OP perceives when he treats the perl input stream as utf-8 and writes his output file as utf-8 has more to do with how he subsequently inspects that output than with Perl's handling of the data; but I am insufficiently versed in the subject to be able to confirm that suspicion.
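To illustrate the kind of inspection problem I have in mind (and this is only a guess at what might be happening), here is a small sketch of how a correctly written utf-8 file can look wrong when the viewer assumes a single-byte code page. Windows-1252 is my assumption here; the thread does not say what the OP used to look at the file, and the sample text is invented.

```python
# Hypothetical text containing a centered dot and an en dash.
text = "a \u00b7 b \u2013 c"

# What a correct utf-8 writer would put on disk.
utf8_bytes = text.encode("utf-8")

# What a viewer shows if it assumes Windows-1252 (an assumption, see above).
as_seen_in_cp1252 = utf8_bytes.decode("cp1252")

# What a viewer shows if it decodes the same bytes as utf-8.
round_trip = utf8_bytes.decode("utf-8")

print(utf8_bytes.hex(" "))   # the raw bytes, space-separated
print(as_seen_in_cp1252)     # mojibake: each non-ASCII char becomes 2 or 3 chars
print(round_trip == text)    # the data itself is intact
```

If the output looks like the second line, the file is probably fine and the viewer is the problem; a hex dump of the bytes settles it either way.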
Re^5: What is the proper way to read non-ANSI data
by freonpsandoz (Beadle) on Oct 04, 2015 at 00:39 UTC