Re: What is the proper way to read non-ANSI data

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics

Re^2: What is the proper way to read non-ANSI data by freonpsandoz (Beadle) on Sep 15, 2015 at 04:31 UTC
That also works for '·' but not for '–'.
Re^3: What is the proper way to read non-ANSI data by CountZero (Bishop) on Sep 15, 2015 at 06:15 UTC
What is the code for this '–', both in the original and changed file?
Re^4: What is the proper way to read non-ANSI data by BrowserUk (Patriarch) on Sep 15, 2015 at 07:35 UTC
In the original program output, every character (including the 'centered dot', chr(0xb7)) is encoded as a single byte, except the specific hyphen-like character you ask about, which is encoded as 3 bytes: e2 80 93. ~~Which to me suggests that the output is utf-8.~~

Update: Corion points out that text containing both single bytes > 0x7f and 3-byte chars isn't utf-anything, but rather a mixed(-up) encoding.

I suspect that the 'wrongness' the OP perceives when he treats the Perl input stream as utf-8 and writes his output file as utf-8 has more to do with how he subsequently inspects that output than with Perl's handling of the data, but I am insufficiently versed in the subject to confirm that suspicion.
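To make the "mixed(-up) encoding" point concrete, here is a minimal sketch, not the OP's actual program. It assumes the lone high bytes are Latin-1 (0xb7 is the middle dot there) and that the multi-byte runs are UTF-8. The sample bytes and the helper names is_valid_utf8 and decode_mixed are made up for illustration. It first shows that strict UTF-8 decoding rejects such data, then decodes it piecewise.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample, not the OP's data: 'a', a lone 0xb7 byte,
# 'b', the three bytes e2 80 93, 'c'.
my $bytes = "a\xb7b\xe2\x80\x93c";

# Returns 1 if the byte string is well-formed UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($in) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC stops it from modifying $in.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    return eval { decode('UTF-8', $in, $check); 1 } ? 1 : 0;
}

# Decode multi-byte sequences that look like UTF-8 as UTF-8, and treat
# every other high byte as Latin-1 (an assumption about the data).
# The pattern is a simplification: it does not reject overlong forms.
sub decode_mixed {
    my ($in) = @_;
    my $out = '';
    while (length $in) {
        if ($in =~ s/\A([\x00-\x7f]+)//) {
            $out .= $1;                          # plain ASCII run
        }
        elsif ($in =~ s/\A([\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xf4][\x80-\xbf]{3})//) {
            my $seq = $1;
            $out .= decode('UTF-8', $seq);       # UTF-8 sequence
        }
        else {
            $in =~ s/\A(.)//s;
            my $byte = $1;
            $out .= decode('ISO-8859-1', $byte); # stray single byte
        }
    }
    return $out;
}

my $text = decode_mixed($bytes);
printf "valid UTF-8: %s\n", is_valid_utf8($bytes) ? 'yes' : 'no';
print join(' ', map { sprintf('U+%04X', ord $_) } split //, $text), "\n";
```

For the sample above, the strict check fails because of the lone 0xb7, while the piecewise decode yields the code points U+0061 U+00B7 U+0062 U+2013 U+0063. Whether Latin-1 is the right guess for the stray bytes depends on where the data came from. If the source is really Windows-1252, swapping in 'cp1252' would be the analogous choice.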
Re^5: What is the proper way to read non-ANSI data by freonpsandoz (Beadle) on Oct 04, 2015 at 00:39 UTC


	PerlMonks