
Re: What is the proper way to read non-ANSI data

by CountZero (Bishop)
on Sep 14, 2015 at 13:09 UTC


in reply to What is the proper way to read non-ANSI data

What about windows-1252?

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

My blog: Imperial Deltronics
Re^2: What is the proper way to read non-ANSI data
by freonpsandoz (Beadle) on Sep 15, 2015 at 04:31 UTC

    That also works for '·' but not for '–'.

      What is the code for this '–', both in the original and changed file?

      CountZero


        In the original program output, every character (including the 'centered dot', chr(0xb7)) is encoded as a single byte, except the specific hyphen-like character you ask about, which is encoded as 3 bytes: e2 80 93.

        Which to me suggests that the output is UTF-8. Update: Corion points out that text containing both single bytes > 0x7f and 3-byte characters isn't UTF-anything, but rather a mixed(-up) encoding.

        I suspect that the 'wrongness' the OP perceives when he treats the Perl input stream as UTF-8 and writes his output file as UTF-8 has more to do with how he subsequently inspects that output than with Perl's handling of the data, but I am insufficiently versed in the subject to be able to confirm that suspicion.
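To make the byte analysis above concrete: e2 80 93 is the UTF-8 encoding of U+2013 (EN DASH), while a lone b7 is the windows-1252 (and Latin-1) byte for U+00B7 (MIDDLE DOT). In pure windows-1252 the en dash would be the single byte 0x96. The following is only a rough sketch of one way to untangle such mixed input, assuming that every well-formed-looking UTF-8 sequence really is UTF-8 and that any other high byte is windows-1252. The helper name decode_mixed is made up for this illustration and is not from the OP's program.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a byte string that mixes UTF-8 sequences
# with stray windows-1252 bytes. The UTF-8 pattern is deliberately loose;
# anything it accepts that is not strictly valid becomes U+FFFD.
sub decode_mixed {
    my ($bytes) = @_;
    my $out = '';
    while (length $bytes) {
        if ($bytes =~ s/\A([\x00-\x7f]+)//) {
            # Plain ASCII is identical in both encodings.
            $out .= $1;
        }
        elsif ($bytes =~ s/\A([\xc2-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}|[\xf0-\xf4][\x80-\xbf]{3})//) {
            # Looks like a 2-, 3- or 4-byte UTF-8 sequence.
            my $seq = $1;
            $out .= decode('UTF-8', $seq);
        }
        else {
            # Assumption: any other high byte is windows-1252.
            $out .= decode('cp1252', substr($bytes, 0, 1, ''));
        }
    }
    return $out;
}

# The two characters discussed in this thread, as they appear in the output:
my $text = decode_mixed("a\xb7b\xe2\x80\x93c");
binmode(STDOUT, ':encoding(UTF-8)');
printf "U+%04X\n", ord($_) for split //, $text;   # U+0061 U+00B7 U+0062 U+2013 U+0063
```

The heuristic can guess wrong: a windows-1252 run such as "Ã©" is byte-for-byte identical to the UTF-8 encoding of "é", so no decoder can tell the two apart without knowing how the file was produced.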


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
        In the absence of evidence, opinion is indistinguishable from prejudice.
        I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!
