in reply to Re: Problem displaying unicode for certain websites
in thread Problem displaying unicode for certain websites

I'm a little confused. A Unicode string stores a sequence of bytes internally, and these bytes represent a sequence of characters. One character might need several bytes in this internal representation. An ASCII string is the same idea, except that only a single byte is needed to represent each character.

But how do I know if a given variable stores a Unicode or an ASCII string? Am I right in saying that if the get() function is given a Unicode string as its argument, it will return a Unicode string? This wouldn't mean that my svd string is in ASCII and my expressen string is in Unicode, and that doesn't make any sense to me. Please help!

Re^3: Problem displaying unicode for certain websites
by ikegami (Patriarch) on Dec 12, 2009 at 11:10 UTC

    But how do I know if a given variable stores a unicode or ascii string?

    It contains what you put in it. What did you put in it?


    You have strings of (Unicode) characters and strings of bytes.

    If the string contains chr(0x2660), it's obviously not a string of bytes. If the string contains chr(0x41), it could be anything: ASCII 'A', the number 65, or something completely different.
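
To make the characters-versus-bytes distinction concrete, here is a minimal sketch using the core Encode module. The sample text and the variable names are made up for illustration; nothing here comes from the code discussed earlier in the thread.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A string of characters: each element is one Unicode code point.
# chr(0xE5) is a-with-ring, chr(0x2660) is the black spade suit.
my $sample_chars = "f" . chr(0xE5) . "r " . chr(0x2660);

# A string of bytes: the same text encoded as UTF-8.
my $sample_bytes = encode('UTF-8', $sample_chars);

# length() counts elements, whatever those elements represent.
my $char_count = length($sample_chars);
my $byte_count = length($sample_bytes);

# Decoding the bytes gives back the original characters.
my $round_trip = decode('UTF-8', $sample_bytes);

printf "%d characters, %d bytes, round trip %s\n",
    $char_count, $byte_count,
    ($round_trip eq $sample_chars ? "matches" : "differs");
```

Perl itself does not record which of the two a scalar holds; that depends on what was put in it, which is why the encode and decode steps are explicit.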

    If you pass a string with chr(0x41) in it to a function, you're not gonna get much information out of it. What you do is pass a string with something that can't be a byte in it. If it works, you know it's expecting characters.
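
The probing idea above could be sketched roughly like this. The helper accepts_probe is hypothetical, and md5_hex (from the core Digest::MD5 module) and encode (from Encode) are only stand-ins for "a function that wants bytes" and "a function that wants characters"; they are not the functions from the original code.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode);

# Hypothetical helper: call $code with a string that cannot be a
# string of bytes, and report whether the call survived.
sub accepts_probe {
    my ($code) = @_;
    my $probe = chr(0x2660);    # a code point above 255
    return eval { $code->($probe); 1 } ? 1 : 0;
}

# md5_hex() works on bytes and croaks on a wide character.
my $md5_ok = accepts_probe(sub { md5_hex($_[0]) });

# encode() takes characters and turns them into bytes.
my $encode_ok = accepts_probe(sub { encode('UTF-8', $_[0]) });

print "md5_hex accepted the probe: $md5_ok\n";
print "encode accepted the probe: $encode_ok\n";
```

Not every bytes-only function dies like this. Some only emit a "Wide character" warning, so it is worth watching for warnings as well as exceptions when probing.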

      Thanks ikegami! Your code gives me -
      Oj, f\x{00e5}r vi ingen mat?!
      instead of -
      Oj, får vi ingen mat?!
      How come?
        I updated all my modules and it seems the line -
        $decoded_text = decode_entities($decoded_text);
        isn't doing anything. I'm getting the following output for "www.expressen.se" with your code -
        Spela Uno! Det klassiska kortspelet i digital form. Redo att byta jobb? H\x{00e4}r kan du s\x{00f6}ka bland m\x{00e4}ngder av annonser. L\x{00f6}rdag 12 december 2009 Tipsa Expressen
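
For what it's worth, escapes of the form \x{00e5} are what Encode's FB_PERLQQ fallback writes when a character does not exist in the target encoding. Whether that is what happens in the code discussed above is only an assumption, since it depends on which encoding the output is being encoded to. A minimal sketch of the effect, with a made-up sample string:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample text containing a-with-ring (U+00E5).
my $swedish = "f" . chr(0xE5) . "r";

# ASCII has no a-with-ring, so the PERLQQ fallback substitutes
# a \x{...} escape for it instead of the character itself.
my $as_ascii = encode('ascii', $swedish, Encode::FB_PERLQQ());

# UTF-8 can represent the character, so no escape is needed;
# it becomes the two bytes 0xC3 0xA5.
my $as_utf8 = encode('UTF-8', $swedish);

print "ascii with PERLQQ: $as_ascii\n";
printf "utf-8 bytes: %s\n", join ' ', map { sprintf '%02X', ord } split //, $as_utf8;
```

If this is the cause, the text itself is intact and only the chosen output encoding cannot display it, so encoding to one the terminal understands (UTF-8, for example) would be the thing to try.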
Re^3: Problem displaying unicode for certain websites
by Anonymous Monk on Dec 12, 2009 at 10:58 UTC
    Ah, OK thanks!
      One last question. Should I be using alternatives to these buggy modules?