in reply to Character encoding fun...

Are you sure that you are using decode('cp1252', encode('utf8', ...))? I'm not sure it ever makes sense to do that.

I think you want:

encode('cp1252', decode('utf8', $my_utf_data))
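
A minimal sketch of that call chain, assuming $my_utf_data really holds UTF-8 octets (the sample value below is made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical input: the UTF-8 octets for "cafe" with an acute e (0xC3 0xA9).
my $my_utf_data = "caf\xC3\xA9";

# Step 1: UTF-8 octets -> Perl character string.
my $text = decode('utf8', $my_utf_data);

# Step 2: character string -> cp1252 octets (the accented e is the single byte 0xE9).
my $cp1252 = encode('cp1252', $text);

printf "%d bytes in, %d bytes out\n", length($my_utf_data), length($cp1252);
```

If the input is already a character string rather than octets, the decode step is the one that fails.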

Re^2: Character encoding fun...
by joem (Initiate) on Nov 15, 2007 at 20:41 UTC
    Hello,
    Thanks for the quick response.
    I thought that was what I wanted, but when I do that I get:
    Cannot decode string with wide characters at C:/Perl588/lib/Encode.pm line 166.

    which is why it's turned around.

    Joe
      Your problem is that $my_utf_data contains code points (numbers representing Unicode characters), not octets (i.e. bytes).

      If $my_utf_data really contains bytes, no character in that string should be > 255. The error message you are getting indicates that there are characters > 255 in your string.
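
      One rough way to check which case you are in is to look for any character above 255. The helper name has_wide_chars and the sample strings are made up for this sketch:

```perl
      use strict;
      use warnings;

      # Hypothetical helper: true if any character has a code point above 255,
      # which means the string cannot be a plain sequence of octets.
      sub has_wide_chars {
          my ($str) = @_;
          return $str =~ /[^\x00-\xFF]/ ? 1 : 0;
      }

      my $octets = "caf\xC3\xA9";       # UTF-8 bytes, every value <= 255
      my $text   = "price: \x{20AC}5";  # contains U+20AC EURO SIGN

      print has_wide_chars($octets) ? "wide\n" : "bytes only\n";
      print has_wide_chars($text)   ? "wide\n" : "bytes only\n";
```

      A string that passes this check is not guaranteed to be octets (text made only of Latin-1 characters passes too), but a string that fails it is definitely not something decode can take.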

      If $my_utf_data is really text (i.e. consists of code points), then all you need is the call to encode to get a cp1252 encoded stream of bytes:

      encode('cp1252', $my_utf_data)
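
      As a small sketch, assuming a made-up sample string that contains a euro sign:

```perl
      use strict;
      use warnings;
      use Encode qw(encode);

      # Hypothetical text: a character string containing U+20AC EURO SIGN.
      my $my_utf_data = "\x{20AC}5";

      # One call turns code points into cp1252 octets; the euro sign becomes 0x80.
      my $bytes = encode('cp1252', $my_utf_data);

      # Show the resulting octets in hex.
      print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
```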
        So the way is starting to seem a little clearer.
        To take the following path

        UTF_String -> CP1252 data -> UTF_String
        Source      -> Storage       -> Display

        I will need to
        encode('cp1252', $my_utf_data) -> store -> decode('utf8', $retrieved_data)
        though that does not provide any results per se.

        I must admit I am very new to the character encoding scene. It would be nice to be able to just manipulate the strings as byte arrays for storage and retrieval, though I'm not sure how to accomplish that in Perl :(

        Thanks,
        Joe
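
        For reference, a sketch of the Source -> Storage -> Display path, under the assumption that the stored octets are cp1252 and therefore have to be decoded as cp1252 rather than utf8 (the sample string is made up):

```perl
        use strict;
        use warnings;
        use Encode qw(encode decode);

        my $source = "na\x{EF}ve \x{20AC}5";     # hypothetical character string

        my $stored = encode('cp1252', $source);  # Source  -> Storage (octets)
        my $shown  = decode('cp1252', $stored);  # Storage -> Display (characters)

        print $shown eq $source ? "round trip ok\n" : "round trip lost data\n";
```

        The round trip only preserves characters that exist in cp1252; anything else is replaced with a substitution character when it is encoded.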