
Hello,

I may not have expressed myself clearly. The first value that should be used is:

'Köln'

... exactly as written there, in single quotes. The scalar variable may or may not be assigned again later, from a database or elsewhere, but the first assignment should work just as well as any later one. When Perl tells me that the length is 5, then in my eyes this is not correct iso-8859-1, because in that encoding the string should be only 4 characters long. In other words, regardless of what the variable holds at runtime, encode should convert it to the ANSI or ASCII representation. And yes, I know that there is a difference between these two. But the character 'ö' should be only one byte, not two. I hope I have expressed myself more clearly now.

thanks

The latest version of my test script so far:

#!/usr/bin/perl
use v5.10;
use Encode;
use Data::Dumper;

my $temp = encode( "iso-8859-1", 'Köln' );
say Dumper "========== encode string ==========";
say $temp, "(", length($temp), ")";

my $VUOrt0 = 'Köln';
$temp = encode( "iso-8859-1", $VUOrt0 );
say Dumper "========== encode scalar variable ==========";
say $temp, "(", length($temp), ")";

Re^3: possible missunderstanding of package Encode
by Anonymous Monk on Oct 20, 2015 at 11:11 UTC

    I may not have expressed myself clearly. The first value that should be used is: 'Köln'

    ;) That's the exact value I used. All of the values produced by encode/decode in my program are exactly 'Köln': the latin1 and binary version and the utf8 version, they're all 'Köln'.

    When Perl tells me that the length is 5, then in my eyes this is not correct iso-8859-1, because in that encoding the string should be only 4 characters long.... say Dumper "========== encode string ==========";

    Why are you looking at "length" at all?

    You start with unknown bytes (either UTF-8 or Latin-1). Perl treats them as bytes, which amounts to Latin-1, and whether there are 4 or 5 of them doesn't matter. It's not a "Unicode string", it's a binary string or a Latin-1 string.

    Then you explicitly encode this string to Latin-1. Now it's bytes for sure, and this time it makes no sense to look at length: it's the number of bytes, whatever they are. Since you don't know what you started with, the new length doesn't tell you anything.
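
    A minimal sketch of the point above, not taken from the original script. It assumes the source file is saved as UTF-8 without "use utf8", so the literal arrives as five bytes; those bytes are written out with escapes here so that the example does not depend on how the file is saved. The variable names are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: this is what the literal 'Köln' looks like to Perl when the
# file is UTF-8 and "use utf8" is absent: five bytes, not four characters.
my $raw = "K\xC3\xB6ln";

# Encoding the undecoded bytes: each byte is taken as one Latin-1 character,
# so the result is the same five bytes.
my $wrong = encode( 'iso-8859-1', $raw );

# Decoding first yields four characters; encoding those yields four bytes.
my $chars = decode( 'UTF-8', $raw );
my $right = encode( 'iso-8859-1', $chars );

printf "undecoded, then encoded: %d bytes\n",      length $wrong;
printf "decoded:                 %d characters\n", length $chars;
printf "decoded, then encoded:   %d bytes (%s)\n", length $right,
    join ' ', map { sprintf '%02X', ord } split //, $right;
```

    With these inputs the counts come out as 5, 4 and 4, and the final bytes are 4B F6 6C 6E. So length only says something useful once you know whether the string holds decoded characters or raw bytes.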

    Also, if you're going to Dumper anything, it should be data, not banners.

    I/O flow (the actual 5 minute tutorial)

      Thanks, and the short answer: I put the string into an XML document and send it to the web service on the other side. The web service (which requires iso-8859-1) tells me that I have delivered 'Köln' instead of 'Köln' and that it cannot identify the value correctly.

      regards
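
      One plausible reading of that error message is that UTF-8 bytes reach a reader that expects iso-8859-1. The sketch below illustrates only that assumption. The element name and the XML layout are hypothetical, and the real request format of the web service is not known from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Four characters, written with an escape so the file's encoding is irrelevant.
my $city = "K\x{F6}ln";

# Hypothetical payload; the <city> element is made up for illustration.
my $xml = qq{<?xml version="1.0" encoding="ISO-8859-1"?>\n<city>$city</city>\n};

my $utf8_bytes   = encode( 'UTF-8',      $xml );    # ö becomes the bytes C3 B6
my $latin1_bytes = encode( 'iso-8859-1', $xml );    # ö becomes the byte F6

# What a Latin-1 reader makes of the UTF-8 bytes: two characters for ö.
my $seen = decode( 'iso-8859-1', $utf8_bytes );

printf "UTF-8 payload:   %d bytes\n", length $utf8_bytes;
printf "Latin-1 payload: %d bytes\n", length $latin1_bytes;
print  "A Latin-1 reader sees two characters in place of the umlaut\n"
    if index( $seen, "K\x{C3}\x{B6}ln" ) >= 0;
```

      If this is what happens, the string has to be decoded characters before it goes into the document, and the whole document has to be encoded to iso-8859-1 once, right before it is sent.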