in reply to possible missunderstanding of package Encode

Your understanding of Encode is correct; the problem is how the original string gets into your program. The expression 'Köln' produces bytes in whatever encoding your script file is saved in, not a decoded string. There are several ways to fix this:

1. Tell perl that all of your hard-coded strings are INPUT as utf8 (Note: I'm certain that your editor is set up for UTF-8 input from the output you received, but you should convince yourself of that too and then learn how to configure it)

    use utf8;   # All hard-coded strings will be assumed to be UTF-8
    my $temp = encode( "iso-8859-1", 'Köln' );
    ...

2. Tell perl that this one string was input as utf8 (again, it is UTF-8 because that is what your editor produces)

    my $temp = encode( "iso-8859-1", decode("UTF-8", 'Köln') );
    ...
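A minimal sketch comparing the two ways side by side, assuming the file is saved as UTF-8 (as your editor does). For way 2 the literal is spelled with \x escapes that stand in for the bytes a UTF-8 editor writes for 'Köln', so that part does not depend on the file's encoding. Both should produce the same four ISO-8859-1 bytes:

```perl
use strict;
use warnings;
use utf8;    # assumption: this file is saved as UTF-8
use Encode qw(encode decode);

# Way 1: with "use utf8" the literal is already a decoded 4-character string.
my $way1 = encode( "iso-8859-1", 'Köln' );

# Way 2: decode the UTF-8 bytes explicitly. The escapes stand in for the
# 5 bytes a UTF-8 editor writes for 'Köln' when "use utf8" is absent.
my $raw  = "K\xC3\xB6ln";
my $way2 = encode( "iso-8859-1", decode( "UTF-8", $raw ) );

# Print each result as hex bytes plus its length.
for my $pair ( [ "use utf8" => $way1 ], [ "decode()" => $way2 ] ) {
    my ( $label, $bytes ) = @$pair;
    printf "%-8s %s (%d bytes)\n", $label,
        join( " ", map { sprintf "%02X", $_ } unpack "C*", $bytes ), length $bytes;
}
```

Both lines should show 4B F6 6C 6E: the ö ends up as the single ISO-8859-1 byte 0xF6 either way.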

The second case most closely resembles what happens when you process a file or command-line arguments:

    # Files (change input encoding to match file encoding):
    open my $F, "<:encoding(UTF-8)", "myfile" or die "Error reading myfile: $!";
    my $line = <$F>;   # $line contains a decoded string
    say encode( "iso-8859-1", $line );

    # Command-line args:
    my $arg = decode("UTF-8", $ARGV[0]);

    # Or, command-line args are an appropriate use of Encode::Locale:
    use Encode::Locale;
    my $arg = decode("locale", $ARGV[0]);
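A self-contained sketch of the file case. The temporary file created with File::Temp is a hypothetical stand-in for "myfile"; it holds the UTF-8 bytes of 'Köln', and reading it through the :encoding(UTF-8) layer hands back an already decoded string:

```perl
use strict;
use warnings;
use v5.10;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical stand-in for "myfile": a temp file holding the UTF-8 bytes of "Köln".
my ( $tmp, $filename ) = tempfile( UNLINK => 1 );
binmode $tmp;                        # write the bytes exactly as given
print {$tmp} "K\xC3\xB6ln\n";
close $tmp or die "Error writing $filename: $!";

# Reading through the :encoding(UTF-8) layer returns decoded characters.
open my $F, "<:encoding(UTF-8)", $filename
    or die "Error reading $filename: $!";
my $line = <$F>;
close $F;
chomp $line;

my $latin1 = encode( "iso-8859-1", $line );
say length($line), " characters, ", length($latin1), " bytes";   # 4 characters, 4 bytes
```

No decode() call is needed on $line here because the layer already did the decoding on input.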

Your output of "Köln(5)" tells us that your editor and your terminal are in UTF-8 encoding and $temp is double-encoded mojibake (just much less spectacularly obvious than usual mojibake).
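A sketch of why the output reads "Köln(5)", assuming a UTF-8 editor and no use utf8. The literal is spelled out as the five bytes the editor writes; encode() sees five characters, all below U+0100, and maps each to the identical byte, so nothing changes:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Without "use utf8", 'Köln' from a UTF-8 editor is these 5 bytes, which Perl
# treats as the 5 characters K, U+00C3, U+00B6, l, n.
my $VUOrt0 = "K\xC3\xB6ln";

# All five characters are below U+0100, so ISO-8859-1 maps each one to the
# same byte value: encode() hands back the input unchanged.
my $temp = encode( "iso-8859-1", $VUOrt0 );

my $status = $temp eq $VUOrt0 ? "unchanged" : "changed";
print "$status (", length($temp), ")\n";   # unchanged (5)
```

Printed to a UTF-8 terminal, those unchanged bytes render as "Köln", which is why the mistake is so easy to miss.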

Just keep in mind that once you decide to care about encoding, all input must first be decoded somehow (including strings typed directly into the program), and then it must be encoded before output. If you run into odd encoding issues, ask where the data was decoded and where it was encoded (and then ask yourself whether it was decoded or encoded twice).
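A sketch of what "encoded twice" looks like at the byte level; the variable names and the hex_bytes helper are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "K\x{F6}ln";                  # decoded text, 4 characters

my $once  = encode( "UTF-8", $text );    # correct: 5 bytes (o-umlaut becomes C3 B6)
my $twice = encode( "UTF-8", $once );    # mistake: the bytes are encoded again

# Hypothetical helper: format a string's bytes as space-separated hex.
sub hex_bytes { join " ", map { sprintf "%02X", $_ } unpack "C*", shift }

print "once:  ", hex_bytes($once),  "\n";   # 4B C3 B6 6C 6E
print "twice: ", hex_bytes($twice), "\n";   # 4B C3 83 C2 B6 6C 6E

# A single decode only undoes the outer encode; the result is still the
# once-encoded bytes, not the original text.
my $undone = decode( "UTF-8", $twice );
print "one decode is not enough\n" if $undone eq $once;
```

The telltale sign is the extra C3 83 / C2 xx pairs: each original non-ASCII byte has been treated as a character and encoded a second time.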

Good Day,
    Dean

Re^2: possible missunderstanding of package Encode
by nikosv (Deacon) on Oct 20, 2015 at 11:35 UTC

    Note: I'm certain that your editor is set up for UTF-8 input from the output you received, but you should convince yourself of that too and then learn how to configure it

    I do confirm that:

    In Notepad++ on Windows, with the editor set to UTF-8:

        use v5.10;
        use Data::Dumper;
        use Devel::Peek;

        print unpack "C*",'Köln';
        Dump 'Köln';

    Output:

        75 195 182 108 110   # note that ö has been mapped as two bytes, extended ASCII values 195 and 182

        SV = PV(0x27c7b54) at 0x8eef84
          REFCNT = 1
          FLAGS = (PADTMP,POK,READONLY,pPOK)
          PV = 0x8f4724 "K\303\266ln"\0
          CUR = 5
          LEN = 8

    When the editor is set to ISO-8859-1:

        use v5.10;
        use Data::Dumper;
        use Devel::Peek;

        print unpack "C*",'Köln';
        Dump 'Köln';

    Output:

        75 246 108 110   # note that ö is represented with a single byte, extended ASCII decimal 246

        SV = PV(0x2813394) at 0x20cef84
          REFCNT = 1
          FLAGS = (PADTMP,POK,READONLY,pPOK)
          PV = 0x20d4724 "K\366ln"\0
          CUR = 4
          LEN = 8

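For comparison, a sketch that reproduces the byte values of the two Notepad++ runs without depending on the editor setting. The string is written with the escape \x{F6}, so the source file's encoding does not matter, and encode() produces each byte form explicitly:

```perl
use strict;
use warnings;
use v5.10;
use Encode qw(encode);

my $city = "K\x{F6}ln";    # "Köln" spelled with an escape, independent of the editor

my %bytes_for;
for my $enc ( "UTF-8", "iso-8859-1" ) {
    $bytes_for{$enc} = join " ", unpack "C*", encode( $enc, $city );
    say "$enc: $bytes_for{$enc}";
}
```

This should print 75 195 182 108 110 for UTF-8 and 75 246 108 110 for ISO-8859-1, matching the CUR = 5 and CUR = 4 dumps.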
Re^2: possible missunderstanding of package Encode
by toohoo (Beadle) on Oct 20, 2015 at 11:41 UTC

    Dear Dean,

    You are my hero of the day! This was just what I needed to handle the variable/input. I put this in my test script and the output opened my eyes:

        #!/usr/bin/perl
        use v5.10;
        use Encode;
        use Data::Dumper;

        my $temp = encode( "iso-8859-1", 'Köln' );
        say Dumper "========== encode string ==========";
        say $temp, "(", length($temp), ")";

        my $VUOrt0 = 'Köln';
        $temp = encode( "iso-8859-1", $VUOrt0 );
        say Dumper "========== encode scalar variable ==========";
        say $temp, "(", length($temp), ")";

        $temp = encode( "iso-8859-1", decode("UTF-8", $VUOrt0) );
        say Dumper "========== decode encode scalar variable ==========";
        say $temp, "(", length($temp), ")";

        if ( $temp =~ /ö/ ) {
            say "found 'ö'";
        } else {
            say "'ö' NOT found";
        }

        if ( $temp =~ /\xF6/ ) {
            say "found '\xF6'";
        } else {
            say "'\xF6' NOT found";
        }

        for ( my $i = 0; $i < length($temp); $i++ ) {
            say substr( $temp, $i, 1), "(", length(substr( $temp, $i, 1)), ")";
        }
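A reduced sketch of why the two regex tests in the script above behave differently, assuming the script is saved as UTF-8 without use utf8, so the pattern /ö/ is really the two-byte pattern \xC3\xB6 while $temp holds the single byte 0xF6:

```perl
use strict;
use warnings;
use v5.10;
use Encode qw(encode decode);

# The literal 'Köln' as it arrives without "use utf8" (5 UTF-8 bytes).
my $VUOrt0 = "K\xC3\xB6ln";
my $temp   = encode( "iso-8859-1", decode( "UTF-8", $VUOrt0 ) );   # "K\xF6ln"

# Without "use utf8", /ö/ in a UTF-8 source file is the two-byte pattern \xC3\xB6.
my $found_two_bytes = $temp =~ /\xC3\xB6/ ? 1 : 0;
my $found_f6        = $temp =~ /\xF6/     ? 1 : 0;

say "two-byte pattern: ", ( $found_two_bytes ? "found" : "NOT found" );   # NOT found
say "\\xF6 pattern:     ", ( $found_f6        ? "found" : "NOT found" );   # found
```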

    If you run the script, you will see what I mean. To answer your question: your assumption was right. I am working in a VirtualBox VM with Ubuntu 13.10. My editor is Geany, and its default encoding seems to be UTF-8; I checked this on several of the scripts I use. The shell is simply the terminal.

    Many thanks and have a nice day, Thomas