in reply to question about Encode::decode('iso-8859-1', ...)
But as hinted by massa above, any scalar with the utf8 flag turned on will cause the script to die with a run-time error:

    Wide character in subroutine entry at /.../Encode.pm line ...

because you cannot "decode()" a string into perl-internal utf8 if it is already flagged as being perl-internal utf8.
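One way to check this claim is to feed decode() two flagged strings: one whose characters all fit in Latin-1, and one containing a character above 0xFF. The sketch below assumes that decode() tries to downgrade a flagged input before decoding it, which is worth verifying on your own perl; if that holds, only the wide-character case should die. The helper name try_decode is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw/decode is_utf8/;

# Hypothetical helper: returns the decoded string, or undef if decode() died.
sub try_decode {
    my ($input) = @_;
    my $copy = $input;    # work on a copy so the caller's string is left untouched
    my $out  = eval { decode( 'iso-8859-1', $copy ) };
    return $out;
}

# Flagged, but every character is below 0x100.
my $latin1 = "\xe9";
utf8::upgrade($latin1);

# Flagged, and contains a character above 0xFF.
my $wide = "\x{263a}";

for my $case ( [ 'flagged Latin-1', $latin1 ], [ 'wide character', $wide ] ) {
    my ( $label, $str ) = @$case;
    printf "%-16s flag=%d decode %s\n",
        $label,
        ( is_utf8($str) ? 1 : 0 ),
        ( defined try_decode($str) ? 'succeeded' : 'died' );
}
```

If both lines report "died" on a given installation, the statement above holds there as written; if only the second does, the error is tied to wide characters rather than to the flag alone.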
There does seem to be some suggestion of discrepancy between the Encode man page and the behavior of "eq" and "ne"; the man page says:
...to convert ISO-8859-1 data to a string in Perl’s internal format:

    $string = decode("iso-8859-1", $octets);
CAVEAT: When you run "$string = decode("utf8", $octets)", then $string may not be equal to $octets. Though they both contain the same data, the utf8 flag for $string is on unless $octets entirely consists of ASCII data (or EBCDIC on EBCDIC machines).
(Update: thanks to almut for catching/explaining how I misread this point.)
But the following script (when run with perl 5.8.8 on darwin) shows that the flag setting seems to have no effect on "eq" for the characters in question (the "high table" portion of 8859-1) -- every output line says "(flag diff...) decoding ... makes no difference":

    #!/usr/bin/perl
    use Encode qw/encode decode is_utf8/;

    for my $scalar ( map { encode( 'iso-8859-1', chr( $_ )) } 0xa0 .. 0xff ) {
        printf( "decoding %s makes %s difference\n",
                $scalar, ( test( $scalar ) ? "no" : "some sort of" ));
    }

    sub test {
        my $x = shift;
        my $y = Encode::decode('iso-8859-1', $x);
        print "(flag diff...) " if ( is_utf8( $x ) ne is_utf8( $y ));
        if ($x eq $y) {
            return 1;
        }
        else {
            return 0;
        }
    }

So I wonder whether there are any perl versions or installations where the caveat actually applies to "eq" and "ne", or whether there is some other comparison operator on my version/installation that would catch the difference in the flag setting.
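For a single character, the two questions (same characters? same flag?) can be separated explicitly. This is only a minimal sketch: the helper name compare is invented here, and it simply pairs "eq" with a direct is_utf8() check rather than relying on any operator to notice the flag.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw/decode is_utf8/;

# Hypothetical helper: report character equality and flag equality separately.
sub compare {
    my ( $x, $y ) = @_;
    return {
        same_chars => ( $x eq $y ? 1 : 0 ),
        same_flag  => ( ( is_utf8($x) ? 1 : 0 ) == ( is_utf8($y) ? 1 : 0 ) ? 1 : 0 ),
    };
}

my $octets = "\xe9";                              # one Latin-1 byte, flag off
my $string = decode( 'iso-8859-1', $octets );     # same character, decoded

my $r = compare( $octets, $string );
printf "same characters: %d, same flag: %d\n", @{$r}{qw/same_chars same_flag/};
```

Since "eq" compares the two scalars character by character, the flag difference only shows up through the explicit is_utf8() test.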
Re^2: question about Encode::decode('iso-8859-1', ...)
by almut (Canon) on Mar 07, 2009 at 16:58 UTC
Re^2: question about Encode::decode('iso-8859-1', ...)
by ikegami (Patriarch) on Mar 07, 2009 at 22:45 UTC
by graff (Chancellor) on Mar 08, 2009 at 04:05 UTC
by ikegami (Patriarch) on Mar 08, 2009 at 04:53 UTC