in reply to Matching UTF8 Regexps
You should review the manual for Encode::Guess more carefully, and see if you are trying to get something out of it that it can't really provide. Anyway, I think this is how you should be assigning the decoded value to $normalized:

    my $encoding = guess_encoding( $response->content );
    my $normalized = decode( $encoding, $response->content );

Note that "decode" returns utf8 data -- you don't need to "re-encode" it as utf8. (updated to fix grammar)
<update> Actually, looking at the current man page for Encode::Guess on CPAN, it looks like you should be doing this:

    my $decoder = guess_encoding( $response->content );
    die $decoder unless ( ref $decoder );
    my $normalized = $decoder->decode( $response->content );

In other words, the guessing method is supposed to return an object that supplies the (hopefully) appropriate decoding method, and you just pass your data to that method. </update>
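For anyone who wants to try the decoder-object pattern without a live HTTP response, here is a minimal self-contained sketch. The helper name decode_guessed and the sample bytes are made up for illustration; only guess_encoding and the decode method on the returned object come from Encode::Guess.

```perl
use strict;
use warnings;
use Encode;
use Encode::Guess;

# Hypothetical helper (not part of Encode::Guess): returns decoded
# characters, or dies with the guesser's error message when no single
# encoding could be chosen. Extra suspects (encoding names) are optional.
sub decode_guessed {
    my ( $octets, @suspects ) = @_;
    my $decoder = guess_encoding( $octets, @suspects );
    die "guess_encoding failed: $decoder\n" unless ref($decoder);
    return $decoder->decode($octets);
}

# Stand-in for $response->content: "cafe" with an e-acute,
# where the accented letter is the two UTF-8 bytes C3 A9.
my $octets     = "caf\xC3\xA9";
my $normalized = decode_guessed($octets);

printf "%d octets in, %d characters out\n",
    length($octets), length($normalized);
```

By default the guesser only considers ascii, utf8, and BOM-marked UTF-16/32, so data in any other encoding needs suspects passed explicitly, and the result can still be ambiguous.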
If that doesn't help, I'm not sure what else to suggest. Maybe try breaking the process down into steps: store the $response->content to a file and inspect that manually; see what guess_encoding is returning for the chosen content -- maybe it's not guessing correctly; use the "FB_CROAK" flag as the third parameter in the "decode()" call (and wrap the call in an eval to catch it if it dies), to see if there are any errors when trying to convert the content to utf8, even when you know the "true" encoding of the source.
(What? You don't think there would be encoding errors in the original character data? Don't be so sure.)
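The debugging steps above could be sketched roughly like this. The sample bytes, the file name content.bin, and the helper try_decode are assumptions made up for the example; in the real script $content would be $response->content.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);
use Encode::Guess;

# Stand-in for $response->content; a lone "\xE9" byte is not valid UTF-8.
my $content = "r\xE9sum\xE9";

# Step 1: store the raw bytes so they can be inspected by hand
# (for example with a hex dumper). The file name is arbitrary.
open my $fh, '>:raw', 'content.bin' or die "cannot write content.bin: $!";
print {$fh} $content;
close $fh or die "cannot close content.bin: $!";

# Step 2: see what guess_encoding actually returns. On failure it
# returns a plain error string rather than an object.
my $guess = guess_encoding($content);
print ref($guess)
    ? "guessed: " . $guess->name . "\n"
    : "no usable guess: $guess\n";

# Step 3: decode with FB_CROAK so malformed input dies instead of being
# silently replaced, and catch the error with eval.
# Hypothetical helper; returns (text, error message).
sub try_decode {
    my ( $encoding, $octets ) = @_;    # $octets is a copy; decode may modify it
    my $text = eval { decode( $encoding, $octets, FB_CROAK ) };
    return ( $text, $@ );
}

for my $encoding (qw(utf8 iso-8859-1)) {
    my ( $text, $error ) = try_decode( $encoding, $content );
    print defined($text) ? "$encoding: ok\n" : "$encoding: failed: $error";
}
```

Running the loop over the encodings you consider plausible shows which ones the data actually survives, which is more informative than a single guess.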