in reply to WWW::Mechanize & encoding

WWW::Mechanize already decodes the character encoding (here: EUC-JP) implicitly. Jcode is from the Perl4 era; always use the Encode module instead. This works:
use WWW::Scripter qw();
use HTML::Entities qw(decode_entities);
my $w = WWW::Scripter->new;
$w->get('file:///tmp/nhg.euc-jp.html');
decode_entities $w->content; # returns a Perl string, use Encode::encode to prepare it for output.
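
To illustrate the decode-then-encode flow with Encode alone, without fetching a page, here is a minimal sketch. The byte string is a hand-written EUC-JP sample (the word 日本語), not content from the file above, and the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: the EUC-JP octets for U+65E5 U+672C U+8A9E.
my $octets = "\xC6\xFC\xCB\xDC\xB8\xEC";

# decode: octets in a known encoding -> Perl character string
my $string = decode('EUC-JP', $octets);

# encode: Perl character string -> octets, ready for output
my $utf8 = encode('UTF-8',  $string);
my $euc  = encode('EUC-JP', $string);

printf "%d characters, %d UTF-8 octets, %d EUC-JP octets\n",
    length $string, length $utf8, length $euc;
```

The point is that the character string in the middle has no encoding of its own; the encoding only matters at the input and output boundaries.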

Re^2: WWW::Mechanize & encoding
by Anonymous Monk on Jun 28, 2011 at 13:42 UTC
    Thanks, but I seem to have hit a snag w/ Encoder
    use Encode::Encoder qw(encoder) HTML::Entities qw(deocde_entities);
    my $w = WWW::Scripter->new;
    $w->get("file:///tmp/nhg.euc-jp.html")
    decode_entities $w->content; # this is okay I think
    my $euc = encoder( $w->content )->euc_jp; # this gives an error
    I assume my syntax for encoder() is wrong:
    "\x{00bd}" does not map to euc-jp at /usr/lib64/perl5/vendor_perl/5.12.2/x86_64-linux/Encode/Encoder.pm line 88.

    Shouldn't have started drinking so early..

      You're doing it wrong because you are not paying attention and you are writing sloppy code.

      I said in my code comment that decode_entities returns something. In your code, you discard the return value, but you need to assign it to a variable or pass it as a parameter to a function if you want to make use of it.

      The function is named decode_entities, not deocde_entities.

      Your code is lacking two ; statement separators.

      use WWW::Scripter qw();
      use HTML::Entities qw(decode_entities);
      use Encode qw(encode);
      my $w = WWW::Scripter->new;
      $w->get('file:///tmp/nhg.euc-jp.html');
      print encode('EUC-JP', decode_entities($w->content)); # output octets go to STDOUT, encoded as EUC-JP
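
One caveat about the "does not map to euc-jp" error itself: U+00BD has no EUC-JP representation, so any strict encoder will refuse it. Called without a CHECK argument, Encode's encode does not die; it silently substitutes a replacement character instead. The following is a minimal sketch of two explicit alternatives, assuming a made-up sample string rather than the page from the thread. Emitting a numeric character reference is only one possible fallback and is suitable only when the output is HTML.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK FB_HTMLCREF LEAVE_SRC);

# Hypothetical sample string containing U+00BD, which EUC-JP cannot represent.
my $string = "1\x{00bd}";

# FB_CROAK: die on the first unmappable character.
# LEAVE_SRC keeps encode() from modifying $string in place.
my $strict = eval { encode('EUC-JP', $string, FB_CROAK | LEAVE_SRC) };
my $error  = $@;
print "strict encoding failed: $error" unless defined $strict;

# FB_HTMLCREF: replace unmappable characters with numeric character references.
my $lenient = encode('EUC-JP', $string, FB_HTMLCREF | LEAVE_SRC);
print "$lenient\n";
```

Which behaviour is right depends on where the octets end up: croaking is safer for data files, while character references keep an HTML page readable.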