GaijinPunch has asked for the wisdom of the Perl Monks concerning the following question:

Ages ago I found out that Mechanize 1.49 (I think) and up forces utf8 when it decodes data. I rolled back and all was well. I'm trying to run that script again (on another machine) and am getting similar behavior, even with old versions. It seems that whatever the version, what Mechanize *sends* is utf8 (or at least something other than what I'm trying to send: euc-jp). I'm only calling get(), submit_form() and content() on my object. Even so, when I send something, it's mojibake on the other end. Could some dependency of Mechanize be gumming up the works?

I have an EUC hash, so a rather lame but possible solution would be to send the raw euc codes for the characters... I don't think this would work, though, if Mechanize is going to tell the server it's coming in as utf8.
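For reference, the "raw euc codes" idea can be sketched with the core Encode module instead of a hand-built hash. This is only an illustration of what the bytes look like in each encoding; the sample string is an assumption, and it says nothing about whether Mechanize re-encodes the value on submission.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Sample character string (the word "Japanese"), written with escapes so the
# encoding of this source file does not matter. Purely illustrative.
my $text = "\x{65E5}\x{672C}\x{8A9E}";

# Turn the character string into byte strings in each encoding.
my $euc_bytes  = encode('euc-jp', $text);
my $utf8_bytes = encode('UTF-8',  $text);

# Helper: show a byte string as space-separated hex pairs.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, shift }

printf "euc-jp: %s\n", hex_dump($euc_bytes);
printf "utf-8:  %s\n", hex_dump($utf8_bytes);
```

If the server shows mojibake, comparing the bytes it received against these two dumps is one way to tell which encoding was actually sent.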

Thoughts?

Re: WWW::Mechanize (charset mayhem)
by marto (Cardinal) on Dec 03, 2009 at 14:54 UTC

    Have you tried this with the current version of WWW::Mechanize? IIRC recently there has been work done regarding how it determines encoding.

    Martin

Re: WWW::Mechanize (charset mayhem)
by WizardOfUz (Friar) on Dec 03, 2009 at 17:22 UTC

    WWW::Mechanize uses HTML::Form internally. Take a look at the documentation for that module's parse() and accept_charset() methods.
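A minimal sketch of what those two methods look like when HTML::Form is used on its own. It assumes a reasonably recent HTML::Form release that accepts the charset option to parse(); the page, form and URL are made up for illustration, and this is not a claim about what any particular Mechanize version does internally.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical page; the form and URL are invented for this example.
my $html = <<'HTML';
<form action="http://example.com/search" method="post">
  <input type="text" name="q">
</form>
HTML

# charset names the encoding of the original document. It is used as the
# default submission encoding when the form has no accept-charset attribute.
my $form = HTML::Form->parse($html, base => 'http://example.com/', charset => 'euc-jp');

# Set the field from a character string, not from pre-encoded bytes.
$form->value(q => "\x{65E5}\x{672C}\x{8A9E}");

# Build the HTTP::Request without sending it, so the body can be inspected.
my $request = $form->make_request;

print "accept_charset: ", $form->accept_charset, "\n";
print "request body:   ", $request->content, "\n";
```

Inspecting the request body this way shows which bytes would go over the wire before anything is actually submitted.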

      Thanks for the replies. I crashed pretty quick last night.

      >> marto: I've tried all kinds of versions. There's nothing about charset or encoding in 1.60's documentation.

      >> wizard: Thanks. I've looked around a bit but haven't been able to come up with anything. Searching around, I found a rather lengthy post here: http://code.google.com/p/www-mechanize/issues/detail?id=61&can=1&q=charset

      I also installed LWP and whatnot specifically, which seems to have broken it further. I get this now when using Mechanize-1.34:
      Can't locate object method "decoded_content" via package "HTML::Form" at /usr/lib/perl5/site_perl/5.8.8/HTML/Form.pm line 145.

      1.60 doesn't give this error, but decodes everything to utf8 specifically. :(

        "There's nothing about charset or encoding in 1.60's documentation"

        Read the Changes file, back to the notes for version 1.56.

        Do you have a small working example I can play with?