.rhavin has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monks;-)
After spending some night(mare)s on encoding madness, I finally have a nearly working solution for the following task:
I set up the agent (package variable $XAgent) like this:
```perl
use LWP::UserAgent;
our $XAgent;    # package variable holding the shared agent

# -----------------------------------------------------
sub _setAgent {
    $XAgent = LWP::UserAgent->new(keep_alive => 1);
    $XAgent->default_header('Accept-Charset' => 'ISO-8859-1,utf-8');
    $XAgent->agent($ENV{'HTTP_USER_AGENT'});
    $XAgent->cookie_jar({});    # allow cookies
}
```
I fetch the data like this:
```perl
# -----------------------------------------------------
sub _getUrl {
    return $XAgent->get(shift)->decoded_content();
}
```
Then I have XML::Parser parse the data like this:
```perl
use XML::Parser;

# -----------------------------------------------------
my $xml = _getUrl($url);
my $p = XML::Parser->new(
    Style            => 'Stream',
    Pkg              => 'some_pkg',
    ProtocolEncoding => 'utf-8',
);
$p->parse($xml);
```
If I leave out the 'utf-8' hint for XML::Parser, some non-ASCII characters get garbled, so I thought: "Alright, decoded_content() returns Perl-friendly UTF-8, so I'll set that manually!" That works almost every time. Almost. So I'm not quite sure my assumption is right.
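One plausible explanation for the "almost", sketched below under an assumption not confirmed in this post: decoded_content() returns Perl *characters*, and Perl may store the same characters internally either one byte per character (Latin-1) or as UTF-8. If XML::Parser hands that internal buffer to expat as raw bytes, a ProtocolEncoding of 'utf-8' is only correct when the string happens to be stored as UTF-8. The helper `to_utf8_bytes` is hypothetical; `Encode::encode`, `utf8::upgrade`, `utf8::downgrade` and `utf8::is_utf8` are core Perl.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string (such as what
# decoded_content() returns) into UTF-8 encoded bytes, which is what a
# parser told ProtocolEncoding => 'utf-8' is expected to read.
sub to_utf8_bytes {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}

# "café" as four characters, forced into Perl's one-byte-per-character
# (Latin-1) internal storage.
my $narrow = "caf\x{E9}";
utf8::downgrade($narrow);

# The same four characters, forced into Perl's internal UTF-8 storage.
my $wide = $narrow;
utf8::upgrade($wide);

# At the Perl level both strings are equal...
printf "same string: %s\n", ($narrow eq $wide) ? 'yes' : 'no';

# ...but the internal buffers differ. Assumption: code that reads the raw
# buffer (as XML::Parser may) sees "caf\xE9" in one case and
# "caf\xC3\xA9" in the other.
printf "internal UTF-8 flag: narrow=%d wide=%d\n",
    utf8::is_utf8($narrow) ? 1 : 0,
    utf8::is_utf8($wide)   ? 1 : 0;

# Encoding explicitly yields the same UTF-8 bytes either way.
my ($b1, $b2) = map { to_utf8_bytes($_) } $narrow, $wide;
printf "bytes equal: %s, byte length: %d\n",
    ($b1 eq $b2) ? 'yes' : 'no', length $b1;
```

Under that assumption, encoding before parsing (for example `$p->parse(encode('UTF-8', $xml))`) would give the parser the same bytes no matter how Perl happened to store `$xml`.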
So my questions are:
Any further enlightenment and, of course, hints on how to do things better/faster are highly welcome.
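One possible alternative, not taken from this thread: ask decoded_content() to undo any Content-Encoding (gzip etc.) but leave the charset alone, using its documented `charset => 'none'` option, so XML::Parser receives bytes and can work from the document's own encoding declaration instead of a hard-coded ProtocolEncoding. The response below is built by hand (HTTP::Response ships with LWP) so the comparison runs offline; the XML snippet and header are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical response built locally so the example runs offline;
# in the original code this would come from $XAgent->get($url).
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/xml; charset=UTF-8' ],
    "<r>caf\xC3\xA9</r>",    # "café" as UTF-8 encoded bytes
);

# Default: Content-Encoding undone AND charset decoded -> Perl characters.
my $chars = $res->decoded_content;

# charset => 'none': Content-Encoding undone, charset left alone -> bytes,
# suitable for handing to a byte-oriented parser such as XML::Parser.
my $bytes = $res->decoded_content(charset => 'none');

printf "characters: %d, bytes: %d\n", length $chars, length $bytes;
```

With this variant, `_getUrl` would return `$bytes`-style content and the `ProtocolEncoding` hint could be dropped, leaving encoding detection to the parser.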
TIA, ~.rhavin;)
Re: LWP::Agent vs. XML::Parser - the zillionth encoding madness question
by ikegami (Patriarch) on Jan 28, 2010 at 18:33 UTC