I'd love to see your function, if you can share it...
Here or by mail if you prefer (realbot |at| gmail |dot| com)
Thanks!
| [reply] |
Sorry, I forgot to link to "problem with german chars in html post fetch", but I just went back and fixed that (above).
I actually posted code twice in that thread because my approach to the problem evolved, and I judged that simply updating the first post would make the original question harder to understand. So to get the full picture of the weirdness I was encountering, I'd read through the whole thread.
But if you just want the function snippet, here it is:
# Takes a variable and spits it back out with the proper german characters
sub germanchars_to_strange_html_chars {
    my $var = shift;
    my %table = ( 'ß' => 'ß', 'ä' => 'ä', 'ö' => 'ö',
                  'Ä' => 'ä', 'Ö' => 'ö', 'Ü' => 'ü',
                  'ü' => 'ü');
    while (my ($k,$v) = each %table) {
        $var =~ s/$k/$v/g;
    }
    return $var;
}
I don't think this is a specific solution to the problems you were having with Hungarian, just that my approach might be helpful.
You might also want to know how I determined the substitutions for the function.
I did the POST request manually with Firefox, and then did File -> Save As. The funky characters were then culled from the saved result. Truly a kludge, and I'm sure there's a better way to do it, but until I figure it out, that's what I'm left with.
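One possible way to build such a table without the save-and-copy step, as a rough sketch only. It assumes the funky characters are what you get when UTF-8 bytes are read as Windows-1252; the thread does not confirm that this is the direction of the mix-up. The helper name utf8_misread_as_cp1252 is made up for illustration, and the script itself is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                          # assumption: this file is saved as UTF-8
use Encode qw(encode decode);

# Hypothetical helper: returns what a character looks like when its
# UTF-8 bytes are (wrongly) interpreted as Windows-1252.
sub utf8_misread_as_cp1252 {
    my ($char) = @_;
    return decode('cp1252', encode('UTF-8', $char));
}

# Build the substitution table for the German characters in question.
my %table = map { $_ => utf8_misread_as_cp1252($_) } qw(ß ä ö ü Ä Ö Ü);

# Print the table as UTF-8 so it can be inspected in a terminal.
binmode STDOUT, ':encoding(UTF-8)';
for my $char (sort keys %table) {
    printf "%s => %s\n", $char, $table{$char};
}
```

If the mix-up on your system runs the other way (Windows-1252 bytes read as UTF-8), the two encoding names in the helper would need to be swapped, and some bytes may simply fail to decode.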
thomas.
| [reply] [d/l] |
To distill the problem further: the crux is that CGI::enurl('börse') results in 'b%F6rse' for me, but 'börse' for holli, another Perl monk working with German characters, who tried to help me.
As a result, CGI POST requests work for holli after CGI::enurl-ing, but they fail for me. Very un-dwimmy.
I suspect this has something to do with differences between the default encoding on my system, holli's system, and maybe realbot's system. I also suspect it has to do with how Perl handles encodings. But I am at a loss for how to isolate this.
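To make the encoding suspicion concrete, here is a minimal sketch that percent-encodes the same word from two different byte encodings. It uses only the core Encode module and a hand-rolled helper, percent_encode_bytes, which is a hypothetical stand-in and not the CGI::enurl implementation. The script is assumed to be saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                          # assumption: this file is saved as UTF-8
use Encode qw(encode);

# Hypothetical helper: percent-encode every byte outside the URL-safe set.
sub percent_encode_bytes {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

my $word = 'börse';                # a character string, thanks to "use utf8"

# Same characters, two different byte representations.
my $as_latin1 = percent_encode_bytes(encode('ISO-8859-1', $word));
my $as_utf8   = percent_encode_bytes(encode('UTF-8', $word));

print "Latin-1: $as_latin1\n";     # b%F6rse
print "UTF-8:   $as_utf8\n";       # b%C3%B6rse
```

The ö is one byte (F6) in Latin-1 and two bytes (C3 B6) in UTF-8, so the percent-encoded form depends entirely on which bytes the encoder was handed. A server expecting one form will not recognise the other.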
Hope this leads to a solution for both of us somehow...
thomas.
| [reply] |
Realbot, after hours of head-scratching I finally found a solution to my problem, which I recorded at
The problem was UTF-8 versus Windows ANSI.
Glad you found a solution. For those running into this kind of issue with automated crawling via LWP calls, it may help to run CGI::ENURL::enurl on the input data before getting/posting. Also make sure that the script is saved as UTF-8, or CGI::ENURL won't do its job right.
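A quick way to check the "saved as UTF-8" part, as a sketch under assumptions: the helper is_valid_utf8 is a made-up name, and it only tests whether a byte string decodes cleanly as strict UTF-8. Pure ASCII also passes, so it can confirm that a file is not Windows ANSI with accented characters, but not that the editor is set to UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;             # decode() with a CHECK value may modify its argument
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# "börse" as UTF-8 bytes versus Windows ANSI / Latin-1 bytes.
my $utf8_bytes = "b\xC3\xB6rse";
my $ansi_bytes = "b\xF6rse";

print "UTF-8 bytes valid? ", is_valid_utf8($utf8_bytes), "\n";
print "ANSI bytes valid?  ", is_valid_utf8($ansi_bytes), "\n";
```

To test an actual script, its contents could be read with a raw filehandle (open with '<:raw') and passed to the same helper.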
YMMV...
| [reply] |