in reply to Differences in UTF-8 html form

I'm currently troubleshooting a problem with HTML POST requests involving German characters; see "problem with german chars in html post fetch". This is bare LWP, not Mason, and German, not Hungarian, but I still thought the problem space was similar enough that it might shed light on your situation.

Basically, before sending my POST request I try encoding it several ways (CGI::Enurl::enurl, URI::Escape::uri_escape, and even a regex suggested in perlfaq9), but none of these work, and I wind up having to write my own function to get my POST request to work correctly.

I'm not really happy with my solution as it is very un-DWIM, but at least it's working.

Hope this helps!

Re: I'm troubleshooting post requests with german chars...
by Realbot (Scribe) on Jan 10, 2005 at 12:50 UTC
    I'd love to see your function, if you can share it...
    Here or by mail if you prefer (realbot |at| gmail |dot| com)

    Thanks!
      Sorry, I forgot to link to "problem with german chars in html post fetch", but I just went back and fixed that (above).

      I actually did two code postings in the thread because my approach to the problem evolved, and I judged it would make the original question harder to understand if I just updated... so to get the most knowledge about the weirdness I was encountering, I'd read through the whole thread.

      But if you just want the function snippet, it was:

      # Takes a variable and spits it back out with the proper german characters
      sub germanchars_to_strange_html_chars {
          my $var = shift;
          my %table = (
              'ß' => 'ß',
              'ä' => 'ä',
              'ö' => 'ö',
              'Ä' => 'ä',
              'Ö' => 'ö',
              'Ü' => 'ü',
              'ü' => 'ü',
          );
          while (my ($k, $v) = each %table) {
              $var =~ s/$k/$v/g;
          }
          return $var;
      }

      I don't think this is a solution to the particular problems you were having with Hungarian, just that my approach might be helpful.

      Also, you might want to know how I determined the substitutions for the function.

      I did the POST request manually with Firefox, and then did File -> Save As. The funky characters were then culled from the result. Truly a kludge, and I'm sure there's a better way to do it, but until I figure it out, that's what I'm left with.
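For anyone who would rather not cull the characters by hand, here is a rough sketch of how such a table could be generated. It assumes the "funky characters" are what you get when UTF-8 bytes are displayed as Windows-1252, which the later "utf-8 versus windows ansi" finding in this thread suggests but which the post above does not confirm. The helper name `utf8_seen_as_cp1252` is made up for illustration; only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: the text you would see if $chars were sent as
# UTF-8 bytes but displayed as if they were Windows-1252.
sub utf8_seen_as_cp1252 {
    my ($chars) = @_;
    return decode('cp1252', encode('UTF-8', $chars));
}

# Code points for: sharp s, a/o/u umlaut, A/O/U umlaut.
# Written as numbers so the result does not depend on how this file is saved.
my @german = map { chr($_) } (0xDF, 0xE4, 0xF6, 0xFC, 0xC4, 0xD6, 0xDC);
my %table  = map { ($_ => utf8_seen_as_cp1252($_)) } @german;

binmode STDOUT, ':encoding(UTF-8)';
for my $ch (@german) {
    my $codes = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $table{$ch};
    printf "U+%04X => %s (%s)\n", ord($ch), $table{$ch}, $codes;
}
```

Each German character comes out as a two-character sequence starting with U+00C3, the usual sign of UTF-8 being read as a single-byte encoding.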

      thomas.

      To further distill what seems to be the problem: the crux is that CGI::Enurl::enurl('börse') results in 'b%F6rse' for me, but 'börse' for holli, another PerlMonk working with German characters, who tried to help me.

      The result is that CGI POST requests work for holli after running them through enurl, but they fail for me. Very un-DWIMmy.

      I suspect this has something to do with differences between the default encoding on my system, holli's system, and maybe Realbot's system. I also suspect it has to do with how Perl handles encodings. But I am at a loss for how to isolate this.
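One way to isolate this kind of difference, sketched here under the assumption that the two systems end up with different bytes for the same word, is to take the source file's encoding out of the picture: write the character as a numeric escape, encode it to bytes explicitly, and percent-encode those bytes. The helper `percent_encode_bytes` is hypothetical and is not how CGI::Enurl is implemented; it only uses the core Encode module and a regex in the spirit of the perlfaq9 one.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode every byte that is not an
# unreserved URI character. Expects a byte string, not characters.
sub percent_encode_bytes {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# "\x{F6}" is o-umlaut; using an escape means the result does not
# depend on the encoding this script is saved in.
my $word = "b\x{F6}rse";

my $latin1 = percent_encode_bytes(encode('ISO-8859-1', $word));
my $utf8   = percent_encode_bytes(encode('UTF-8', $word));

print "ISO-8859-1: $latin1\n";   # b%F6rse
print "UTF-8:      $utf8\n";     # b%C3%B6rse
```

If the escaped string you actually send matches the first line, the bytes are going out in a single-byte encoding; if it matches the second, they are going out as UTF-8.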

      Hope this leads to a solution for both of us somehow...

      thomas.

      Realbot, after hours of head-scratching I finally found a solution to my problem, which I recorded in the node "The problem was utf-8 versus windows ansi".

      Glad you found a solution. For those running into this kind of issue with automated crawling via LWP calls, it may be helpful to run CGI::Enurl::enurl on input data before getting/posting. And make sure that the script is saved in UTF-8 format, or CGI::Enurl won't do its job right.
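If you would rather not depend on how the script file happens to be saved, one alternative is to encode to a named charset yourself before escaping. The sketch below is only an illustration of that idea, not CGI::Enurl's behaviour: `build_form_body` and the field name `query` are made up, and it assumes the target server expects UTF-8 form data.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: build an application/x-www-form-urlencoded body
# from character strings, encoding to bytes in an explicit charset first.
sub build_form_body {
    my ($fields, $charset) = @_;
    my @pairs;
    for my $name (sort keys %$fields) {
        my @encoded = map {
            my $bytes = encode($charset, $_);
            # Escape everything except unreserved characters and space.
            $bytes =~ s/([^A-Za-z0-9\-_.~ ])/sprintf('%%%02X', ord($1))/ge;
            $bytes =~ tr/ /+/;    # form encoding writes a space as '+'
            $bytes;
        } ($name, $fields->{$name});
        push @pairs, join '=', @encoded;
    }
    return join '&', @pairs;
}

# The umlaut is written as an escape, so the file encoding does not matter.
my $body = build_form_body({ query => "b\x{F6}rse frankfurt" }, 'UTF-8');
print "$body\n";
```

Passing 'ISO-8859-1' instead of 'UTF-8' as the second argument gives the other byte form, which makes it easy to try both against a server that is misbehaving.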

      YMMV...