tphyahoo has asked for the wisdom of the Perl Monks concerning the following question:

This is my second German-character question of the day; the first was here.

Anyway, I isolated another weird behavior when I do an HTTP POST request. Specifically, I'm trying to retrieve Google keyword suggestions for words with German characters in them. (People who search for x are also likely to search for y.)

The problem is, unless I run a special utility function I rolled myself

germanchars_to_strange_html_chars($query)
before fetching the HTML, Google returns no keyword suggestions. That is, the fetch itself succeeds, but no keywords come back, as though I had searched for suggestions on "bqwersjoseifjsslei asflse8asd", which of course retrieves nothing.

Here's the program, with the problem isolated as best I could. The html grabs are saved as "works.html" and "doesntwork.html" so you can see the results for yourself.

Oh -- WinXP and ActiveState Perl 5.8, though I don't think this could be a platform problem... or could it?

use strict;
use HTTP::Request::Common; # HTTP handling
use LWP::UserAgent;        # HTTP handling
use Crypt::SSLeay;         # needed by LWP for https URLs

my $query = 'börse';
my $www;

#doesn't work.
#sends $query = 'börse';
$www = google_keyword_suggestions_html_debug('de', 'de', $query);
open F, "> doesntwork.html" or die "Cannot open.";
print F $www->content,"\n";
close F;

#works -- keyword suggestions are retrieved
#sends $query = 'börse';
#keyword suggestions are retrieved, although the html is kind of warped looking.
$query = germanchars_to_strange_html_chars($query);
$www = google_keyword_suggestions_html_debug('de', 'de', $query);
open F, "> works.html" or die "Cannot open.";
print F $www->content,"\n";
close F;

# returns $www object containing html for a successful code, or an error code
sub google_keyword_suggestions_html_debug {
    my $language = shift;
    my $country  = shift;
    my $query    = shift; #this could be a list, but leaving it as a single word. maybe change later.
    my $action = POST 'https://adwords.google.com/select/KeywordSandbox',
        [
            'save'        => "save",
            'wizard_name' => "keywordsandbox_wizard",
            'language'    => $language,
            'country'     => $country,
            'keywords'    => $query,
        ];
    my $ua = LWP::UserAgent->new;
    $ua->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');
    $ua->timeout(30);
    my $www = $ua->request( $action );
    return $www;
}

# Takes a variable and spits it back out with the proper german characters
sub germanchars_to_strange_html_chars {
    my $var = shift;
    my %table = ( '' => 'ß', '' => 'ä', '' => 'ö', '' => 'ä', '' => 'ö', '' => 'ü', '' => 'ü');
    while (my ($k,$v) = each %table) {
        $var =~ s/$k/$v/g;
    }
    return $var;
}

I'm wondering if there is some setting, "mode", or locale that I could be in that would enable my code to run without the hand-rolled filter. If that's not the case, it seems to me that all POST requests with German characters in them are at risk in LWP. In which case I should submit my findings to CPAN or something, no?
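One possible reading of why the hand-rolled table helps: it may be replacing each single Latin-1 byte with the two-byte UTF-8 sequence for the same character, and the form may expect UTF-8. That is an assumption, not something confirmed in this thread. If it holds, the Encode module that ships with Perl 5.8 can do the same conversion for any character. A minimal sketch, assuming the script file is saved as Latin-1 (the names $latin1_bytes, $utf8_bytes and hex_bytes are made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

# Assumption: in a Latin-1 source file the literal 'börse' is the bytes
# 62 F6 72 73 65. It is written with \xF6 here so the sketch does not
# depend on how this file itself is saved.
my $latin1_bytes = "b\xF6rse";

# Decode the Latin-1 bytes into characters, then encode them as UTF-8.
my $utf8_bytes = encode('UTF-8', decode('ISO-8859-1', $latin1_bytes));

print 'latin-1: ', hex_bytes($latin1_bytes), "\n";   # 62 F6 72 73 65
print 'utf-8:   ', hex_bytes($utf8_bytes),   "\n";   # 62 C3 B6 72 73 65
```

If the assumption is right, passing $utf8_bytes as the 'keywords' value in the POST call should behave like the output of germanchars_to_strange_html_chars, without needing a table entry per character.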

Wise monks, I hope once again you have light to shed into my gloomy cubicle!

thomas.

UPDATE: edited the above code to use Crypt::SSLeay, per holli's recommendation below.

Replies are listed 'Best First'.
Re: problem with german chars in html post fetch
by holli (Abbot) on Jan 07, 2005 at 21:36 UTC
    the answer is basically the same as in your other question. save the script file in "dos-mode".
      Holli, that worked for the dos batch file thing you advised me on, but this seems to be another problem.

      I did the convert ansi->oem for my script file before running it, and ran it again, but this time neither of the save attempts retrieved any suggestions from the google tool.

      You can see for yourself, assuming the perlmonks code grepper works for these weird characters.

      # adwordsDebugGermanOem.pl
      use strict;
      use HTTP::Request::Common; # HTTP handling
      use LWP::UserAgent;        # HTTP handling

      my $query = 'brse';
      my $www;

      #doesn't work.
      #sends $query = 'brse';
      $www = google_keyword_suggestions_html_debug('de', 'de', $query);
      open F, "> doesntwork.html" or die "Cannot open.";
      print F $www->content,"\n";
      close F;

      #works -- keyword suggestions are retrieved
      #sends $query = 'brse';
      #keyword suggestions are retrieved, although the html is kind of warped looking.
      $query = germanchars_to_strange_html_chars($query);
      $www = google_keyword_suggestions_html_debug('de', 'de', $query);
      open F, "> works.html" or die "Cannot open.";
      print F $www->content,"\n";
      close F;

      # returns $www object containing html for a successful code, or an error code
      sub google_keyword_suggestions_html_debug {
          my $language = shift;
          my $country  = shift;
          my $query    = shift; #this could be a list, but leaving it as a single word. maybe change later.
          my $action = POST 'https://adwords.google.com/select/KeywordSandbox',
              [
                  'save'        => "save",
                  'wizard_name' => "keywordsandbox_wizard",
                  'language'    => $language,
                  'country'     => $country,
                  'keywords'    => $query,
              ];
          my $ua = LWP::UserAgent->new;
          $ua->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');
          $ua->timeout(30);
          my $www = $ua->request( $action );
          return $www;
      }

      # Takes a variable and spits it back out with the proper german characters
      sub germanchars_to_strange_html_chars {
          my $var = shift;
          my %table = ( '' => 'Y', '' => '', '' => '', '' => '', '' => '', '' => 'Ǭ', '' => 'Ǭ');
          while (my ($k,$v) = each %table) {
              $var =~ s/$k/$v/g;
          }
          return $var;
      }
      Thanks anyway though!

      Any other ideas?

        ah, well, i see.

        you will have to install the module Crypt::SSLeay. try

        #top of script
        use Data::Dumper;
        ....
        my $www = $ua->request( $action );
        print Dumper ($www);
        and you will be enlightened.

        But that has nothing to do with char encoding.

        sorry for not reading your post properly, though.

Re: problem with german chars in html post fetch
by tphyahoo (Vicar) on Jan 26, 2005 at 09:20 UTC
Re: problem with german chars in html post fetch
by tphyahoo (Vicar) on Mar 07, 2005 at 09:43 UTC
Tried with CGI::enurl, Escape::uri_escape, and a regex from perlfaq9, but still relying on the function I kludged together...
by tphyahoo (Vicar) on Jan 10, 2005 at 12:15 UTC
    I revised my script to output the same two HTML files as before, plus one attempt at encoding with CGI::Enurl, another with URI::Escape's uri_escape, and a third with a regex suggested in perlfaq9. None of them worked, unfortunately.

    The script seems to work for holli, but not for me.
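A possible explanation for why the escaping attempts made no difference: percent-escaping only changes how bytes are written on the wire, not which bytes they are. The sketch below uses a hand-written escaping regex in the spirit of the perlfaq9 one (percent_encode is a made-up name, and the Latin-1 source file is an assumption) to compare the two encodings of the same word:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: percent-encode every byte outside the
# unreserved set A-Z a-z 0-9 - _ . ~
sub percent_encode {
    my ($bytes) = @_;
    $bytes =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $bytes;
}

# 'börse' as Latin-1 bytes (assumed source encoding) and as UTF-8 bytes.
my $latin1 = "b\xF6rse";
my $utf8   = encode('UTF-8', decode('ISO-8859-1', $latin1));

print percent_encode($latin1), "\n";   # b%F6rse
print percent_encode($utf8),   "\n";   # b%C3%B6rse
```

Escaping the Latin-1 string still sends the single byte F6, just spelled as %F6. If the server expects UTF-8, only the second form would match. Note also that POST from HTTP::Request::Common already escapes form values given as an array reference, so escaping the query by hand beforehand would turn each % into %25.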