
Re: I don't think it's crypt::ssleay either.
by holli (Abbot) on Jan 07, 2005 at 22:33 UTC
    Well, if you dump $www you can read:
        LWP will support https URLs if the Crypt::SSLeay module is installed. More information at <http://www.linpro.no/lwp/libwww-perl/README.SSL>.
    So the first thing that comes to mind is that Crypt::SSLeay is not installed.
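    A minimal sketch for checking from Perl itself whether a module such as Crypt::SSLeay can be loaded. The helper name module_installed() is hypothetical; it turns the module name into a file path and reports whether require succeeds.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, 0 otherwise.
sub module_installed {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Crypt::SSLeay -> Crypt/SSLeay.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Crypt::SSLeay LWP::UserAgent)) {
    printf "%-15s %s\n", $module,
        module_installed($module) ? 'installed' : 'NOT installed';
}
```

    From the shell, perl -MCrypt::SSLeay -e 1 does the same check: it exits silently if the module loads and fails with a "Can't locate Crypt/SSLeay.pm in @INC" error otherwise.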

    Fact 1: if I run your script, it does not print anything.

    Fact 2: after installing Crypt::SSLeay (I just did that), your script produces output: a long HTML document, which I won't post because it is huge. I strongly suspect you don't have that module installed (it needs additional libraries).

    Update: In order to get the output you want, you have to URL-encode the query string using the enurl() function of CGI::Enurl.
    See this chart:

    Crypt::SSLeay installed : yes     yes              no
    "doesn't work" query    : börse   enurl("börse")   enurl("börse")
    "does work" query       : börse   börse            börse
    -----------------------------------------------------------------
    output identical        : no      yes              no output
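    To see what URL-encoding does to the ö in börse, here is a small core-Perl sketch. percent_encode() is a hypothetical helper (not CGI::Enurl's enurl()) that escapes every byte outside the RFC 3986 unreserved set (A-Z a-z 0-9 - . _ ~). It shows that the escaped form depends on which byte encoding the string is in before escaping.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode every byte outside A-Z a-z 0-9 - . _ ~
# Expects a byte string (characters 0-255 only).
sub percent_encode {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/eg;
    return $out;
}

# "börse", written with an escape so the script's own source encoding does not matter
my $word = "b\x{f6}rse";

print percent_encode(encode('iso-8859-1', $word)), "\n";   # b%F6rse    (Latin-1: ö is one byte)
print percent_encode(encode('UTF-8', $word)), "\n";        # b%C3%B6rse (UTF-8: ö is two bytes)
```

    Which of the two forms a particular server expects is not something this snippet can tell you; it only shows why the same word can travel as b%F6rse or as b%C3%B6rse.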
      Holli, thanks for your feedback. I *did* have Crypt::SSLeay installed on my system, although I didn't include "use Crypt::SSLeay;" in my script. This must be why I was getting output and you weren't with the identical script. At any rate, I edited my original script to use Crypt::SSLeay as you suggested.

      However, I am still not able to post my request successfully using "DWIM" Perl, and am forced to use the function I rolled myself. I revised my script (below) to output the same two HTML files as before, plus an attempt encoding with CGI::Enurl, another with uri_escape() from URI::Escape, and one with a regex suggested in perlfaq9 (see query_regexPerlfaq9() in the script). Unfortunately, none of them worked.

      • doesntworkNothing.html (sends börse via post, outputs 25 kb html, no suggestions found)
      • doesntWorkEnurl.html (sends b%F6rse via post, 25 kb, no suggestions found)
      • doesItWorkUriEscape.html (sends b%F6rse via post, same as before, 25 kb, no suggestions found)
      • suggestionsPerlfaq9Regex.html (sends b%f6rse, same as before except lowercased, 25 kb, no suggestions found)
      • worksRolledMyOwn.html (sends börse via post, outputs 45 kb html, google suggestions are found, accounting for the extra 20 kb of html)
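      One thing worth ruling out, offered as an assumption rather than a diagnosis: HTTP::Request::Common's POST builds an application/x-www-form-urlencoded body and URL-encodes the form values itself. A value that was already run through enurl() or uri_escape() is therefore encoded a second time, and its % turns into %25. A core-Perl sketch of that effect, using a hypothetical percent_encode() helper as a stand-in for the form encoder:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a form encoder: escape every byte outside A-Z a-z 0-9 - . _ ~
sub percent_encode {
    my ($bytes) = @_;
    (my $out = $bytes) =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/eg;
    return $out;
}

my $once  = percent_encode("b\xf6rse");   # roughly what enurl()/uri_escape() hand to POST
my $twice = percent_encode($once);        # what a second round of form encoding produces

print "$once\n";    # b%F6rse
print "$twice\n";   # b%25F6rse
```

      If that is what happens here, the server would receive the literal text b%F6rse rather than börse, which would be consistent with the "no suggestions found" results for the enurl and uri_escape attempts.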

      According to the documentation, CGI::Enurl should do what's needed here, but as my script below demonstrates, it fails where my hand-rolled function succeeds.

      Any ideas?

      use strict;
      use warnings;
      use HTTP::Request::Common;   # HTTP handling
      use LWP::UserAgent;          # HTTP handling
      use Crypt::SSLeay;           # https support for LWP
      use CGI::Enurl;
      use URI::Escape;
      use Encode;
      use Data::Dumper;

      my $query = 'börse';
      my $www;
      my $fh;

      # Returns a list of the canonical names of the available encodings that are loaded.
      # http://cpan.uwinnipeg.ca/htdocs/Encode/Encode.html
      # On my system, this outputs:
      # $VAR1 = [
      #           'ascii',
      #           'ascii-ctrl',
      #           'iso-8859-1',
      #           'null',
      #           'utf8'
      #         ];
      my @list = Encode->encodings();
      open $fh, '>', 'encodingsOutput.txt' or die "Cannot open encodings output: $!";
      print $fh Dumper(\@list);
      close $fh;

      # Doesn't work.
      # Sends $query = 'börse';
      $www = google_keyword_suggestions_html_debug('de', 'de', $query);
      open $fh, '>', 'suggestionsOriginal.html' or die "Cannot open: $!";
      print $fh '$query: ' . "$query\n";
      print $fh $www->content, "\n";
      close $fh;

      # Doesn't work either.
      # Sends b%F6rse
      my $query_enurl = enurl($query);
      $www = google_keyword_suggestions_html_debug('de', 'de', $query_enurl);
      open $fh, '>', 'suggestionsEnurl.html' or die "Cannot open: $!";
      print $fh '$query_enurl: ' . "$query_enurl\n";
      print $fh $www->content, "\n";
      close $fh;

      # Encoding with uri_escape doesn't work either.
      # Sends b%F6rse (same as enurl)
      my $query_uri_escape = uri_escape($query);
      $www = google_keyword_suggestions_html_debug('de', 'de', $query_uri_escape);
      open $fh, '>', 'suggestionsUriEscape.html' or die "Cannot open: $!";
      print $fh '$query_uri_escape: ' . "$query_uri_escape\n";
      print $fh $www->content, "\n";
      close $fh;

      # Encodes with the regex suggested in perlfaq9.
      # Sends b%f6rse (same as enurl, except lower case)
      my $query_regexPerlfaq9 = query_regexPerlfaq9($query);
      $www = google_keyword_suggestions_html_debug('de', 'de', $query_regexPerlfaq9);
      open $fh, '>', 'suggestionsPerlfaq9Regex.html' or die "Cannot open: $!";
      print $fh '$query_regexPerlfaq9: ' . "$query_regexPerlfaq9\n";
      print $fh $www->content, "\n";
      close $fh;

      # Works -- keyword suggestions are retrieved, although the HTML is kind of warped looking.
      # Sends $query = 'börse';
      my $query_works = germanchars_to_strange_html_chars($query);
      $www = google_keyword_suggestions_html_debug('de', 'de', $query_works);
      open $fh, '>', 'suggestionsRolledMyOwn.html' or die "Cannot open: $!";
      print $fh '$query_works: ' . "$query_works\n";
      print $fh $www->content, "\n";
      close $fh;

      # Returns the HTTP::Response object: the HTML page on success, or an error response.
      sub google_keyword_suggestions_html_debug {
          my ($language, $country, $query) = @_;
          # $query could be a list, but leaving it as a single word. Maybe change later.
          my $action = POST 'https://adwords.google.com/select/KeywordSandbox',
              [
                  'save'        => 'save',
                  'wizard_name' => 'keywordsandbox_wizard',
                  'language'    => $language,
                  'country'     => $country,
                  'keywords'    => $query,
              ];
          my $ua = LWP::UserAgent->new;
          $ua->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');
          $ua->timeout(30);
          return $ua->request($action);
      }

      # Takes a string and returns it with the German characters replaced.
      sub germanchars_to_strange_html_chars {
          my $var = shift;
          my %table = ( 'ß' => 'ß', 'ä' => 'ä', 'ö' => 'ö', 'Ä' => 'ä', 'Ö' => 'ö', 'Ü' => 'ü', 'ü' => 'ü' );
          while (my ($k, $v) = each %table) {
              $var =~ s/\Q$k\E/$v/g;
          }
          return $var;
      }

      # Based on the suggestion in perlfaq9.
      sub query_regexPerlfaq9 {
          my $var = shift;
          $var =~ s/([^\w()'*~!.-])/sprintf '%%%02x', ord $1/eg;   # encode
          return $var;
      }
      Update: added a test for posting with uri_escape (unfortunately it doesn't work either).
      Update 2: added a test for posting with the URI-encoding regex suggested in perlfaq9 (still doesn't work).
      Update 3: added code that dumps the encodings supported on my system into a text file. Currently this outputs:
      $VAR1 = [ 'ascii', 'ascii-ctrl', 'iso-8859-1', 'null', 'utf8' ];
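      On the Update 3 output: according to the Encode documentation, Encode->encodings() with no argument lists only the encodings that are already loaded, while Encode->encodings(":all") also includes the ones Encode can load on demand. A minimal sketch comparing the two (cp1252 is checked only as an example of a commonly available one):

```perl
use strict;
use warnings;
use Encode;

my @loaded = Encode->encodings();        # only encodings loaded so far
my @all    = Encode->encodings(':all');  # also those that have not been loaded yet

printf "loaded: %d, available: %d\n", scalar @loaded, scalar @all;
print "cp1252 is available\n" if grep { $_ eq 'cp1252' } @all;
```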
        As you can see in my chart above, sending a query using germanchars_to_strange_html_chars() and enurl() gives the same results. If they differ for you, I don't know why.