Holli, thanks for your feedback. I *did* have Crypt::SSLeay installed on my system, although I didn't include use Crypt::SSLeay in my script. This must be why I was getting output and you weren't with the identical script. At any rate, I edited my original script to use Crypt::SSLeay as you suggested.
However, I am still not able to post my request successfully using "DWIM" Perl, and am forced to use the function I rolled myself. I revised my script (below) to output the same two HTML files as before, plus an attempt encoding with CGI::Enurl, another with URI::Escape's uri_escape, and one with the regex suggested in perlfaq9 (s/([^\w()'*~!.-])/sprintf '%%%02x', ord $1/eg). Unfortunately, none of them worked.
- doesntworkNothing.html (sends börse via POST; returns 25 KB of HTML; no suggestions found)
- doesntWorkEnurl.html (sends b%F6rse via POST; 25 KB; no suggestions found)
- doesItWorkUriEscape.html (sends b%F6rse via POST, same as enurl; 25 KB; no suggestions found)
- suggestionsPerlfaq9Regex.html (sends b%f6rse, same as above except lowercase; 25 KB; no suggestions found)
- worksRolledMyOwn.html (sends börse via POST; returns 45 KB of HTML; Google suggestions are found, accounting for the extra 20 KB)
According to the documentation, CGI::Enurl should do what's needed here, but as the script below demonstrates, it fails where my hand-rolled function succeeds.
Any ideas?
use strict;
use HTTP::Request::Common; # HTTP handling
use LWP::UserAgent; # HTTP handling
use Crypt::SSLeay; # HTTPS support for LWP
use CGI::Enurl;
use URI::Escape;
use Encode;
use Data::Dumper;
my $query = 'börse';
my $www;
# Returns a list of the canonical names of the available encodings that are loaded.
# http://cpan.uwinnipeg.ca/htdocs/Encode/Encode.html
# on my system, this outputs:
#$VAR1 = [
# 'ascii',
# 'ascii-ctrl',
# 'iso-8859-1',
# 'null',
# 'utf8'
# ];
my @list = Encode->encodings();
open F, "> encodingsOutput.txt" or die "Cannot open encodings output.";
print F Dumper(\@list);
close F;
#doesn't work.
#sends $query = 'börse';
$www = google_keyword_suggestions_html_debug('de', 'de', $query);
open F, "> suggestionsOriginal.html" or die "Cannot open.";
print F '$query: ' . "$query\n";
print F $www->content,"\n";
close F;
#doesn't work either.
#sends queryDoesntWorkEnurl: b%F6rse
my $query_enurl = enurl($query);
$www = google_keyword_suggestions_html_debug('de', 'de', $query_enurl);
open F, "> suggestionsEnurl.html" or die "Cannot open.";
print F '$query_enurl:' . "$query_enurl:\n";
print F $www->content,"\n";
close F;
#encoding with uri_escape doesn't work either
#sends b%F6rse (same as enurl)
my $query_uri_escape = uri_escape($query);
$www = google_keyword_suggestions_html_debug('de', 'de', $query_uri_escape);
open F, "> suggestionsUriEscape.html" or die "Cannot open.";
print F '$query_uri_escape: ' . "$query_uri_escape\n";
print F $www->content,"\n";
close F;
#encodes with regex suggested in perlfaq9
#sends b%f6rse (same as enurl, except lowercase)
my $query_regexPerlfaq9 = query_regexPerlfaq9($query);
$www = google_keyword_suggestions_html_debug('de', 'de', $query_regexPerlfaq9);
open F, "> suggestionsPerlfaq9Regex.html" or die "Cannot open.";
print F '$query_regexPerlfaq9: ' . "$query_regexPerlfaq9\n";
print F $www->content,"\n";
close F;
#works -- keyword suggestions are retrieved
#sends $query = 'börse';
#keyword suggestions are retrieved, although the html is kind of warped looking.
my $query_works = germanchars_to_strange_html_chars($query);
$www = google_keyword_suggestions_html_debug('de', 'de', $query_works);
open F, "> suggestionsRolledMyOwn.html" or die "Cannot open.";
print F '$query_works: ' . "$query_works\n";
print F $www->content,"\n";
close F;
# returns $www object containing html for a successful code, or an error code
sub google_keyword_suggestions_html_debug {
my $language = shift;
my $country = shift;
my $query = shift; #this could be a list, but leaving it as a single word. maybe change later.
my $action = POST
'https://adwords.google.com/select/KeywordSandbox',
[
'save' => "save",
'wizard_name' => "keywordsandbox_wizard",
'language' => $language,
'country' => $country,
'keywords' => $query,
];
my $ua = LWP::UserAgent->new;
$ua->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');
$ua->timeout(30);
my $www = $ua->request( $action );
return $www;
}
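# Takes a variable and replaces each German umlaut/ß (a Latin-1/ANSI byte)
# with its two-byte UTF-8 sequence, i.e. the "strange" characters an ANSI editor shows
sub germanchars_to_strange_html_chars {
my $var = shift;
my %table = ( "\xDF" => "\xC3\x9F", "\xE4" => "\xC3\xA4", "\xF6" => "\xC3\xB6",
"\xC4" => "\xC3\x84", "\xD6" => "\xC3\x96", "\xDC" => "\xC3\x9C",
"\xFC" => "\xC3\xBC");
# no replacement byte is itself a key, so the order of substitution does not matter
while (my ($k,$v) = each %table) {
$var =~ s/$k/$v/g;
}
return $var;
}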
#based on suggestion in perl faq9
sub query_regexPerlfaq9 {
my $var = shift;
$var =~ s/([^\w()'*~!.-])/sprintf '%%%02x', ord $1/eg; # encode
return $var
}
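One thing worth checking, as an assumption rather than something verified against the AdWords sandbox: HTTP::Request::Common's POST already URL-encodes every form value. A value that was pre-escaped with enurl() or uri_escape() therefore gets escaped a second time (the % becomes %25), so the wire never carries the b%F6rse that the script prints. The minimal sketch below builds the same kind of request without sending it and prints the encoded body for three inputs. The helper name form_body is hypothetical, and the cp1252 decode assumes the script was saved as Windows ANSI.

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use HTTP::Request::Common qw(POST);

# Hypothetical helper: build (but do not send) a form POST like the one in
# google_keyword_suggestions_html_debug() and return the encoded body.
sub form_body {
    my ($keywords) = @_;
    my $req = POST 'https://adwords.google.com/select/KeywordSandbox',
        [ language => 'de', country => 'de', keywords => $keywords ];
    return $req->content;
}

my $latin1  = "b\xF6rse";                                # bytes of an ANSI-saved 'börse'
my $escaped = 'b%F6rse';                                 # what enurl()/uri_escape() return
my $utf8    = encode('UTF-8', decode('cp1252', $latin1)); # "b\xC3\xB6rse"

print form_body($latin1),  "\n";  # language=de&country=de&keywords=b%F6rse
print form_body($escaped), "\n";  # language=de&country=de&keywords=b%25F6rse (escaped twice)
print form_body($utf8),    "\n";  # language=de&country=de&keywords=b%C3%B6rse
```

The third body is what a UTF-8-saved script would send, which fits the observation in this thread that only UTF-8 input produced suggestions.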
Update: added test for posting with uri_escape (unfortunately doesn't work either)
Update 2: added test for posting with uri encoding regex suggested in perlfaq9 (still doesn't work)
Update 3: added code that dumps the encodings supported on my system into a text file. Currently this outputs:
$VAR1 = [
'ascii',
'ascii-ctrl',
'iso-8859-1',
'null',
'utf8'
];
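As the script comment says, Encode->encodings() with no argument lists only the encodings that are already loaded, so this short list does not mean the Windows code pages are missing. The Encode documentation describes an ':all' argument that also includes encodings Perl loads on demand. A small sketch that checks for the code pages relevant here (the choice of names to check is mine):

```perl
use strict;
use warnings;
use Encode;

# ':all' also lists encodings that are available but not loaded yet
my @all = Encode->encodings(':all');

# Check for the ANSI, OEM and Latin-1 code pages discussed in this thread
for my $name (qw(cp1252 cp850 iso-8859-1)) {
    my $found = grep { $_ eq $name } @all;
    printf "%-10s %s\n", $name, $found ? 'available' : 'missing';
}
```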
As you can see in my chart above, sending a query using germanchars_to_strange_html_chars() and enurl() gives the same results. If they differ for you, I don't know why.
Okay, I think I've figured this out. The reason Holli and I are getting different results is probably that his script file is encoded in UTF-8, whereas mine is encoded in Windows ANSI.
When I converted my script to UTF-8 with EditPad before running it, it worked. Originally Holli had suggested that I convert to "DOS mode", which I interpreted as running Convert ANSI->OEM in EditPad (since that's what the EditPad help file calls DOS mode). However, if I had run Convert ANSI->UTF-8, I would have succeeded and saved myself many hours of head-scratching. On the other hand, at least I'm beginning to understand how to troubleshoot encoding issues, and I hope that by sharing my experience I can help others.
During the head-scratching phase, I painstakingly put together the following chart comparing how ö is stored in Windows ANSI, UTF-8, and DOS (OEM) mode.
| symbol | encoding       | EditPad hex mode display | EditPad normal mode display |
| ö      | Windows ANSI   | f6                       | ö                           |
| ö      | UTF-8          | c3b6                     | ö                           |
| ö      | DOS mode (OEM) | 94                       | ”                           |
EditPad users (a limited-time demo version is available for download) may appreciate the following details. Windows ANSI is EditPad's default mode. The UTF-8 row was produced by running Convert -> Unicode -> ANSI to UTF-8, and the DOS mode row by running Convert -> ANSI to OEM. The hex values for all rows were read off after switching EditPad to hex mode with Ctrl-H.
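The hex column of the chart can also be reproduced from Perl with the core Encode module, without switching EditPad between modes. A minimal sketch, assuming cp1252 for "Windows ANSI" and cp850 (the usual German DOS code page) for OEM:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $o_umlaut = "\x{F6}";    # the character ö, written as a code point

# Encode the same character three ways and print the resulting bytes in hex
for my $enc ('cp1252', 'UTF-8', 'cp850') {
    my $bytes = encode($enc, $o_umlaut);
    printf "%-7s %s\n", $enc, unpack('H*', $bytes);
}
# cp1252  f6
# UTF-8   c3b6
# cp850   94
```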
I conclude that CGI::Enurl does not produce the right POST characters when fed German characters encoded with the Windows default (ANSI). Put more simply, CGI::Enurl is Windows-unfriendly. I wonder whether there is a way to contribute to CGI::Enurl and URI::Escape (which behaves the same way) to make them more Windows-friendly, but I will leave that for another day.
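If the goal is to send UTF-8 no matter how the script file was saved, the core Encode module can do the same conversion that saving the file as UTF-8 in EditPad achieved, for any character rather than only the ones in a hand-made table. A sketch, assuming the literals in the script are Windows ANSI (cp1252) bytes; the sub name to_utf8_bytes is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: read cp1252 bytes (from an ANSI-saved script) as
# characters, then re-encode them as UTF-8 bytes for the POST body.
sub to_utf8_bytes {
    my ($ansi_bytes) = @_;
    return encode('UTF-8', decode('cp1252', $ansi_bytes));
}

my $query    = "b\xF6rse";              # 'börse' as saved by an ANSI editor
my $sendable = to_utf8_bytes($query);
printf "%vX\n", $sendable;              # 62.C3.B6.72.73.65
```

The result would be passed to google_keyword_suggestions_html_debug() unescaped, since POST does the percent-encoding itself. With a UTF-8-saved file and use utf8;, the decode step would go away and only encode('UTF-8', ...) would remain.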
thomas.
Thanks again, Holli. The facts seem to be that enurl('börse') gives b%F6rse on my system, whereas it gives börse on yours.
I am guessing we have different Perl versions, different module versions, or different default encodings.
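Before comparing versions, it may be quicker to check which bytes the literal 'börse' actually holds on each machine: without use utf8, a Perl string literal contains exactly the bytes the editor saved, so an ANSI file and a UTF-8 file give enurl() different input. A sketch; the sub name hex_bytes is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: show each byte (character) of a string in hex
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $s;
}

# Without 'use utf8', this literal holds whatever bytes the editor saved
my $query = 'börse';
print hex_bytes($query), "\n";
# ANSI (cp1252) file: 62 F6 72 73 65
# UTF-8 file:         62 C3 B6 72 73 65
```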
perl -MCGI::Enurl -e"print $CGI::Enurl::VERSION" > enUrlVersion.txt
outputs 1.07 for my enurl version.
perl -v > perlVoutput.txt
outputs
This is perl, v5.8.4 built for MSWin32-x86-multi-thread
(with 3 registered patches, see perl -V for more detail)
Copyright 1987-2004, Larry Wall
Binary build 810 provided by ActiveState Corp. http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Jun 1 2004 11:52:21
I am not sure how to get the default encoding. Truly stumped...
thomas.
*******
UPDATE: Relevant (though so far not helpful) documentation seems to be at: