Okay, I think I've figured this out. The upshot is that Holli and I are probably getting different results because his script file is encoded in UTF-8, whereas mine is encoded in Windows ANSI.
When I converted my script to UTF-8 with EditPad before running it, it worked. (Holli had originally suggested that I convert to "DOS mode", which I interpreted as running Convert ANSI->OEM in EditPad, since that is what the EditPad help file calls DOS mode.) Had I run Convert ANSI->UTF-8 instead, I would have succeeded and saved myself many hours of head-scratching. OTOH, at least I'm beginning to understand how to troubleshoot encoding issues, and I hope that by sharing my experience I can help others.
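For anyone who would rather do the ANSI -> UTF-8 conversion from Perl than from EditPad, here is a minimal sketch. It assumes that "Windows ANSI" means code page cp1252 (the usual Western European default; other Windows locales use other ANSI code pages), and `convert_ansi_to_utf8` is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: re-encode a file from Windows ANSI (assumed to be
# cp1252) to UTF-8, roughly what EditPad's "ANSI to UTF-8" conversion does.
sub convert_ansi_to_utf8 {
    my ($in_path, $out_path) = @_;

    # The :encoding(cp1252) layer decodes the file's bytes into characters.
    open my $in, '<:encoding(cp1252)', $in_path
        or die "Cannot read $in_path: $!";

    # The :encoding(UTF-8) layer encodes those characters as UTF-8 bytes.
    open my $out, '>:encoding(UTF-8)', $out_path
        or die "Cannot write $out_path: $!";

    # Copy line by line; the layers do all the re-encoding.
    while (my $line = <$in>) {
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

Usage would be something like `convert_ansi_to_utf8('myscript.pl', 'myscript-utf8.pl');` (file names made up for illustration).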
During the head-scratching phase, I painstakingly put together the following chart comparing UTF-8, Windows ANSI and DOS mode (OEM).
| Symbol | Encoding | EditPad hex mode display | EditPad normal mode display |
|---|---|---|---|
| ö | Windows ANSI | f6 | ö |
| ö | UTF-8 | c3b6 | ö |
| ö | DOS mode (OEM) | 94 | ” |
EditPad users (a limited-time demo version is available for download) may appreciate the following details. Windows ANSI is EditPad's default mode. I got the UTF-8 values by running EditPad -> Convert -> Unicode -> ANSI to UTF-8, and the DOS mode values by running Convert -> ANSI to OEM. All of the hex values were read in EditPad's hex mode, which you toggle with Ctrl+H.
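The hex column of the chart can be reproduced without EditPad using the core Encode module. This is a sketch that assumes cp1252 for "Windows ANSI" and cp850 for "DOS mode (OEM)"; the actual OEM code page depends on the Windows locale, although cp437 also maps ö to 0x94.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00F6, LATIN SMALL LETTER O WITH DIAERESIS.
my $o_umlaut = "\x{00F6}";

# Encode the same character in each of the chart's encodings and
# show the resulting bytes as hex, like EditPad's hex mode.
my %hex_for = map {
    $_ => unpack('H*', encode($_, $o_umlaut))
} qw(cp1252 UTF-8 cp850);

printf "%-7s %s\n", $_, $hex_for{$_} for qw(cp1252 UTF-8 cp850);
```

The byte 0x94 shows up as ” in EditPad's normal mode because, read as cp1252, 0x94 is a right double quotation mark.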
I conclude that CGI::Enurl does not produce appropriate POST data when fed German characters encoded with the Windows default. Put more simply, CGI::Enurl is Windows-unfriendly. I wonder whether there is a way to contribute to CGI::Enurl and URI::Escape (which behaves the same way) to make them more Windows-friendly, but I will leave that for another day.
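Both modules escape bytes, not characters, and without `use utf8` Perl reads a string literal as the raw bytes of the script file. So the ö literal is the byte F6 in an ANSI script and the bytes C3 B6 in a UTF-8 script, which percent-encode differently. The sketch below illustrates this with a hypothetical `percent_encode` helper standing in for the real modules; it assumes cp1252 as the ANSI code page and that the receiving site expects UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for URI::Escape::uri_escape: percent-encode every
# byte outside A-Z a-z 0-9 - _ . ~ (close to the default unsafe set in
# recent URI::Escape versions). Like uri_escape, it works on bytes.
sub percent_encode {
    my ($bytes) = @_;
    (my $escaped = $bytes) =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $escaped;
}

# A literal "ö" in a cp1252 (ANSI) script file is the single byte 0xF6.
my $ansi_bytes = "\xF6";

# Escaped as-is, it becomes %F6, which a UTF-8 receiver will not read as "ö".
my $from_ansi = percent_encode($ansi_bytes);

# Decoding from cp1252 and re-encoding as UTF-8 first gives %C3%B6,
# the same result a UTF-8 script file produces.
my $from_utf8 = percent_encode(encode('UTF-8', decode('cp1252', $ansi_bytes)));

print "$from_ansi\n$from_utf8\n";
```

With URI::Escape installed, `uri_escape_utf8(decode('cp1252', $ansi_bytes))` should give the same %C3%B6, since `uri_escape_utf8` encodes its characters as UTF-8 before escaping.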
thomas.