Hi, all.
First of all, I searched extensively around the web and here too, but found nothing helpful for my case. So I hope you can shed some light on this.
I have a web application (running on Apache) that consists of an HTML query form for searching through text files. The form calls a Perl script that prepares some other details of the search and calls a Java program (using backticks). The Java program does the search and returns its output as HTML to the script, which does some final processing and sends it back to the browser.
The problem is that when the user's query includes accented characters (the text files are in Portuguese), the Java program receives them garbled, so the search returns nothing. Removing the accents from the query doesn't solve the problem, because the accented characters must be found. I created a version of the script that can be executed from a shell, and it works fine with accents. So I thought it could be something with Apache + Perl, but I have no idea what or where. Just to mention, I tried all kinds of I/O charset conversions, and none of them worked.
A simple version of the script:
#!/usr/bin/perl
print "Content-type: text/html;charset=utf-8", "\n\n";
$output=(`java -classpath /usr/local/lib/CS.jar csearch/CorpusSearch 'HTMLQ((naġ Exists))' c_006_pos.txt.cs`);
print $output;
Piece of the output to the browser:
--------
search domain: $ROOT
query: (na�� Exists)
---------
As it shows, the original "naġ" was processed by the Java program as "na��". But if I have the script print the query right before submitting it to the Java program (just before the backtick call), it displays correctly ("naġ"). All the other accented characters in the output are fine too, because I set the charset to utf-8. As I said before, the same script works fine when I call it directly from the shell. The form is sending in utf-8 too. Is there anything between Apache and shell commands run via Perl that I am missing?
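One difference between the two setups that may be worth checking is the environment: a script started by Apache often does not inherit the locale variables (LANG, LC_ALL, LC_CTYPE) that an interactive shell has, and a JVM on Unix-like systems generally takes its default encoding for command-line arguments from the locale. Whether that is the cause here is only an assumption. The sketch below reports what the script inherited, sets a UTF-8 locale, and confirms that a backtick child sees it. The locale name en_US.UTF-8 and the helper locale_report are hypothetical choices for illustration, and printenv stands in for the java command.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report the locale-related variables that a backtick child would inherit.
sub locale_report {
    my $report = '';
    for my $name (qw(LANG LC_ALL LC_CTYPE)) {
        my $value = defined $ENV{$name} ? $ENV{$name} : '(unset)';
        $report .= "$name=$value\n";
    }
    return $report;
}

print "Content-type: text/plain;charset=utf-8\n\n";
print "Inherited from the caller:\n", locale_report();

# Assumption: a UTF-8 locale with this name is installed (`locale -a` lists them).
$ENV{LC_ALL} = 'en_US.UTF-8';

# Backticks hand %ENV to the child process, so the child sees the new value.
# printenv is only a stand-in here for the real java command.
my $seen_by_child = `printenv LC_ALL`;
chomp $seen_by_child;
print "Child process sees LC_ALL=$seen_by_child\n";
```

Running this once from the shell and once through Apache would show whether the two environments differ. If they do, setting the variable in the same way before the real backtick call is one possible thing to try.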
Well, that's it. Any ideas?
Thanks,