Hi, all.
First of all, I searched the web and this site extensively but found nothing that helps in my case, so I hope you can shed some light on it.
I have a web application (running on Apache) that consists of an HTML query form for searching through text files. The form calls a Perl script that prepares some other details of the search and calls a Java program (using backticks). The Java program does the search and returns its output as HTML to the script, which does some final processing and sends it back to the browser.
The problem is that when the query entered by the user includes accented characters (the text files are in Portuguese), the Java program receives them garbled, so the search returns nothing. Stripping the accents from the query is not an option, because the accented characters themselves must be matched. I created a version of the script that can be run from a shell, and it works fine with accents. So I suspect something in the Apache + Perl combination, but I have no idea what or where. I also tried all kinds of I/O charset conversions, and none of them worked.
A simple version of the script:
#!/usr/bin/perl
print "Content-type: text/html;charset=utf-8", "\n\n";
$output = `java -classpath /usr/local/lib/CS.jar csearch/CorpusSearch 'HTMLQ((naġ Exists))' c_006_pos.txt.cs`;
print $output;
Piece of the output to the browser:
--------
search domain: $ROOT
query: (na�� Exists)
---------
As the output shows, the original "naġ" was processed by the Java program as "na��". But if I have the script print the query right before the backtick call that passes it to the Java program, it displays correctly ("naġ"). All the other accented characters in the output are fine too, because I set the charset to UTF-8. As I said before, the same script works fine when I call it directly from the shell. The form is submitted as UTF-8 too. Is there anything between Apache and shell commands run via Perl that I am missing?
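One difference worth ruling out between the shell run and the Apache run is the locale environment. A CGI process started by Apache often has no LANG or LC_ALL set, and a child program may then decode its command-line arguments with a non-UTF-8 default. The following is only a minimal sketch for checking that, not a confirmed fix. The helper name run_with_utf8_locale is hypothetical, and the locale name en_US.UTF-8 is an assumption (it must be installed on the server; `locale -a` lists what is available).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: a UTF-8 locale with this name is installed on the server.
my $UTF8_LOCALE = 'en_US.UTF-8';

# Hypothetical helper: run a command via backticks with a UTF-8 locale
# exported to the child process only.
sub run_with_utf8_locale {
    my ($command) = @_;
    # local() restores the previous %ENV values when this sub returns.
    local $ENV{LANG}   = $UTF8_LOCALE;
    local $ENV{LC_ALL} = $UTF8_LOCALE;
    my $out = `$command`;
    return $out;
}

# Diagnostic: compare what a child process sees with and without the override.
print "Content-type: text/plain;charset=utf-8\n\n";
print "LANG seen by child, no override:   [", `printf '%s' "\$LANG"`, "]\n";
print "LANG seen by child, with override: [",
    run_with_utf8_locale(q{printf '%s' "$LANG"}), "]\n";
```

If the first line prints an empty or non-UTF-8 value under Apache but a UTF-8 value from the shell, the java command from the script could be passed to the same helper in place of the printf test.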
Well, that's it. Any ideas?
Thanks,