I'm pretty sure this is a character encoding problem. The java program probably expects utf8 (or utf16) encoded characters.
The solution is to use the Encode module (use Encode;), and it has several parts:
- when reading user input, make sure you decode the strings correctly.
- when passing the strings to the java program, make sure you encode them to what the java program expects.
- when getting the results back from the java program, make sure they are decoded correctly.
- finally, when you emit the results back to the user, make sure your characters are again encoded correctly.
Note that some of these steps may be omitted if the incoming and outgoing character encodings are the same.
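The four steps above can be sketched roughly as follows. This is only a minimal sketch: the helper names and the %ENC table are made up for illustration, and both encodings are assumed to be UTF-8 until you have checked what your browser and the java program really use.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed encodings (not from the original post); change them to match your setup.
my %ENC = (
    user    => 'UTF-8',   # what the browser sends and should receive
    program => 'UTF-8',   # what the java program reads and writes
);

# Step 1: bytes from the user -> Perl characters.
sub decode_from_user    { my ($bytes) = @_; return decode($ENC{user}, $bytes) }

# Step 2: Perl characters -> bytes for the java program.
sub encode_for_program  { my ($chars) = @_; return encode($ENC{program}, $chars) }

# Step 3: bytes from the java program -> Perl characters.
sub decode_from_program { my ($bytes) = @_; return decode($ENC{program}, $bytes) }

# Step 4: Perl characters -> bytes for the user.
sub encode_for_user     { my ($chars) = @_; return encode($ENC{user}, $chars) }
```

When both entries in %ENC are the same, steps 1 and 2 cancel each other out (and so do 3 and 4), which is why some of them can be skipped in that case.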
As a first step, try to figure out what encoding the java program is expecting as input and also what it is producing as output. For instance, try something like:
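use utf8;    # the literal below is non-ASCII; assumes this script is saved as UTF-8
use Encode;
my $word = "naġ Exists";
# turn the character string into the bytes the java program should receive
my $encoded_word = Encode::encode('utf8', $word);
my $out1 = `java -classpath /usr/local/lib/CS.jar csearch/CorpusSearch 'HTMLQ(($encoded_word))' c_006_pos.txt.cs`;
print $out1;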
and see what $out1 looks like. You can use Firefox to do this: just use View -> Character Encoding -> More Encodings -> Unicode to try some different encodings out. If utf8 doesn't work, try 'utf16', which is another popular encoding to use with java.
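If you would rather not rely on a browser, you can also look at the raw bytes directly. The following is a rough sketch, not a reliable detector: hex_dump and guess_encoding are hypothetical helper names, and the guess only checks for a UTF-16 byte order mark and for well-formed UTF-8 (plain ASCII will also be reported as UTF-8).

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Show each byte as two hex digits, e.g. "6e 61 c4 a1".
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

# Very rough guess at the encoding of a byte string.
sub guess_encoding {
    my ($bytes) = @_;
    # A UTF-16 byte order mark at the start is a strong hint.
    return 'UTF-16' if $bytes =~ /\A(?:\xFE\xFF|\xFF\xFE)/;
    # Strict decoding dies on malformed input, so survival means valid UTF-8.
    return 'UTF-8' if eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return 'unknown (possibly latin1)';
}
```

For example, print hex_dump($out1), "\n"; shows exactly what the java program sent back, and guess_encoding($out1) gives a first hint about how to decode it.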
After you've figured out the java part, then you should decide on an output encoding (either latin1 or utf8), add a charset parameter to your Content-type header, and use Encode::encode to encode the output, e.g.:
print "Content-type: text/html; charset=utf-8\n\n";
# ... set $out1 from java program, decoded into characters ...
print Encode::encode('utf8', $out1);
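As an alternative to calling Encode::encode on every print, you could let an I/O layer do the encoding. This is only a sketch under the assumption that the output is UTF-8; send_page is a hypothetical helper name, and the body passed to it must already be a decoded character string.

```perl
use strict;
use warnings;

# Write an HTML response to $fh, letting the :encoding layer turn
# characters into UTF-8 bytes on the way out.
sub send_page {
    my ($fh, $body_chars) = @_;
    binmode($fh, ':encoding(UTF-8)') or die "binmode failed: $!";
    print {$fh} "Content-type: text/html; charset=utf-8\n\n";
    print {$fh} $body_chars;
    return;
}
```

In a CGI script you would call it as send_page(\*STDOUT, $out1); after decoding $out1. Don't combine this with an explicit Encode::encode on the same data, or the output will be encoded twice.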