I'm pretty sure this is a character encoding problem. The Java program is probably expecting UTF-8 (or UTF-16) encoded characters.

The solution is to use the Encode module, and it involves several steps.

Note that some of these steps may be omitted if the incoming and outgoing character encodings are the same.

As a first step, try to figure out what encoding the Java program expects as input and what it produces as output. For instance, try something like:

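    use utf8;    # the source file contains a literal non-ASCII character
    use Encode;

    my $word = "naġ Exists";
    # Turn the character string into the bytes the Java program should receive
    my $encoded_word = Encode::encode('utf8', $word);
    my $out1 = `java -classpath /usr/local/lib/CS.jar csearch/CorpusSearch 'HT +MLQ(($encoded_word))' c_006_pos.txt.cs`;
    print $out1;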

and see what $out1 looks like. You can use Firefox to do this: just use View -> Character Encoding -> More Encodings -> Unicode to try out different encodings. If utf8 doesn't work, try 'utf16', which is another popular encoding to use with Java.
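If eyeballing the output in a browser is inconclusive, it can help to look at the raw bytes. The following is only a sketch: `hex_bytes` is a hypothetical helper, not part of Encode, and the candidate list is an assumption about what the Java program might use. It prints how the letter ġ (U+0121) from the example word looks in each candidate encoding, so you can compare that against the bytes that actually appear in $out1.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

my $char = "\x{121}";    # the letter ġ (U+0121) from the example word

# Show how the same character looks as bytes in each candidate encoding.
for my $name ('UTF-8', 'UTF-16BE', 'UTF-16LE') {
    printf "%-8s %s\n", $name, hex_bytes(encode($name, $char));
}
```

Calling `hex_bytes($out1)` on the real output and looking for one of these byte patterns should give a hint as to which encoding the Java program writes.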

After you've figured out the Java part, decide on an output encoding (either latin1 or utf8), add a charset parameter to your Content-type header, and use Encode::encode to encode the output, e.g.:

    print "Content-type: text/html; charset=utf-8\n\n";
    ... set $out1 from java program ...
    print Encode::encode('utf8', $out1);
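Keep in mind that whatever comes back from the backticks is a string of bytes, so it generally needs to be decoded from the Java program's encoding before it is re-encoded for the browser. Here is a minimal sketch of that round trip. The `transcode` helper and the sample data are hypothetical, and the Latin-1 source encoding is only an assumption for illustration; substitute whatever you found the Java program to produce.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert bytes from one encoding to another
# by going through Perl's character strings.
sub transcode {
    my ($bytes, $from, $to) = @_;
    my $chars = decode($from, $bytes);    # bytes -> characters
    return encode($to, $chars);           # characters -> bytes
}

# Stand-in for the Java program's output: "café" as Latin-1 bytes (assumption).
my $java_output = "caf\xE9";

print "Content-type: text/html; charset=utf-8\n\n";
print transcode($java_output, 'iso-8859-1', 'UTF-8'), "\n";
```

If the Java program already writes the same encoding you declare in the header, this step changes nothing and can be skipped. Encode also provides from_to, which does the same kind of conversion in place.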

In reply to Re: Apache+PerlCGI: accent problems by pc88mxer
in thread Apache+PerlCGI: accent problems by pablofaria
