OK, here is a cleaner, self-contained Perl script with an inline form that submits to itself. Make sure the form's action value is "utf8_encode.pl", or change it to whatever name you save the script under. For a direct test, use the Chinese character "【" as an example. For result #3 I use unpack. I previously tried several ways, and they all give the same result: the single Chinese character "【", when split, becomes 3 characters: 227, 128, 144.
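A minimal sketch of why the split gives 227, 128, 144, assuming the browser posts the form as UTF-8 (as the charset=utf-8 header suggests). By default CGI.pm's param() returns the raw bytes, so split // and unpack work on bytes, not characters. Decoding with the core Encode module first gives a single character whose ord is 12304, the same number the JavaScript puts into &#12304;. The hard-coded byte string is an assumption standing in for the posted value.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "【" (U+3010) as the browser posts it in UTF-8: three raw bytes.
# This literal stands in for the undecoded value of param('data').
my $bytes = "\xE3\x80\x90";

# On a byte string, unpack('C*') (like split //) sees one element per byte.
my @byte_values = unpack('C*', $bytes);
print "bytes: @byte_values\n";    # bytes: 227 128 144

# Decoding the UTF-8 bytes yields a single character (code point 12304).
my $chars = decode('UTF-8', $bytes);
my @code_points = map { ord($_) } split //, $chars;
print "chars: @code_points\n";    # chars: 12304
```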

I still don't quite understand the explanation given, but I'm almost getting the hang of it.

If I can get the encoding solved, I think I should be able to get the decoding working as well.
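For the decode direction, a sketch assuming the stored text contains only decimal entities like the ones encodeCN() produces: turn each &#N; back into chr(N), then encode the resulting character string to UTF-8 bytes before printing it under the charset=utf-8 header. decode_numeric_entities is a hypothetical helper name; decode_entities from HTML::Entities (already loaded in the script) does the same job and also handles named and hex entities.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn decimal entities such as "&#12304;" back into
# characters. Only the decimal form that encodeCN() emits is handled.
sub decode_numeric_entities {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr($1)/ge;
    return $text;
}

my $stored  = '&#12304;abc&#12305;';            # sample stored text
my $decoded = decode_numeric_entities($stored);  # Perl character string
my $octets  = encode('UTF-8', $decoded);         # UTF-8 bytes for output
print $octets, "\n";                             # safe under charset=utf-8
```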

The string match will be done in a separate processing step, with code that looks like this:

    my ($result) = $str =~ m/\&\#12304\;(.*?)\&\#12305\;/sig;
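The match can run either on the entity-encoded text or on the decoded characters. A sketch with made-up sample strings: /i has no effect here because the pattern contains no letters, and /g in list context returns every bracketed capture, of which my ($result) keeps only the first. The decoded version uses code-point escapes so the source file does not need "use utf8".

```perl
use strict;
use warnings;

# Sample entity-encoded text, as encodeCN() would produce it.
my $str = 'intro &#12304;first&#12305; middle &#12304;second&#12305;';

# The pattern from the post, without the flags that have no effect here.
my ($result) = $str =~ m/&#12304;(.*?)&#12305;/s;
print "$result\n";    # first

# With /g in list context, every bracketed capture is returned.
my @all = $str =~ m/&#12304;(.*?)&#12305;/sg;
print "@all\n";       # first second

# The same match on a decoded character string (U+3010 ... U+3011).
my $chars = "intro \x{3010}first\x{3011} middle";
my ($result2) = $chars =~ m/\x{3010}(.*?)\x{3011}/s;
```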

#!/usr/bin/perl
################################################################################
#
#
################################################################################
use CGI ':standard';
use HTML::Entities;    #-- for encode and decode string

(%FORM) = ();

if ($ENV{'REQUEST_METHOD'} eq "POST") {
    my ($id);
    #-- extract the value inside param into %FORM hash
    foreach $id (param) {
        $FORM{$id} = param($id);
    }
} # // if post

print "Content-Type: text/html; charset=utf-8\n\n";
print "<h2>Encode UTF-8 Chinese Character Input</h2><br>";
print &input_form;

#---------------------------------------------------#
#---------------------------------------------------#
sub input_form {
    my ($content) = "";
    my ($value) = "";
    if ($FORM{'data'} ne "") { $value = $FORM{'data'}; }
    my ($encoded_value) = "";
    my ($process_content) = "";

    if ($FORM{'action'} eq "encode") {
        $encoded_value = $FORM{'encoded_value'};

        # !! attempting to do the encoding inside Perl, but when $FORM{'data'} is split,
        # each Chinese character becomes 3 chars !!
        my (@arr) = split(//,$FORM{'data'});
        foreach my $c (@arr) {
            $c = unpack('C*', $c);
            $process_content .= "$c\n";
        }
    }
    elsif ($FORM{'action'} eq "decode") {
    }

    #-- content ---------------------------------
    $content = qq~
<script type="text/javascript">
function encodeCN(id) {
    var tstr = document.getElementById(id).value;
    var bstr = '';
    for(i=0; i<tstr.length; i++) {
        if(tstr.charCodeAt(i)>127) {
            bstr += '&#' + tstr.charCodeAt(i) + ';';
        }
        else {
            bstr += tstr.charAt(i);
        }
    }
    document.getElementById('encoded_value').value = bstr;
}
</script>
<form id="fr_in" name="fr_in" action="utf8_encode.pl" style="" method="POST" enctype="application/x-www-form-urlencoded">
<input type="hidden" onFocus="this.blur()" name="convert" id="convert" value="">
<input type="hidden" onFocus="this.blur()" name="action" id="action" value="">
<input type="hidden" name="encoded_value" id="encoded_value" value="">
<textarea id="data" name="data" style="width:600px; height:200px;">$value</textarea>
<br>
<xmp>
1. FORM submitted value: $value
2. Encoded value thru JS before form submit: $encoded_value
3. *Try to do encoding inside Perl*
$process_content
</xmp>
<input type="button" value="Encode" onClick="encodeCN('data'); document.getElementById('action').value='encode'; this.form.submit();">
<input type="button" value="Decode" onClick="document.getElementById('action').value='decode'; this.form.submit();">
</form>
~;
    #--// content -------------------------------
    return ($content);
}
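For result #3, a sketch of doing the encodeCN() conversion inside Perl instead of JavaScript, assuming the posted data is UTF-8: decode the bytes first, then replace every character above 127 with &#N;. encode_cn is a hypothetical helper. In the script, decode('UTF-8', $FORM{'data'}) would take the place of splitting the raw value; alternatively, CGI.pm's -utf8 import pragma (use CGI qw(:standard -utf8);) makes param() return decoded strings. One difference to be aware of: JavaScript's charCodeAt() returns UTF-16 code units, so for characters outside the Basic Multilingual Plane the JS emits two surrogate entities while ord() gives one code point; for characters like 【 both give 12304.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper mirroring the JavaScript encodeCN(): every character
# above code point 127 becomes a decimal numeric entity "&#N;".
# It expects a decoded character string, not raw UTF-8 bytes.
sub encode_cn {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# Stand-in for $FORM{'data'}: the raw UTF-8 bytes of "【abc】".
my $raw = "\xE3\x80\x90abc\xE3\x80\x91";

# Decode first; without this step each byte would be encoded separately.
my $encoded = encode_cn(decode('UTF-8', $raw));
print "$encoded\n";    # &#12304;abc&#12305;
```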

In reply to Re: String match in Chinese character by hankcoder
in thread String match in Chinese character by hankcoder
