in reply to String match in Chinese character

OK, here is a cleaner, self-contained Perl script with an inline form submit. Make sure the form's action value is "utf8_encode.pl", or change it to whatever you name the script. For a direct test, use the Chinese character "【", for example. For result #3, I use unpack. I previously tried several other ways and they all give the same result: the single Chinese character "【", when split, becomes 3 chars with values 227, 128, 144.
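A minimal sketch of where the three values come from, assuming the form data reaches the script as undecoded UTF-8 bytes (the variable names here are only for illustration). The values 227, 128, 144 are the UTF-8 bytes of U+3010; decoding them with the core Encode module gives a single character whose code point is 12304.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The three bytes the script sees for the single character U+3010.
my $bytes = pack('C*', 227, 128, 144);

# Undecoded, Perl treats the value as three one-byte characters.
my @byte_values = map { ord($_) } split //, $bytes;

# Decoded as UTF-8, it is one character with code point 12304.
my $chars       = decode('UTF-8', $bytes);
my @code_points = map { ord($_) } split //, $chars;

print "bytes:       @byte_values\n";   # bytes:       227 128 144
print "code points: @code_points\n";   # code points: 12304
```

In the script, the same decode step would presumably be applied to $FORM{'data'} before the split.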

I still don't quite understand the explanation given, though I'm almost getting the hang of it.

If I can get the encoding solved, then I think I should be able to get the decoding working as well.

The string match will be done in separate processing, where the code looks like this: my ($result) = $str =~ m/\&\#12304\;(.*?)\&\#12305\;/sig;
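A small sketch of that match in two forms, using a made-up sample string ("title" and "rest" are placeholders, not real data). The first form matches the entity-encoded text as in the line above. The second assumes the bytes have been decoded first, so the brackets can be matched directly as U+3010 and U+3011.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: entity-encoded text, as produced by the JS encoder.
my $entity_str = '&#12304;title&#12305; rest';
my ($entity_result) = $entity_str =~ m/&#12304;(.*?)&#12305;/s;

# Hypothetical sample: the same text as raw UTF-8 bytes, decoded to characters.
my $bytes = "\xE3\x80\x90" . "title" . "\xE3\x80\x91" . " rest";
my $str   = decode('UTF-8', $bytes);
my ($result) = $str =~ m/\x{3010}(.*?)\x{3011}/s;

print "$entity_result\n";   # title
print "$result\n";          # title
```

The backslashes before &, # and ; in the original pattern are harmless but not required, and the /i and /g flags make no difference when only the first capture is kept.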

#!/usr/bin/perl
################################################################################
#                                                                              #
################################################################################
use CGI ':standard';
use HTML::Entities; #-- for encode and decode string

(%FORM) = ();

if ($ENV{'REQUEST_METHOD'} eq "POST") {
    my ($id);
    #-- extract the value inside param into %FORM hash
    foreach $id (param) {
        $FORM{$id} = param($id);
    }
} # // if post

print "Content-Type: text/html; charset=utf-8\n\n";
print "<h2>Encode UTF-8 Chinese Character Input</h2><br>";
print &input_form;

#---------------------------------------------------#
#---------------------------------------------------#
sub input_form {
    my ($content) = "";
    my ($value) = "";
    if ($FORM{'data'} ne "") { $value = $FORM{'data'}; }
    my ($encoded_value) = "";
    my ($process_content) = "";

    if ($FORM{'action'} eq "encode") {
        $encoded_value = $FORM{'encoded_value'};

        # !! attempt to do encoding inside perl but the $FORM{'data'} when split,
        # it become 3 char for Chinese char !!
        my (@arr) = split(//,$FORM{'data'});
        foreach my $c (@arr) {
            $c = unpack('C*', $c);
            $process_content .= "$c\n";
        }
    }
    elsif ($FORM{'action'} eq "decode") {
    }

    #-- content ---------------------------------
    $content = qq~
<script type="text/javascript">
function encodeCN(id) {
    var tstr = document.getElementById(id).value;
    var bstr = '';
    for(i=0; i<tstr.length; i++) {
        if(tstr.charCodeAt(i)>127) {
            bstr += '&#' + tstr.charCodeAt(i) + ';';
        } else {
            bstr += tstr.charAt(i);
        }
    }
    document.getElementById('encoded_value').value = bstr;
}
</script>
<form id="fr_in" name="fr_in" action="utf8_encode.pl" style="" method="POST" enctype="application/x-www-form-urlencoded">
<input type="hidden" onFocus="this.blur()" name="convert" id="convert" value="">
<input type="hidden" onFocus="this.blur()" name="action" id="action" value="">
<input type="hidden" name="encoded_value" id="encoded_value" value="">
<textarea id="data" name="data" style="width:600px; height:200px;">$value</textarea>
<br>
<xmp>
1. FORM submitted value:
$value

2. Encoded value thru JS before form submit:
$encoded_value

3. *Try to do encoding inside Perl*
$process_content
</xmp>
<input type="button" value="Encode" onClick="encodeCN('data'); document.getElementById('action').value='encode'; this.form.submit();">
<input type="button" value="Decode" onClick="document.getElementById('action').value='decode'; this.form.submit();">
</form>
~;
    #--// content -------------------------------

    return ($content);
}
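One possible Perl-side counterpart to the encodeCN() JavaScript function, as a sketch only. The helper names encode_cn and decode_cn are hypothetical, and the input is assumed to be UTF-8 bytes as received from the form. Note that Perl's ord returns the full code point, whereas charCodeAt returns UTF-16 code units, so the two would differ for characters outside the Basic Multilingual Plane.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode UTF-8 bytes, then replace every
# non-ASCII character with a decimal numeric entity (&#N;).
sub encode_cn {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

# Hypothetical reverse helper: turn decimal entities back into characters.
# The result is a character string, not UTF-8 bytes.
sub decode_cn {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr($1)/ge;
    return $text;
}

my $encoded = encode_cn("\xE3\x80\x90" . "Perl" . "\xE3\x80\x91");
print "$encoded\n";   # &#12304;Perl&#12305;
```

If the decoded text is printed back to the browser, it would need to be encoded to UTF-8 again first (for example with Encode's encode function) to match the charset in the Content-Type header.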