senthilkumarperl has asked for the wisdom of the Perl Monks concerning the following question:

hi,

I have text that mixes Chinese and English words. I want a pattern that matches only the Chinese. Below is a sample of the text:

太多地

I won't interfere too much with her life.

Let's see the next clip.

但是

对于

女孩子

工作的

Since there will be times when I won't be with her,

Do you have any idea?


Replies are listed 'Best First'.
Re: Regular expression for chinese character
by Corion (Patriarch) on May 19, 2011 at 08:27 UTC

    What code have you already written? It's hard to give you helpful advice if you don't show us what you have accomplished already. Please help us to help you better by showing us the relevant code, the input you give and the output you get. Please also explain what output you expect instead.

    I recommend decoding all your input (from UTF-8, or whatever encoding it arrives in) into Perl character strings and then using the Unicode properties to extract the Chinese glyphs. Do note that the "Chinese" glyphs overlap with the Japanese glyphs etc., but at least some pages point to "Unihan" as the list of glyphs that is likely to be of use. Also see Unihan.
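    To illustrate the decoding step, here is a minimal sketch. It assumes the input arrives as UTF-8 bytes; `decode_utf8_line` is a hypothetical helper name, and the byte string is simply the UTF-8 encoding of the first word in the question (太多地).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not from this thread): turn raw UTF-8 bytes
# into a Perl character string so Unicode properties can be applied.
sub decode_utf8_line {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes);
}

# Assumed input: the UTF-8 bytes of the three-character word 太多地.
my $bytes = "\xE5\xA4\xAA\xE5\xA4\x9A\xE5\x9C\xB0";
my $chars = decode_utf8_line($bytes);

# Before decoding, length counts bytes; afterwards it counts characters.
printf "%d bytes, %d characters\n", length($bytes), length($chars);
```

    When reading from a file instead, opening it with the `:encoding(UTF-8)` layer performs the same decoding as the data is read.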

Re: Regular expression for chinese character
by ikegami (Patriarch) on May 19, 2011 at 10:09 UTC
    I think \p{Han} (short for \p{Script=Han}) will do.
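    As a rough sketch of how that property could be applied to the sample from the question: the snippet below assumes the script itself is saved as UTF-8 and that each line is already a decoded character string. The array names are only illustrative.

```perl
use strict;
use warnings;
use utf8;                              # the string literals below are UTF-8
binmode STDOUT, ':encoding(UTF-8)';    # encode characters again on output

# A few lines taken from the question.
my @lines = (
    '太多地',
    "I won't interfere too much with her life.",
    '但是',
    "Let's see the next clip.",
    '女孩子',
);

# Keep only the lines that consist entirely of Han characters.
my @chinese = grep { /^\p{Han}+$/ } @lines;

print "$_\n" for @chinese;
```

    Anchoring with `^` and `$` keeps whole-line matches only; dropping the anchors would select any line that contains at least one Han character.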
Re: Regular expression for chinese character
by John M. Dlugosz (Monsignor) on May 19, 2011 at 13:59 UTC
    First, learn about the "regexp" feature. Perhaps start with perlretut. For example, /\d+/ will match a sequence of digits (0 through 9). Similarly, you can find a sequence of characters that are used in Asian languages, as opposed to ASCII or other Latin, Greek, etc. characters.

    There are built-in classifications, including "Han", which another poster illustrated.

    So, use a pattern that finds all occurrences of Han characters within your mixed text.
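    One way that could look, as a sketch only: the mixed line below is made up from words in the question, and the variable names are hypothetical. It assumes the text is already decoded.

```perl
use strict;
use warnings;
use utf8;                              # the string literal below is UTF-8
binmode STDOUT, ':encoding(UTF-8)';    # encode characters again on output

# Hypothetical mixed line assembled from words in the question.
my $text = "但是 I won't interfere 太多地 with her life 女孩子";

# In list context, /g returns every captured run of Han characters.
my @runs = $text =~ /(\p{Han}+)/g;

print join(', ', @runs), "\n";
```

    This should print the three Chinese words in the order they appear and skip the English in between.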

Re: Regular expression for chinese character
by ssqq (Initiate) on May 19, 2011 at 15:13 UTC
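    use 5.010;
    use strict;
    use warnings;
    use Inline::Files;

    # Print every line that does not start with a word character.
    # This relies on the data staying undecoded bytes; on decoded text, \w also matches Han characters.
    while (my $line = <FILE>) {
        chomp $line;
        say $line if ($line !~ /^\w/);
    }

    __FILE__
    hello world
    太阳
    月亮I
    I am an boy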
Re: Regular expression for chinese character
by senthilkumarperl (Novice) on May 19, 2011 at 09:28 UTC

    It is OK, I fixed my issue.

      Please briefly tell us how you fixed your problem.

      Make this thread “a complete thought,” so that the next person, who has the same problem, can not only see “that you fixed the problem (good for you...),” but can also see “how they, too can fix the same problem.”

      It is okay if your English is not-so-good.   (My Chinese is non-existent!)   Thanks.

Re: Regular expression for chinese character
by senthilkumarperl (Novice) on May 19, 2011 at 10:28 UTC

    Hi,

    If you could give an example, it would be better for me.