Script should validate that cd element contains only cyrillic characters. If it contains other character set, it should prompt an error.

Um... is it okay for text within <cd>...</cd> to include spaces, digits, punctuation, etc.? These lie outside the Unicode Cyrillic range, but might not be "errors".

If the tag really is supposed to contain only Cyrillic letters (no whitespace, digits, etc.), then something like

warn "Bad content: $cdstr\n" if ($cdstr =~ /\P{InCyrillic}/);

really is all you need, as Zaxo suggested.
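For anyone who wants to try the strict check on a few strings, here is a minimal sketch. The helper name has_non_cyrillic and the sample strings are made up for illustration, and it assumes $cdstr already holds decoded characters (not raw UTF-8 bytes).

```perl
use strict;
use warnings;
use utf8;    # the sample strings below are UTF-8 literals in this source file

# Hypothetical helper (not from the original thread): returns 1 if $cdstr
# contains any character outside the Unicode Cyrillic block, else 0.
sub has_non_cyrillic {
    my ($cdstr) = @_;
    return $cdstr =~ /\P{InCyrillic}/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';

# Made-up sample values standing in for the text of a <cd> element.
for my $cdstr ('Привет', 'Привет мир', 'Hello') {
    if ( has_non_cyrillic($cdstr) ) {
        print "Bad content: $cdstr\n";
    }
    else {
        print "ok: $cdstr\n";
    }
}
```

Note that the second sample is flagged only because of the space, which is exactly the question raised above. An empty string passes this test, since it contains no non-Cyrillic character at all.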

Adding more to the "acceptable characters" list is not too complicated (although I did have some trouble with methods that I expected to work based on the perlunicode man page). This seems to work okay for the case where whitespace, digits and punctuation are acceptable along with Cyrillic:

warn "Bad content: $cdstr\n" unless ( $cdstr =~ /^(?:[\s\d\p{Punctuation}]|\p{Cyrillic})+$/ );

(I had expected that I could put a bunch of "\p{...}" things inside a single [...] character class, but that didn't work as expected in 5.8.6 or 5.8.8. I even had trouble defining my own subroutine, along the lines explained in perlunicode and demonstrated here by japhy: my subroutine ran, but the results were not as expected. I'll be posting a question/bug report to the perl-unicode mailing list.)
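Here is a rough sketch of the relaxed check wrapped in a sub, using the same regex as above. The name cd_content_ok and the sample strings are invented for the example, and again $cdstr is assumed to be a decoded character string.

```perl
use strict;
use warnings;
use utf8;    # the sample strings below are UTF-8 literals in this source file

# Hypothetical helper (not from the original thread): returns 1 if $cdstr is
# non-empty and consists only of whitespace, digits, punctuation and
# Cyrillic-script characters, else 0.
sub cd_content_ok {
    my ($cdstr) = @_;
    return $cdstr =~ /^(?:[\s\d\p{Punctuation}]|\p{Cyrillic})+$/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';

# Made-up sample values standing in for the text of a <cd> element.
for my $cdstr ('Привет, мир 2024!', 'Привет world', 'Цена: 5+5') {
    warn "Bad content: $cdstr\n" unless cd_content_ok($cdstr);
}
```

Two things to watch for: because of the "+" quantifier an empty string is rejected here (unlike the strict test above), and characters such as "+" or "$" are Unicode symbols rather than punctuation, so the third sample is reported as bad. If those should be allowed, the class would need something like \p{Symbol} added as well.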

In reply to Re: To find Cyrillic characters - unicode by graff
in thread To find Cyrillic characters - unicode by Anonymous Monk
