in reply to To find Cyrillic characters - unicode
Script should validate that cd element contains only cyrillic characters. If it contains other character set, it should prompt an error.
Um... is it okay for text within <cd>...</cd> to include spaces, digits, punctuation, etc? These lie outside the Unicode Cyrillic range, but might not be "errors".
If the tag really is supposed to contain only Cyrillic letters (no whitespace, digits, etc.), then something like

    warn "Bad content: $cdstr\n" if ($cdstr =~ /\P{InCyrillic}/);

really is all you need, as Zaxo suggested.
Adding more to the "acceptable characters" list is not too complicated (although I did have some trouble with methods that I expected to work based on the perlunicode man page). This seems to work okay for the case where whitespace, digits and punctuation are acceptable along with Cyrillic:

    warn "Bad content: $cdstr\n" unless ( $cdstr =~ /^(?:[\s\d\p{Punctuation}]|\p{Cyrillic})+$/ );

(I had expected that I could put a bunch of "\p{...}" things inside a single [...] character class, but that didn't work as expected in 5.8.6 or 5.8.8; I even had trouble defining my own subroutine, along the lines explained in perlunicode, and demonstrated here by japhy -- my subroutine ran, but the results were not as expected. I'll be posting a question/bug report to the perl-unicode mailing list.)
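For anyone who wants to try the two checks side by side, here is a minimal sketch. The sub names is_cyrillic_only and is_cyrillic_text and the sample strings are made up for illustration; the two patterns are the ones quoted above. It assumes $cdstr is already a decoded character string (here via "use utf8" on literals), and I have not re-tested it on 5.8.x.

```perl
use strict;
use warnings;
use utf8;    # the sample strings below are literal Cyrillic in the source

# Strict check (hypothetical helper): false if any character falls
# outside the Cyrillic block. Note that an empty string passes.
sub is_cyrillic_only {
    my ($cdstr) = @_;
    return $cdstr =~ /\P{InCyrillic}/ ? 0 : 1;
}

# Relaxed check (hypothetical helper): whitespace, digits and
# punctuation are accepted along with Cyrillic; empty string fails.
sub is_cyrillic_text {
    my ($cdstr) = @_;
    return $cdstr =~ /^(?:[\s\d\p{Punctuation}]|\p{Cyrillic})+$/ ? 1 : 0;
}

binmode STDOUT, ':encoding(UTF-8)';

# Made-up sample values standing in for the text of a <cd> element.
for my $cdstr ( "Привет", "Привет, мир 2024!", "Привет world" ) {
    printf "%s: strict=%d relaxed=%d\n",
        $cdstr, is_cyrillic_only($cdstr), is_cyrillic_text($cdstr);
}
```

The only behavioural difference I would watch for is the empty string: the strict pattern looks for an offending character and finds none, while the relaxed pattern requires at least one acceptable character.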