package ArChr;

=head1 NAME

ArChr - useful character properties for Unicode Arabic

=head1 SYNOPSIS

  use ArChr;

  $c = "...";  # some UTF-8 string

  $c =~ /\p{ArChr::InARletter}/;  # true if $c contains an Arabic letter
  $c =~ /\p{ArChr::InARmark}/;    # true if $c contains an Arabic diacritic

  # see DESCRIPTION for the full set of properties

=head1 DESCRIPTION

This module supplements the Unicode character-class definitions with
special groups relevant to Arabic linguistics. The following classes
are defined:

=over 4

=item InARletter

Matches only the Arabic letter characters, excluding all digits,
diacritics, and punctuation marks.

=item InARmark

Matches only the Arabic diacritic marks, excluding all letters, digits,
and punctuation marks.

=item InARvowel

Matches vowel letters and diacritics, excluding consonants, shadda,
sukuun, and letters involving hamza.

=item InARshortvowel

Matches only the short-vowel diacritic marks, not sukuun or shadda.

=item InARcons

Matches consonant letters, hamzas and shadda, excluding vowels and
sukuun.

=back

=cut

use strict;

# Each sub returns a user-defined property definition: one entry per
# line, either a single hex code point or two hex code points giving an
# inclusive range.

sub InARletter {
    return <<'END';
0621 063A
0641 064A
0671
067E
0686
0698
06AF
END
}

sub InARvowel {
    return <<'END';
0627
064B 0650
END
}

# Built from the other classes: every letter that is not a vowel.
sub InARcons {
    return <<'END';
+ArChr::InARletter
-ArChr::InARvowel
END
}

sub InARmark {
    return <<'END';
064B 0652
0670
END
}

sub InARshortvowel {
    return <<'END';
064B 0650
END
}

1;
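
=head1 EXAMPLE

The classes described above are ordinary user-defined Unicode
properties, so they can be used anywhere a regular expression accepts
C<\p{...}>. The following is a minimal sketch rather than part of the
module's interface: the sample word and the variable names C<$word>,
C<$bare> and C<$letters> are illustrative assumptions. It strips the
diacritics from a fully vowelled word and counts its letters.

    use strict;
    use warnings;
    use ArChr;

    # kaf, fatha, teh, fatha, beh, fatha (illustrative sample word)
    my $word = "\x{0643}\x{064E}\x{062A}\x{064E}\x{0628}\x{064E}";

    # Copy the word, then delete every run of diacritic marks.
    (my $bare = $word) =~ s/\p{ArChr::InARmark}+//g;

    # Count the letters; the empty list forces list context on the match.
    my $letters = () = $word =~ /\p{ArChr::InARletter}/g;

Here C<$bare> keeps only the letters kaf, teh and beh, and C<$letters>
is 3, because each fatha (U+064E) falls inside C<InARmark> and outside
C<InARletter>.

=cut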