in reply to Re^4: Namespace/advice for new CPAN modules for Thai & Lao ( Regexp::CharProps::Thai )
in thread Namespace/advice for new CPAN modules for Thai & Lao
So maybe something like this:
#!/usr/bin/perl --

=head1 NAME

Regexp::CharProps - User Defined Character Properties

=head1 SYNOPSIS

    use Regexp::CharProps qw/ Thai /;
    ## like use Regexp::CharProps::Thai qw/ :all /;
    ## imports all exports like sub InThaiPunct...
    print "\$_ has got Thai" if m{
        \p{InThai}
        |\p{InThaiCons}
        |\p{InThaiHCons}
        |\p{InThaiMCons}
        |\p{InThaiLCons}
        |\p{InThaiVowel}
        |\p{InThaiPreVowel}
        |\p{InThaiPostVowel}
        |\p{InThaiCompVowel}
        |\p{InThaiDigit}
        |\p{InThaiTone}
        |\p{InThaiPunct}
    }x;

    use Regexp::CharProps::Thai qw/ InThaiPunct /;
    ## not import :all just sub InThaiPunct
    print "got Thai punctuation\n" if m/\p{InThaiPunct}/;

=cut

package Regexp::CharProps;
use strict;
use warnings;

sub import {
    my( $class, @modsforall ) = @_;
    return if not @modsforall;
    my $target = scalar caller;
    require Import::Into;
    for my $mod( @modsforall ){
        my $package = $class."::".$mod;
        ## Import::Into does not load the module itself, so require it first
        ( my $file = "$package.pm" ) =~ s{::}{/}g;
        require $file;
        $package->import::into( $target, ':all' );
    }
}
1;
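The forwarding idea can also be tried out with core modules only. The following is a minimal sketch, not the proposed module: the packages My::Props and My::Props::Thai are hypothetical stand-ins defined inline, and Exporter's export_to_level is swapped in for Import::Into. It also shows why the import matters: an unqualified \p{In...} name is looked up in the package where the regex is compiled, so the property subs have to exist there (or be written fully qualified).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a property submodule such as
# Regexp::CharProps::Thai, defined inline so nothing needs installing.
package My::Props::Thai;
use Exporter ();
our ( @ISA, @EXPORT_OK, %EXPORT_TAGS );
BEGIN {
    @ISA         = ('Exporter');
    @EXPORT_OK   = qw( InThaiDigit InThaiTone );
    %EXPORT_TAGS = ( all => [@EXPORT_OK] );
}
sub InThaiDigit { return "0E50 0E59\n" }
sub InThaiTone  { return "0E48 0E4B\n" }

# Hypothetical umbrella package: forwards ':all' from each named
# submodule into whichever package called import().
package My::Props;
sub import {
    my ( $class, @mods ) = @_;
    for my $mod (@mods) {
        my $package = $class . '::' . $mod;
        # Level 1 is the caller of this import(); Exporter ignores the
        # second argument, which exists only for historical reasons.
        $package->export_to_level( 1, $package, ':all' );
    }
    return;
}

package main;

# Same effect as "use My::Props qw/ Thai /;" for an inline package.
BEGIN { My::Props->import('Thai') }

my $digit = "\x{0E53}";    # THAI DIGIT THREE
my $tone  = "\x{0E48}";    # THAI CHARACTER MAI EK

my %result = (
    digit_short => ( $digit =~ /\p{InThaiDigit}/                  ? 1 : 0 ),
    digit_full  => ( $digit =~ /\p{My::Props::Thai::InThaiDigit}/ ? 1 : 0 ),
    tone_short  => ( $tone  =~ /\p{InThaiTone}/                   ? 1 : 0 ),
    ascii_digit => ( '3'    =~ /\p{InThaiDigit}/                  ? 1 : 0 ),
);
print "$_=$result{$_}\n" for sort keys %result;
```

With real modules on disk the BEGIN line would simply be a use statement, and the umbrella import would need to load each submodule before exporting from it.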
Each property module then needs an accompanying %EXPORT_TAGS:
package Regexp::CharProps::Thai;

use 5.008003;
use strict;
use warnings;

require Exporter;

our $VERSION = '1.01';
our @ISA = qw(Exporter);
our @EXPORT_OK = qw(
    InThai
    InThaiCons
    InThaiHCons
    InThaiMCons
    InThaiLCons
    InThaiVowel
    InThaiAlpha
    InThaiPreVowel
    InThaiPostVowel
    InThaiCompVowel
    InThaiDigit
    InThaiTone
    InThaiPunct
);
our %EXPORT_TAGS = ( 'all' => [ @EXPORT_OK ] );

=head1 NAME

Regexp::CharProps::Thai - useful character properties for Unicode Thai

=head1 SYNOPSIS

    use Regexp::CharProps::Thai qw/ :all /;

    $char = "...";                # some UTF8 string

    $char =~ /\p{InThaiCons}/;    # match a Thai consonant
    $char =~ /\p{InThaiTone}/;    # match a Thai tone mark

    # see description for full set of terms

=head1 DESCRIPTION

This module supplements the Unicode character-class definitions with
special groups relevant to Thai linguistics. The following classes are
defined:

=over 4

=item InThai

Matches ALL characters in the Thai Unicode code-point range.

=item InThaiCons

Matches Thai consonant letters, leaving out vowels, numerics, tone marks,
etc.

=item InThaiVowel

Matches Thai vowels, including compounded and free-standing vowels.
NOTE: Exceptions here include several of the "consonants" which also serve
as vowels: or-ang, yo-yak, double ro-reua, leut and reut, and wo-wen.
These are included as vowels in this grouping to accept the widest possible
definition, but cannot with certainty be determined by this to be in use
as actual vowels in the instance of their identification here.

=item InThaiAlpha

Matches only the Thai alphabetic characters (consonants and vowels),
excluding all digits, tone marks, and punctuation marks.

=item InThaiTone

Matches only the Thai tone marks, leaving out all letters, digits and
punctuation marks.

=item InThaiPunct

Matches Thai punctuation characters, not including tone marks, white
space, digits or alphabetic characters, and not including non-Thai
punctuation marks (such as English [.,'"!?] etc.).

=item InThaiCompVowel

Matches only the Thai vowels which are compounded with a Thai consonant,
and matching only the vowel portion of the compounded character.

=item InThaiPreVowel

Matches only the subset of vowels which appear _before_ the consonant
with which they are associated (though in Thai they are sounded _after_
said consonant); this excludes all consonant-vowels and does not include
any of the compounded vowels.

=item InThaiPostVowel

Matches only the vowels which appear _after_ the consonant with which
they are associated; this excludes all consonant-vowels and does not
include any of the compounded vowels.

=item InThaiHCons

Matches high-class Thai consonants.

=item InThaiMCons

Matches middle-class Thai consonants.

=item InThaiLCons

Matches low-class Thai consonants.

=item InThaiDigit

Matches Thai numerical digits only.

=back

=cut

sub InThai {
    return <<'END';
0E01 0E5B
END
}

sub InThaiCons {
    return <<'END';
0E01 0E2E
END
}

sub InThaiVowel {
    return join "\n",
        '0E30 0E45',
        '0E47',    # Thai semi-tone mark used above gor-gai in Thai "gor" (or)
        '0E4D',
        '0E22',    # Thai consonant yo-yak can also be a vowel (like 'y' in English)
        '0E2D',    # Thai consonant or-ang can also be a vowel
        '0E27',    # Thai consonant wo-wen is only a vowel following mai han-akat
}

#+Thai::InThaiCons
#+Thai::InThaiVowel
sub InThaiAlpha {
    return <<'END';
0E01 0E2E
0E30 0E45
0E47
0E4D
0E22
0E2D
0E27
END
}

sub InThaiTone {
    return <<'END';
0E48 0E4B
END
}

sub InThaiPunct {
    return <<'END';
0E46
0E4C
0E4E
0E4F
0E5A
0E5B
END
}

sub InThaiCompVowel {
    return join "\n",
        '0E31',    # Thai mai han-akat
        '0E34',    # Thai sara-i
        '0E35',    # Thai sara-ii
        '0E36',    # Thai sara-ue
        '0E37',    # Thai sara-uee
        '0E38',    # Thai sara-u
        '0E39',    # Thai sara-uu
        '0E3A',    # Thai phinthu
        '0E47',    # Thai semi-tone mark used above gor-gai in Thai "gor" (or)
}

sub InThaiPreVowel {
    return <<'END';
0E40 0E44
END
}

sub InThaiPostVowel {
    return <<'END';
0E45
0E30
0E32
0E33
END
}

sub InThaiHCons {
    return <<'END';
0E02
0E03
0E09
0E10
0E16
0E1C
0E1D
0E28
0E29
0E2A
0E2B
END
}

sub InThaiMCons {
    return <<'END';
0E01
0E08
0E0E
0E0F
0E14
0E15
0E1A
0E1B
0E2D
END
}

#+Thai::InThaiCons
#-Thai::InThaiHCons
#-Thai::InThaiMCons
sub InThaiLCons {
    return <<'END';
0E04 0E07
0E0A 0E0D
0E11 0E13
0E17 0E19
0E1E 0E27
0E2C
0E2E
END
}

sub InThaiDigit {
    return <<'END';
0E50 0E59
END
}

=head1 AUTHOR

Erik Mundall

=head1 COPYRIGHT

Copyright (C) 2015 Erik Mundall. All Rights Reserved.

This is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

=cut

1;
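The commented-out "+Thai::..." and "-Thai::..." lines hint that some classes could be derived from others instead of being listed by hand. A rough sketch of that idea follows, assuming a hypothetical package name My::Thai and a hypothetical InThaiLConsDerived property; the code-point lists are copied from the module above, with the low-class list read as five ranges plus two single code points. It compares the hand-written low-class list against "all consonants minus high class minus middle class" over the Thai block and counts disagreements.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical package; the code points are copied from the module above.
package My::Thai;

sub InThaiCons { return "0E01 0E2E\n" }

sub InThaiHCons {
    return join "\n",
        qw( 0E02 0E03 0E09 0E10 0E16 0E1C 0E1D 0E28 0E29 0E2A 0E2B ), '';
}

sub InThaiMCons {
    return join "\n",
        qw( 0E01 0E08 0E0E 0E0F 0E14 0E15 0E1A 0E1B 0E2D ), '';
}

# Hand-written list, as in the module.
sub InThaiLCons {
    return <<'END';
0E04 0E07
0E0A 0E0D
0E11 0E13
0E17 0E19
0E1E 0E27
0E2C
0E2E
END
}

# Derived list: start from all consonants, then subtract the other two
# classes. User-defined properties must be named with their package here.
sub InThaiLConsDerived {
    return <<'END';
+My::Thai::InThaiCons
-My::Thai::InThaiHCons
-My::Thai::InThaiMCons
END
}

package main;

# Walk the whole Thai block and count code points where the two
# definitions disagree.
my $mismatches = 0;
for my $cp ( 0x0E00 .. 0x0E7F ) {
    my $ch       = chr $cp;
    my $explicit = $ch =~ /\p{My::Thai::InThaiLCons}/        ? 1 : 0;
    my $derived  = $ch =~ /\p{My::Thai::InThaiLConsDerived}/ ? 1 : 0;
    $mismatches++ if $explicit != $derived;
}
print "mismatches=$mismatches\n";
```

The derived form keeps the three consonant classes consistent by construction, at the cost of an extra property lookup when the regex is compiled.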