Re^5: Namespace/advice for new CPAN modules for Thai & Lao ( Regexp::CharProps

So maybe

#!/usr/bin/perl --

=head1 NAME

Regexp::CharProps - User Defined Character Properties 

=head1 SYNOPSIS

    use Regexp::CharProps qw/ Thai /; ## like use Regexp::CharProps::Thai qw/ :all /;
                                      ## imports all exports like sub InThaiPunct...
    
    print "\$_ has got Thai" if m{
        \p{InThai}
        |\p{InThaiCons}
        |\p{InThaiHCons}
        |\p{InThaiMCons}
        |\p{InThaiLCons}
        |\p{InThaiVowel}
        |\p{InThaiPreVowel}
        |\p{InThaiPostVowel}
        |\p{InThaiCompVowel}
        |\p{InThaiDigit}
        |\p{InThaiTone}
        |\p{InThaiPunct}
    }x;
    
    use Regexp::CharProps::Thai qw/ InThaiPunct /; ## not import :all just sub InThaiPunct
    print "got Thai punctuation\n" if m/\p{InThaiPunct}/;

=cut


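package Regexp::CharProps;

use strict;
use warnings;

sub import {
    my( $class, @modsforall ) = @_;
    
    return if not @modsforall;
    
    my $target  = scalar caller;
    require Import::Into;
    for my $mod( @modsforall ){
        my $package = $class."::".$mod;
        ## Import::Into does not load the module, so require it first
        ( my $file = "$package.pm" ) =~ s{::}{/}g;
        require $file;
        $package ->import::into( $target , ':all' );
    }
}
1;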

with accompanying EXPORT_TAGS

package Regexp::CharProps::Thai;

use 5.008003;
use strict;
use warnings;

require Exporter;

our $VERSION = '1.01';

our @ISA = qw(Exporter);
our @EXPORT_OK = qw(
  InThai InThaiCons InThaiHCons InThaiMCons InThaiLCons InThaiVowel 
  InThaiPreVowel InThaiPostVowel InThaiCompVowel InThaiDigit InThaiTone
  InThaiPunct InThaiAlpha
);

our %EXPORT_TAGS = ( 'all' => [ @EXPORT_OK ] );

=head1 NAME

Regexp::CharProps::Thai - useful character properties for Unicode Thai

=head1 SYNOPSIS

 use Regexp::CharProps::Thai qw/ :all /;  # nothing is exported by default

 $char = "...";  # some decoded Unicode (character) string

 $char =~ /\p{InThaiCons}/;  # match a Thai consonant
 $char =~ /\p{InThaiTone}/;  # match a Thai tone mark


# see description for full set of terms

=head1 DESCRIPTION

This module supplements the Unicode character-class definitions with
special groups relevant to Thai linguistics.  The following classes 
are defined:

=over 4

=item InThai

Matches ALL characters in the Thai Unicode code-point range.

=item InThaiCons

Matches Thai consonant letters, leaving out vowels, numerics, tone
marks, etc.

=item InThaiVowel

Matches Thai vowels, including compounded and free-standing vowels.

NOTE: Exceptions here include several of the "consonants" which also
serve as vowels: or-ang, yo-yak, double ro-reua, leut and reut, and
wo-wen.  These are included as vowels in this grouping to accept the
widest possible definition, but a match here cannot determine with
certainty that the character is actually being used as a vowel in
that instance.

=item InThaiAlpha

Matches only the Thai alphabetic characters (consonants and vowels),
excluding all digits, tone marks, and punctuation marks.

=item InThaiTone

Matches only the Thai tone marks, leaving out all letters,
digits and punctuation marks.

=item InThaiPunct

Matches Thai punctuation characters, not including tone marks,
white space, digits or alphabetic characters, and not including
non-Thai punctuation marks (such as English [.,'"!?] etc.).

=item InThaiCompVowel

Matches only the Thai vowels which are compounded with a Thai
consonant, and matches only the vowel portion of the compounded
character.

=item InThaiPreVowel

Matches only the subset of vowels which appear _before_ the consonant
with which they are associated (though in Thai they are sounded
_after_ said consonant); this excludes all consonant-vowels and does
not include any of the compounded vowels.

=item InThaiPostVowel

Matches only the vowels which appear _after_ the consonant with which
they are associated; this excludes all consonant-vowels and does not 
include any of the compounded vowels.

=item InThaiHCons

Matches high-class Thai consonants.

=item InThaiMCons

Matches middle-class Thai consonants.

=item InThaiLCons

Matches low-class Thai consonants.

=item InThaiDigit

Matches Thai numerical digits only.

=back

=cut


sub InThai {
    return <<'END';
0E01 0E5B
END
}

sub InThaiCons {
    return <<'END';
0E01 0E2E
END
}

sub InThaiVowel {
    return join "\n", 
'0E30 0E45',
'0E47',#Thai semi-tone mark used above gor-gai in Thai "gor" (or)
'0E4D',
'0E22',#Thai consonant yo-yak can also be a vowel (like 'y' in English)
'0E2D',#Thai consonant or-ang can also be a vowel
'0E27',#Thai consonant wo-wen is only a vowel following mai han-akat
}


#+Thai::InThaiCons
#+Thai::InThaiVowel
sub InThaiAlpha {
    return <<'END';
0E01 0E2E
0E30 0E45
0E47
0E4D
0E22
0E2D
0E27
END
}

sub InThaiTone {
    return <<'END';
0E48 0E4B
END
}

sub InThaiPunct {
    return <<'END';
0E46
0E4C
0E4E
0E4F
0E5A
0E5B
END
}

sub InThaiCompVowel {
    return join "\n", 
'0E31',#Thai mai han-akat
'0E34',#Thai sara-i
'0E35',#Thai sara-ii
'0E36',#Thai sara-ue
'0E37',#Thai sara-uee
'0E38',#Thai sara-u
'0E39',#Thai sara-uu
'0E3A',#Thai phinthu
'0E47',#Thai semi-tone mark used above gor-gai in Thai "gor" (or)
}

sub InThaiPreVowel {
    return <<'END';
0E40 0E44
END
}

sub InThaiPostVowel {
    return <<'END';
0E45
0E30
0E32
0E33
END
}

sub InThaiHCons { 
    return <<'END';
0E02
0E03
0E09
0E10
0E16
0E1C
0E1D
0E28
0E29
0E2A
0E2B
END
}

sub InThaiMCons {
    return <<'END';
0E01
0E08
0E0E
0E0F
0E14
0E15
0E1A
0E1B
0E2D
END
}

#+Thai::InThaiCons
#-Thai::InThaiHCons
#-Thai::InThaiMCons
sub InThaiLCons {
    return <<'END';
0E04 0E07
0E0A 0E0D
0E11 0E13
0E17 0E19
0E1E 0E27
0E2C
0E2E
END
}

sub InThaiDigit {
    return <<'END';
0E50 0E59
END
}


=head1 AUTHOR

Erik Mundall 

=head1 COPYRIGHT

Copyright (C) 2015 Erik Mundall.  All Rights Reserved.

This is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

=cut

1;
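A minimal sketch of how these subs are picked up by \p{...}, assuming nothing is installed: the three properties are copied from the module above and defined straight in main, and classify() is a hypothetical helper that is not part of the proposed module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Range data copied from the module above. The subs live in main here
# only so that the sketch runs on its own; normally they would be imported.
sub InThaiPreVowel { return "0E40 0E44\n" }
sub InThaiCons     { return "0E01 0E2E\n" }
sub InThaiTone     { return "0E48 0E4B\n" }

# Hypothetical helper: report the first property that matches one character.
sub classify {
    my ($char) = @_;
    return 'pre-vowel' if $char =~ /\p{InThaiPreVowel}/;
    return 'consonant' if $char =~ /\p{InThaiCons}/;
    return 'tone mark' if $char =~ /\p{InThaiTone}/;
    return 'other';
}

# The word "Thai" in Thai script: sara ai maimalai, tho thahan, yo yak.
my $word    = "\x{0E44}\x{0E17}\x{0E22}";
my @classes = map { classify($_) } split //, $word;

for my $char ( split //, $word ) {
    printf "U+%04X %s\n", ord($char), classify($char);
}
```

Perl looks for a sub named after the property in the package where the regex is compiled, which is why the module has to export the subs (or the property has to be written fully qualified).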

Re^6: Namespace/advice for new CPAN modules for Thai & Lao ( Regexp::CharProps - User Defined Character Properties ) by Polyglot (Chaplain) on Mar 24, 2015 at 05:00 UTC

"but he meant some unicode string"

Yes. It definitely wouldn't work on an upper-ASCII-type encoding such as Thai originally began with, without some form of encoding/decoding going on. I guess I put "UTF8" because that is what gets used most with Thai, and what I knew would work, having developed strictly with that. I presume any Unicode type should work equally well, though I don't claim to be an expert on Unicode.

In your code example:

    print "\$_ has got Thai" if m{
        \p{InThai}
        |\p{InThaiCons}
        |\p{InThaiHCons}
        |\p{InThaiMCons}
        |\p{InThaiLCons}
        |\p{InThaiVowel}
        |\p{InThaiPreVowel}
        |\p{InThaiPostVowel}
        |\p{InThaiCompVowel}
        |\p{InThaiDigit}
        |\p{InThaiTone}
        |\p{InThaiPunct}
    }x;

...only the first item in the OR'ed list should ever see action. All of the subsequent categories are already "InThai", and the "InThai" token already comes standard with Perl, AFAIK (see pg. 172 of "Programming Perl, Third Edition"), so that code would do little to test additional functionality. If the first line (\p{InThai}) failed, none of the others should succeed either.

NOTE: I've updated my list to reflect your proposed name, but I've adapted it slightly to one that seems a better fit to me.

Blessings,

~Polyglot~
Re^7: Namespace/advice for new CPAN modules for Thai & Lao ( Regexp::CharClasses::Thai / Lingua::Thai::RegexpCharClasses ) by Anonymous Monk on Mar 24, 2015 at 08:03 UTC
"In your code example: ...only the first item in the OR'ed list should ever see action..."

SYNOPSIS only shows what's possible; it can be repetitive and incorrect as long as the syntax is valid. And when the exports are few, you might as well show them all instead of "...".

"Regexp::CharProps"

My suggestion was that you call yours Regexp::CharProps::Thai, not Regexp::CharProps. Also to distribute a helper parent module Regexp::CharProps with it, so that others can add Regexp::CharProps::AnonyRands or whatever ... a new, well-named place for these definitions to live.

"Regexp::Thai::CharClasses"

So are you going to have more Thai Regexps that aren't CharClasses? I think you got that backwards; it should be Regexp::CharClasses::Thai :)

Or should it go into Lingua::Thai::RegexpCharClasses, in case you're going to have more Lingua::Thai things that aren't RegexpCharClasses?

"Yes. It definitely wouldn't work on an upper-ascii-type encoding such as Thai originally began with, without some form of encoding/decoding going on. I guess I put "UTF8" because that is what gets used most with Thai, and what I knew would work having developed strictly with that. I presume any Unicode type should work equally well, though I don't claim to be an expert on Unicode."

Right :) The numbers are Unicode code points, independent of encoding.	[reply]
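To illustrate "independent of encoding", here is a small sketch that decodes the same Thai letter from two byte encodings with the core Encode module and matches both against one property. The InThaiCons sub is copied from the module in this thread; ISO-8859-11 is used as a stand-in for the older TIS-620-style encodings mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Copied from the module in this thread; defined in main so the sketch
# runs without the module being installed.
sub InThaiCons { return "0E01 0E2E\n" }

my $utf8_bytes = "\xE0\xB8\x81";    # U+0E01 (ko kai) encoded as UTF-8
my $tis_bytes  = "\xA1";            # the same letter in ISO-8859-11

# After decoding, both are the same one-character Perl string.
my $from_utf8 = decode( 'UTF-8',       $utf8_bytes );
my $from_tis  = decode( 'iso-8859-11', $tis_bytes );

print "same character after decoding\n" if $from_utf8 eq $from_tis;
print "decoded text matches\n"          if $from_tis =~ /\p{InThaiCons}/;
print "raw bytes do not match\n"    unless $tis_bytes =~ /\p{InThaiCons}/;
```

The property only ever sees code points, so the input has to be decoded first; the undecoded byte 0xA1 is treated as U+00A1 and falls outside the range.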
Re^8: Namespace/advice for new CPAN modules for Thai & Lao ( Regexp::CharClasses::Thai / Lingua::Thai::RegexpCharClasses ) by Polyglot (Chaplain) on Mar 24, 2015 at 09:32 UTC
"SYNOPSIS only shows whats possible, it can be repetitive and incorrect as long as the syntax is valid. And when the exports are few, might as well show them all instead of "...""

Thank you for the clarification. I guess I misunderstood the intent of that. As is obvious, I've never submitted a module before, so I appreciate your patience with me.

"My suggestion was that you call yours Regexp::CharProps::Thai not Regexp::CharProps."

OK, I fixed that.

"Also to distribute a helper parent module Regexp::CharProps with it, so that others can add Regexp::CharProps::AnonyRands or whatever ... a new well named place for these definitions to live"

I have no idea how to do this.

"So are you're going to have more Thai Regexp's that aren't CharSlasses? I think you got that backwards, it should be Regexp::CharClasses::Thai :)"

Looking at that module now, perhaps it could all just go into Regexp::CharClasses, but I'm not the developer for that, and when I looked at its code, it's done in a somewhat different style which is confusing to me. I don't see any logical difference between Regexp::CharClasses::Thai and Regexp::Thai::CharClasses, except that, to my understanding, the former would be inhibited by the fact that another developer has already used the Regexp::CharClasses namespace. Am I missing something here?

Blessings,

~Polyglot~	[reply]
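One way the helper parent idea might look, sketched with core Exporter only instead of the Import::Into approach from the original post. Everything here is hypothetical: Regexp::CharProps::Sketch stands in for the parent, Regexp::CharProps::Demo for a third-party sibling, and both are defined in one file, so the require step that a real distribution would need is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sibling module that someone else might add.
package Regexp::CharProps::Demo;
use Exporter ();
BEGIN {
    # Set at compile time because the import below also runs at compile time.
    our @ISA         = ('Exporter');
    our @EXPORT_OK   = ('InDemoDigit');
    our %EXPORT_TAGS = ( all => [@EXPORT_OK] );
}
sub InDemoDigit { return "0E50 0E59\n" }    # Thai digits, reused as sample data

# Hypothetical parent: forwards ':all' from each named child to the caller.
package Regexp::CharProps::Sketch;
sub import {
    my ( $class, @mods ) = @_;
    for my $mod (@mods) {
        my $package = "Regexp::CharProps::$mod";
        # Level 1 is whoever called this import, i.e. the user's package.
        $package->export_to_level( 1, $package, ':all' );
    }
}

package main;

# Equivalent of "use Regexp::CharProps::Sketch qw/ Demo /;" for a
# package that lives in this same file.
BEGIN { Regexp::CharProps::Sketch->import('Demo') }

print "got a Thai digit\n" if "\x{0E51}" =~ /\p{InDemoDigit}/;
```

The parent holds no property data of its own; it only decides which child packages to pull in, which is what lets other authors add siblings under the same namespace without touching it.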