Content-Language: utf8;
Checking the Thai module
ok 1 - use RegexpCharClassesThai;
Positives...
ok 2 - Match for "ก" =~ /\p{IsKokai}/
not ok 3 - Match for "ก" =~ /\p{InThaiMCons}/
Negatives...
ok 4 - No match for "ก" =~ /\p{InThaiHCons}/
ok 5 - No match for "ก" =~ /\p{InThaiLCons}/
ok 6 - No match for "ก" =~ /\p{InThaiVowel}/
ok 7 - No match for "ก" =~ /\p{InThaiPreVowel}/
Positives...
not ok 8 - Match for "ไ" =~ /\p{InThaiVowel}/
ok 9 - Match for "ไ" =~ /\p{InThaiPreVowel}/
Negatives...
ok 10 - No match for "ไ" =~ /\p{InThaiHCons}/
ok 11 - No match for "ไ" =~ /\p{InThaiMCons}/
ok 12 - No match for "ไ" =~ /\p{InThaiLCons}/
Check:
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
INC: /var/www/lib/ /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.34.0 /usr/local/share/perl/5.34.0 /usr/lib/x86_64-linux-gnu/perl5/5.34 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.34 /usr/share/perl/5.34 /usr/local/lib/site_perl /etc/apache2
[Wed Nov 01 06:37:06.123037 2023] [core:error] [pid 754:tid 139702641829440] [client 192.168.1.101:58127] Premature end of script headers: test-thai-mod.pl
[Wed Nov 01 06:37:06.123072 2023] [perl:warn] [pid 754:tid 139702641829440] /cgi/test-thai-mod.pl did not send an HTTP header
Wide character in print at /usr/share/perl/5.34/Test2/Formatter/TAP.pm line 125.
# Failed test ' Match for "ก" =~ /\p{InThaiMCons}/<br>'
# at /var/www/cgi/test-thai-mod.pl line 32.
# got: ''
# expected: '1'
Wide character in print at /usr/share/perl/5.34/Test2/Formatter/TAP.pm line 125.
# Failed test ' Match for "ไ" =~ /\p{InThaiVowel}/<br>'
# at /var/www/cgi/test-thai-mod.pl line 43.
# got: ''
# expected: '1'
[Wed Nov 01 06:38:29.720987 2023] [core:error] [pid 753:tid 139702641829440] [client 192.168.1.101:58138] Premature end of script headers: test-thai-mod.pl
[Wed Nov 01 06:38:29.721022 2023] [perl:warn] [pid 753:tid 139702641829440] /cgi/test-thai-mod.pl did not send an HTTP header
Wide character in print at /usr/share/perl/5.34/Test2/Formatter/TAP.pm line 125.
# Failed test ' Match for "ก" =~ /\p{InThaiMCons}/<br>'
# at /var/www/cgi/test-thai-mod.pl line 32.
# got: ''
# expected: '1'
Wide character in print at /usr/share/perl/5.34/Test2/Formatter/TAP.pm line 125.
# Failed test ' Match for "ไ" =~ /\p{InThaiVowel}/<br>'
# at /var/www/cgi/test-thai-mod.pl line 43.
# got: ''
# expected: '1'
package RegexpCharClassesThai;
use 5.008003;
use strict;
use warnings;
use utf8;
use Exporter;
our @ISA = qw(Exporter);
our %EXPORT_TAGS = (
    classes =>
        [ qw(InThaiHCons InThaiMCons InThaiLCons
             InThaiVowel InThaiPreVowel
             IsThaiHCons IsThaiMCons IsThaiLCons
             IsThaiVowel IsThaiPreVowel) ],
    characters =>
        [ qw(InKokai InKhokhai
             IsKokai IsKhokhai ) ],
);
# add all the other ":class" tags to the ":all" class,
# deleting duplicates
{
    my %seen;
    push @{$EXPORT_TAGS{all}},
        grep {!$seen{$_}++} @{$EXPORT_TAGS{$_}} foreach keys %EXPORT_TAGS;
}
our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );
our @EXPORT = ( @{ $EXPORT_TAGS{'classes'} } );
our $VERSION = '1.00';
#--------------------------------------------------------------
# CREATE FUNCTIONALITY FOR SHOWING CONTENTS OF EACH CLASS
#--------------------------------------------------------------
my %char_class_dispatch = (
    InThaiHCons    => \&InThaiHCons,
    InThaiMCons    => \&InThaiMCons,
    InThaiLCons    => \&InThaiLCons,
    InThaiVowel    => \&InThaiVowel,
    InThaiPreVowel => \&InThaiPreVowel,
);
sub list {
    my ($char_class) = @_;
    unless (exists $char_class_dispatch{$char_class}) {
        warn "Char class '$char_class' doesn't exist!\n";
        return [];
    }
    # The class subs return a plain list of hex strings, not an array
    # reference, so call them in list context rather than dereferencing.
    return [
        map { chr hex } $char_class_dispatch{$char_class}->()
    ];
}
#--------------------------------------------------------------
# Start with the "Is..." versions
#--------------------------------------------------------------
sub IsThaiHCons { #THAI HIGH-CLASS CONSONANTS
    # ข ฃ ฉ ฐ ถ ผ ฝ ศ ษ ส ห
    return qw{
        0E02 0E03 0E09 0E10 0E16 0E1C 0E1D 0E28
        0E29 0E2A 0E2B
    }
}
sub IsThaiMCons { #THAI MID-CLASS CONSONANTS
    # ก จ ฎ ฏ ด ต บ ป อ
    return qw{
        0E01 0E08 0E0E 0E0F 0E14 0E15 0E1A 0E1B
        0E2D
    }
}
sub IsThaiLCons { #THAI LOW-CLASS CONSONANTS
    # ค ฅ ฆ ง ช ซ ฌ ญ ฑ ฒ ณ ท ธ น พ ฟ ภ ม ย ร ฤ ล ฦ ว ฬ ฮ
    return qw{
        0E04 0E05 0E06 0E07 0E0A 0E0B 0E0C 0E0D
        0E11 0E12 0E13 0E17 0E18 0E19 0E1E 0E1F
        0E20 0E21 0E22 0E23 0E24 0E25 0E26 0E27
        0E2C 0E2E
    }
}
sub IsThaiVowel { #THAI VOWELS
    #NOTE: 0E4D combines with a consonant but may not be considered a vowel
    # ย ฤ ฦ ว อ ะ ั า ํา ิ ี ึ ื ุ ู ฺ เ แ โ ใ ไ ๅ ็ ํ
    return qw{
        0E22 0E24 0E26 0E27 0E2D 0E30 0E31 0E32
        0E33 0E34 0E35 0E36 0E37 0E38 0E39 0E3A
        0E40 0E41 0E42 0E43 0E44 0E45 0E47 0E4D
    }
}
sub IsThaiPreVowel { #VOWELS PRECEDING CONSONANT
    # เ แ โ ใ ไ
    return qw{
        0E40 0E41 0E42 0E43 0E44
    }
}
#--------------------------------------------------------------
# Alias the "In..." forms (same as above)
#--------------------------------------------------------------
sub InThaiHCons { &IsThaiHCons };
sub InThaiMCons { &IsThaiMCons };
sub InThaiLCons { &IsThaiLCons };
sub InThaiVowel { &IsThaiVowel };
sub InThaiPreVowel { &IsThaiPreVowel };
#--------------------------------------------------------------
# Provide spelled-out forms of the individual characters
#--------------------------------------------------------------
sub IsKokai { return '0E01' } # ก - THAI CHARACTER KO KAI
sub IsKhokhai { return '0E02' } # ข - THAI CHARACTER KHO KHAI
#--------------------------------------------------------------
# Alias the spelled-out individual characters
#--------------------------------------------------------------
sub InKokai { &IsKokai }
sub InKhokhai { &IsKhokhai }
1;
__END__
=pod

=encoding utf8

=head1 DESCRIPTION

This module supplements the UTF-8 character-class definitions
available to regular expressions (regex) with special groups
relevant to Thai linguistics. The following classes are defined:

โมดูลนี้เป็นส่วนเสริมคำจำกัดความคลาสอักขระ UTF-8
ใช้ได้กับนิพจน์ทั่วไป (regex) ด้วยกลุ่มพิเศษ
ที่เกี่ยวข้องกับภาษาศาสตร์ไทย มีการกำหนดคลาสต่อไปนี้:

=over 4

=item InThaiVowel / IsThaiVowel

Matches Thai vowels only, including compounded and free-standing vowels.
Exceptions here include several of the "consonants" which also serve as
vowels: or-ang, yo-yak, double ro-reua, leut and reut, and wo-wen.

NOTE: Thai vowels cannot stand alone: they are always connected with a
consonant. Many of these, without their consonant companions, will appear
with the unicode dotted-circle character (U+25CC) when rendered, showing
a character is missing. Conversely, Thai consonants can exist without a
vowel, and some Thai words do not have written vowels (the vowel is implied).

=item InThaiPreVowel / IsThaiPreVowel

Matches only the subset of vowels which appear _before_ the consonant
with which they are associated (though in Thai they are sounded _after_
said consonant); this excludes all consonant-vowels and does not include
any of the compounded vowels.

=item InThaiHCons / IsThaiHCons

Matches Thai high-class consonants.

=item InThaiMCons / IsThaiMCons

Matches Thai middle-class consonants.

=item InThaiLCons / IsThaiLCons

Matches Thai low-class consonants.

=back

=cut
#!/usr/bin/perl
#TEST THAI MODULE
use strict;
use warnings;
use lib '/var/www/lib/';
use RegexpCharClassesThai;
use RegexpCharClassesThai qw( :all );
use utf8;
use Test::More;
binmode STDERR, ":utf8";
binmode STDIN, ":utf8";
binmode STDOUT, ":utf8";
BEGIN {
print "Content-Type:text/html; charset=utf-8\n";
print "Content-Language: utf8;\n\n";
}
print <<PAGE;
<html lang="utf8">
<body>
<h3>Checking the Thai module</h3>
PAGE
use_ok('RegexpCharClassesThai');
print "<h5>Positives...</h5>";
is( 'ก' =~ /[\p{IsKokai}]/,1,' Match for "ก" =~ /\p{IsKokai}/<br>');
is( 'ก' =~ /\p{InThaiMCons}/,1,' Match for "ก" =~ /\p{InThaiMCons}/<br>');
#PRODUCES ERROR, STOPPING CODE EXECUTION
#is( 'ก' =~ /\p{InThaiNonexistent}/,1,' Match for "ก" =~ /\p{InThaiFinCons}/<br>');
print "<h5>Negatives...</h5>";
isnt( 'ก' =~ /\p{InThaiHCons}/,1,' No match for "ก" =~ /\p{InThaiHCons}/<br>');
isnt( 'ก' =~ /\p{InThaiLCons}/,1,' No match for "ก" =~ /\p{InThaiLCons}/<br>');
isnt( 'ก' =~ /\p{InThaiVowel}/,1,' No match for "ก" =~ /\p{InThaiVowel}/<br>');
isnt( 'ก' =~ /\p{InThaiPreVowel}/,1,' No match for "ก" =~ /\p{InThaiPreVowel}/<br>');
print "<h5>Positives...</h5>";
is( 'ไ' =~ /\p{InThaiVowel}/,1,' Match for "ไ" =~ /\p{InThaiVowel}/<br>');
is( 'ไ' =~ /\p{InThaiPreVowel}/,1,' Match for "ไ" =~ /\p{InThaiPreVowel}/<br>');
print "<h5>Negatives...</h5>";
isnt( 'ไ' =~ /\p{InThaiHCons}/,1,' No match for "ไ" =~ /\p{InThaiHCons}/<br>');
isnt( 'ไ' =~ /\p{InThaiMCons}/,1,' No match for "ไ" =~ /\p{InThaiMCons}/<br>');
isnt( 'ไ' =~ /\p{InThaiLCons}/,1,' No match for "ไ" =~ /\p{InThaiLCons}/<br>');
print <<PAGE;
<h3>Check:</h3>
<p>PATH: $ENV{PATH}</p>
<p>INC: @INC</p>
</body>
</html>
PAGE
As the script output to the browser shows, the "Positives" are inconsistent. For the consonant "ก", the IsKokai test passes but the InThaiMCons test fails; for the vowel "ไ", the InThaiPreVowel test passes but the InThaiVowel test fails. If the module were not being loaded ("used") at all, the errors would stop code execution. But the module is being loaded, and, to my eye, the subroutines behind the passing and the failing properties all follow the same style. I have no idea what more could be done to fix the ones that are not working.
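A minimal standalone sketch for reproducing this away from the web server. It rests on an assumption that has not been confirmed against the module itself: perlunicode says a user-defined property sub must return a single string with one hex code point (or whitespace-separated range) per line, and a qw{} list evaluated in scalar context yields only its last element. The sub names InDemoList and InDemoString are hypothetical stand-ins, not part of RegexpCharClassesThai.

```perl
use strict;
use warnings;

# Hypothetical stand-in: returns a list, in the same style as the module's subs.
sub InDemoList   { return qw{ 0E01 0E08 0E2D } }

# Hypothetical stand-in: returns ONE string, one hex code point per line,
# which is the format perlunicode describes for user-defined properties.
sub InDemoString { return join "\n", qw{ 0E01 0E08 0E2D } }

# In scalar context the qw{} list collapses to its last element.
my $scalar = InDemoList();

# U+0E01 THAI CHARACTER KO KAI, written as an escape to keep the source ASCII.
my $ko_kai = "\x{0E01}";

my $list_form_matches   = $ko_kai =~ /\p{InDemoList}/   ? 1 : 0;
my $string_form_matches = $ko_kai =~ /\p{InDemoString}/ ? 1 : 0;

print "scalar-context value of list form: $scalar\n";
print "list form matches U+0E01:   $list_form_matches\n";
print "string form matches U+0E01: $string_form_matches\n";
```

If the assumption holds, it would account for the pattern in the output above: IsKokai returns a single value, and 0E44 ("ไ") happens to be the last entry in IsThaiPreVowel, whereas 0E01 ("ก") is not the last entry in IsThaiMCons and 0E44 is not the last entry in IsThaiVowel.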
All this just goes to show how truly "gifted" I am...the system is always "gifting" me with problems that no one else seems to be privileged to experience! (Now, perhaps some eagle-eyed coder will embarrass me by pointing out the most obvious of flaws...ha! And yet I should be most glad of it!)
Note that everything in this post is copied and pasted from the original (already trimmed) sources; the only alterations are those required to format it for proper display here. In other words, if these scripts run cleanly on your server, then my server may have some issues. If, however, the problem is in the code itself, your server should show the same failures I'm seeing. (The UTF-8 characters may not survive the transfer intact, however, as this site converts them to HTML entities. Why can't perlmonks.org be more up to date with encodings? /gripe.)
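Since the literal Thai characters may be mangled on the way to your machine, here is a sketch of an ASCII-only test that writes the characters as \x{...} escapes instead. It also sets the Test::More builder's output handles to UTF-8, which the Test::More documentation suggests as a remedy for "Wide character in print" warnings. InDemoPreVowel is a hypothetical stand-in property defined in the script itself, returning a newline-joined string in the format perlunicode describes; it is not the module's own sub.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Give the test builder's handles a UTF-8 layer so that test names or
# diagnostics containing Thai text do not trigger "Wide character" warnings.
my $builder = Test::More->builder;
binmode($builder->$_(), ':encoding(UTF-8)')
    for qw(output failure_output todo_output);

# Hypothetical stand-in property: the five leading vowels U+0E40..U+0E44,
# returned as one newline-separated string.
sub InDemoPreVowel { return join "\n", qw{ 0E40 0E41 0E42 0E43 0E44 } }

my $sara_ai_maimalai = "\x{0E44}";   # THAI CHARACTER SARA AI MAIMALAI
my $ko_kai           = "\x{0E01}";   # THAI CHARACTER KO KAI

ok( $sara_ai_maimalai =~ /\p{InDemoPreVowel}/, 'U+0E44 matches InDemoPreVowel' );
ok( $ko_kai !~ /\p{InDemoPreVowel}/, 'U+0E01 does not match InDemoPreVowel' );
```

Because the source contains no literal Thai characters, it should come through the HTML-entity conversion unchanged and needs no "use utf8".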
Blessings,
~Polyglot~