In my last response, I believe I covered all of the coding issues. I finished with:

"There are a number of improvements you could make to the module code depending on the Perl version you're targeting. ... The code I've presented should, I believe, work fine with Perl 5.6 ..."

Perl does a great job of keeping up with Unicode versions. The latest Unicode version is 15.1; Perl v5.38 (the latest stable version) supports Unicode 15.0 (see "perl5380delta: Unicode 15.0 is supported"). Writing your code for Perl 5.6 may be insufficient to handle the Unicode support you need; look through the deltas to find the minimum Perl version for your needs.
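
If you're unsure which Unicode version a particular perl implements, the core Unicode::UCD module can report it. This is only a minimal sketch: the sub name unicode_version is made up for illustration, and the output will differ from one installation to the next.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Unicode::UCD ();

# Hypothetical helper: returns the version of the Unicode Character
# Database bundled with the running perl, as a dotted string.
sub unicode_version {
    return Unicode::UCD::UnicodeVersion();
}

# %vd formats the version object in $^V as a dotted-decimal string.
printf "perl %vd implements Unicode %s\n", $^V, unicode_version();
```

Running this under each perl you intend to support, and comparing the result with what your character classes need, is a quick complement to reading the deltas.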

Partly because it was a fun task for me, but also to show you some of the improvements you could get from a later version, here's the code rewritten for Perl v5.38 and Unicode 15.0.

New script and module:

ken@titan ~/tmp/pm_11155205_uni_char_class
$ ls -l *3*
-rw-r--r-- 1 ken None 993 Oct 29 05:03 PolyUniCharClass3.pm
-rwxr-xr-x 1 ken None 344 Oct 29 05:03 uni_char_class_3.pl

uni_char_class_3.pl:

#!/usr/bin/env perl

use v5.38;
use open OUT => qw{:encoding(UTF-8) :std};

use lib '.';    # DEMO ONLY -- DON'T use in PRODUCTION!
use PolyUniCharClass3;

for my $prefix (qw{In Is If}) {
    for my $class (qw{H L M}) {
        my $cons = "${prefix}Thai${class}Cons";
        say "$cons:";
        say PolyUniCharClass3::list($cons)->@*;
    }
}

PolyUniCharClass3.pm:

package PolyUniCharClass3;

use v5.38;

sub list ($char_class) {
    state $valid_char_class = {map +($_ => 1), qw{
        InThaiHCons IsThaiHCons
        InThaiLCons IsThaiLCons
    }};

    unless (exists $valid_char_class->{$char_class}) {
        warn "Char class '$char_class' doesn't exist!\n";
        return [];
    }

    return [map chr, ThaiCons(substr $char_class, 2)->@*];
}

sub ThaiCons ($cons) {
    state $code_ranges = {
        ThaiHCons => [qw{0E02-0E03 0E09 0E10 0E16}],
        ThaiLCons => [qw{0E04-0E07 0E0A-0E0D 0E11}],
    };

    state $ThaiCons_expanded;

    return $ThaiCons_expanded->{$cons} //= _expand($code_ranges->{$cons});
}

sub _expand ($code_range_list) {
    state $re = qr{^([0-9A-Fa-f]+)-([0-9A-Fa-f]+)$};

    my @full_list;

    for my $range ($code_range_list->@*) {
        if ($range =~ $re) {
            push @full_list, hex($1) .. hex($2);
        }
        else {
            push @full_list, hex $range;
        }
    }

    return [@full_list];
}

Output (unchanged):

ken@titan ~/tmp/pm_11155205_uni_char_class
$ ./uni_char_class_3.pl
InThaiHCons:
ขฃฉฐถ
InThaiLCons:
คฅฆงชซฌญฑ
InThaiMCons:
Char class 'InThaiMCons' doesn't exist!

IsThaiHCons:
ขฃฉฐถ
IsThaiLCons:
คฅฆงชซฌญฑ
IsThaiMCons:
Char class 'IsThaiMCons' doesn't exist!

IfThaiHCons:
Char class 'IfThaiHCons' doesn't exist!

IfThaiLCons:
Char class 'IfThaiLCons' doesn't exist!

IfThaiMCons:
Char class 'IfThaiMCons' doesn't exist!

There were a couple of points at the end of your post which I didn't address. Here goes:

"Regarding the use of <pre> tags, are they equivalent to the <code> tags?"

They sort of do the same job but have these differences:

"Incidentally, there will indeed also be an 'InThaiMCons' definition in this module (and more)!"

I picked names like If* and *M* purely for my testing. Your test suite (t/*.t scripts) should check that both success and failure are handled appropriately.
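
As a rough illustration of checking both paths, here is a minimal sketch of such a test script. It assumes an interface like the list() function shown earlier (an arrayref on success; a warning plus an empty arrayref on failure). The package My::CharClassStub and the helper list_with_warnings are hypothetical stand-ins so the script runs on its own; in a real t/*.t file you would load your module instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the module under test. It mimics the
# interface only; the code points are the ThaiHCons ones from the module.
{
    package My::CharClassStub;

    my %chars_for = (
        InThaiHCons => [map chr, 0x0E02 .. 0x0E03, 0x0E09, 0x0E10, 0x0E16],
    );

    sub list {
        my ($char_class) = @_;
        return $chars_for{$char_class} if exists $chars_for{$char_class};
        warn "Char class '$char_class' doesn't exist!\n";
        return [];
    }
}

# Call list() while capturing any warnings it emits.
sub list_with_warnings {
    my ($char_class) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    my $got = My::CharClassStub::list($char_class);
    return ($got, \@warnings);
}

# Success: a known class returns its characters and stays silent.
{
    my ($got, $warnings) = list_with_warnings('InThaiHCons');
    is_deeply $got, [map chr, 0x0E02, 0x0E03, 0x0E09, 0x0E10, 0x0E16],
        'known class lists its characters';
    is scalar @$warnings, 0, 'known class emits no warning';
}

# Failure: an unknown class returns an empty list and warns.
{
    my ($got, $warnings) = list_with_warnings('IfThaiHCons');
    is_deeply $got, [], 'unknown class returns an empty list';
    like $warnings->[0], qr/doesn't exist/, 'unknown class warns';
}

done_testing;
```

Capturing warnings through a localized $SIG{__WARN__} keeps the failure case from cluttering the test output while still letting you assert on the message.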

— Ken

Re^4: Listing out the characters included in a character class [v5.38]
by Polyglot (Chaplain) on Oct 30, 2023 at 06:30 UTC
    Well, I've nearly finished polishing up the module itself--still some work to do on the testing script, but it is at least functional. The module, however, is not working properly on my machine, and produces failure messages in the logs. I have put the full code, as I intend soon to publish it anyhow, on my scratchpad: Polyglot's scratchpad.

    The errors I'm getting look like this:

    [Mon Oct 30 05:11:03.311339 2023] [core:error] [pid 188075:tid 139660223264320] [client 192.168.1.101:53954] Premature end of script headers: test-thai-mod.pl
    [Mon Oct 30 05:11:03.311358 2023] [perl:warn] [pid 188075:tid 139660223264320] /cgi/test-thai-mod.pl did not send an HTTP header
    [Mon Oct 30 05:11:03.311388 2023] [:error] [pid 188075:tid 139660223264320] Undefined subroutine &ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::IsThaiLCons called at /var/www/cgi/test-thai-mod.pl line 24.\n

    The "did not send an HTTP header" has nothing to do with the header, but with premature exiting of code execution due to other problems. The "Undefined subroutine" seems to be the issue, and I have no clue why. Once, with a similar error message, I restarted the apache2 server and all was well. But that no longer works on this new message. I am left not knowing whether my apache2 server is at issue, or whether it is this code--but probably the latter. There's certainly no point trying to publish code that is not first functional, so any help on this would be much appreciated.

    Blessings,

    ~Polyglot~

      "... any help on this would be much appreciated."

      That's nigh impossible without seeing /var/www/cgi/test-thai-mod.pl.

      ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl looks like a very unusual module name. Perhaps a typo on "/var/www/cgi/test-thai-mod.pl line 24".

      When you post test-thai-mod.pl, the reason for showing Regexp::CharClasses::Thai on your scratchpad may become apparent: without further information, it seems irrelevant.

      — Ken

        I uncommented some of the test lines in my script, which adds more error messages. First the script (which has to be in "pre" tags because it uses UTF-8 characters):

        #!/usr/bin/perl
        
        #TEST THAI MODULE
        
        use strict;
        use warnings;
        use lib '/';
        use lib '/var/www/lib/';
        #push @INC, '/var/www/lib/';
        use RegexpCharClassesThai;  #Regexp-CharClasses-Thai.pm
        use utf8;
        use Test::More;
        
        $|=1;
        
        	print "Content-Type:text/html; charset=utf-8\n";
        	print "Content-Language: utf8;\n\n";
        
        my @inthai = &IsThaiLCons;
        my $sample = 'โมดูลนี้เป็นส่วนเสริมคำจำกัดความคลาสอักขระ';
        
        my $part = $sample;
        $part =~ s/\p{IsThaiHCons}/H/g;
        
        print <<PAGE;
        
        
        <html lang="utf8">
        <body>
        
        <h3>Checking the Thai module</h3>
        PAGE
        
        use_ok('RegexpCharClassesThai');
        
        my $text = 'ข่าวนี้ได้แพร่สะพัดออกไปอย่างรวดเร็ว';
        my $syllables = 0;
        
        print "<p>Syllables: $syllables; $text\n";
        
        
        print "<p>Syllables: $syllables; $text\n";
        
        
        is( 'ก' =~ /\p{IsKokai}/,1,' Match for  "ก" =~ /\p{IsKokai}/<br>');
        is( 'ก' =~ /\p{InThai}/,1,' Match for  "ก" =~ /\p{InThai}/<br>');
        is( 'ก' =~ /\p{InThaiAlpha}/,1,' Match for  "ก" =~ /\p{InThaiAlpha}/<br>');
        is( 'ก' =~ /\p{InThaiCons}/,1,' Match for  "ก" =~ /\p{InThaiCons}/<br>');
        isnt( 'ก' =~ /\p{InThaiHCons}/,1,' No match for  "ก" =~ /\p{InThaiHCons}/<br>');
        is( 'ก' =~ /\p{InThaiMCons}/,1,' Match for  "ก" =~ /\p{InThaiMCons}/<br>');
        isnt( 'ก' =~ /\p{InThaiLCons}/,1,' No match for  "ก" =~ /\p{InThaiLCons}/<br>');
        isnt( 'ก' =~ /\p{InThaiDigit}/,1,' No match for  "ก" =~ /\p{InThaiDigit}/<br>');
        isnt( 'ก' =~ /\p{InThaiTone}/,1,' No match for  "ก" =~ /\p{InThaiTone}/<br>');
        isnt( 'ก' =~ /\p{InThaiVowel}/,1,' No match for  "ก" =~ /\p{InThaiVowel}/<br>');
        isnt( 'ก' =~ /\p{InThaiCompVowel}/,1,' No match for  "ก" =~ /\p{InThaiCompVowel}/<br>');
        isnt( 'ก' =~ /\p{InThaiPreVowel}/,1,' No match for  "ก" =~ /\p{InThaiPreVowel}/<br>');
        isnt( 'ก' =~ /\p{InThaiPostVowel}/,1,' No match for  "ก" =~ /\p{InThaiPostVowel}/<br>');
        isnt( 'ก' =~ /\p{InThaiPunct}/,1,' No match for  "ก" =~ /\p{InThaiPunct}/<br>');
        is( 'ก' =~ /\p{InThaiFinCons}/,1,' Match for  "ก" =~ /\p{InThaiFinCons}/<br>');
        isnt( 'ก' =~ /\p{InThaiMute}/,1,' Match for  "ก" =~ /\p{InThaiMute}/<br>');
         
        
        is( 'ไ' =~ /\p{InThai}/,1,' Match for  "ไ" =~ /\p{InThai}/<br>');
        is( 'ไ' =~ /\p{InThaiAlpha}/,1,' Match for  "ไ" =~ /\p{InThaiAlpha}/<br>');
        isnt( 'ไ' =~ /\p{InThaiCons}/,1,' No match for  "ไ" =~ /\p{InThaiCons}/<br>');
        isnt( 'ไ' =~ /\p{InThaiHCons}/,1,' No match for  "ไ" =~ /\p{InThaiHCons}/<br>');
        isnt( 'ไ' =~ /\p{InThaiMCons}/,1,' No match for  "ไ" =~ /\p{InThaiMCons}/<br>');
        isnt( 'ไ' =~ /\p{InThaiLCons}/,1,' No match for  "ไ" =~ /\p{InThaiLCons}/<br>');
        isnt( 'ไ' =~ /\p{InThaiDigit}/,1,' No match for  "ไ" =~ /\p{InThaiDigit}/<br>');
        isnt( 'ไ' =~ /\p{InThaiTone}/,1,' No match for  "ไ" =~ /\p{InThaiTone}/<br>');
        is( 'ไ' =~ /\p{InThaiVowel}/,1,' Match for  "ไ" =~ /\p{InThaiVowel}/<br>');
        isnt( 'ไ' =~ /\p{InThaiCompVowel}/,1,' No match for  "ไ" =~ /\p{InThaiCompVowel}/<br>');
        is( 'ไ' =~ /\p{InThaiPreVowel}/,1,' Match for  "ไ" =~ /\p{InThaiPreVowel}/<br>');
        isnt( 'ไ' =~ /\p{InThaiPostVowel}/,1,' No match for  "ไ" =~ /\p{InThaiPostVowel}/<br>');
        isnt( 'ไ' =~ /\p{InThaiPunct}/,1,' No match for  "ไ" =~ /\p{InThaiPunct}/<br>');
        is( 'ไ' =~ /\p{IsSaraaimaimalai}/,1,' Match for  "ไ" =~ /\p{IsSaraaimaimalai}/<br>');
        
        
        print "\n";
        
        
        print <<PAGE;
        <h3>Check:</h3>
        <p>InThai: @inthai</p>
        
        <p>Sample: $sample</p>
        <p>Part: $part</p>
        <p>PATH: $ENV{PATH}</p>
        <p>INC: @INC</p>
        </body>
        </html>
        
        PAGE
        
        
        
        
        

        And then the error messages...

        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::ok: none vs ($;$) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::use_ok: none vs ($;@) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::require_ok: none vs ($) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::is: none vs ($$;$) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::isnt: none vs ($$;$) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::like: none vs ($$;$) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::unlike: none vs ($$;$) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::cmp_ok: none vs ($$$;$) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::pass: none vs (;$) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::fail: none vs (;$) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::can_ok: none vs ($@) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        Prototype mismatch: sub ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::isa_ok: none vs ($$;$) at /usr/share/perl/5.34/Exporter.pm line 63. at /var/www/cgi/test-thai-mod.pl line 12.
        [Mon Oct 30 13:00:32.189740 2023] [core:error] [pid 188075:tid 139660172908096] [client 192.168.1.101:55600] Premature end of script headers: test-thai-mod.pl
        [Mon Oct 30 13:00:32.189767 2023] [perl:warn] [pid 188075:tid 139660172908096] /cgi/test-thai-mod.pl did not send an HTTP header
        [Mon Oct 30 13:00:32.189819 2023] [:error] [pid 188075:tid 139660172908096] Undefined subroutine &ModPerl::ROOT::ModPerl::PerlRun::var_www_cgi_test_2dthai_2dmod_2epl::IsThaiLCons called at /var/www/cgi/test-thai-mod.pl line 19.\n

        Blessings,

        ~Polyglot~