in reply to Re^2: Listing out the characters included in a character class [wide character warning]
in thread Listing out the characters included in a character class

Incidentally, I am no longer using use Test::More;. I discovered that it was the source of all of my errors, including all of the "wide character" log messages, and my code now works well without it, with zero errors being logged. Apparently, Test::More was not designed to be compatible with Unicode characters, and is therefore not fit for purpose for my script.

I had planned to use it for the module testing, as is recommended in the guides for module preparation. Now, I'm not sure what to do. How can one go about testing his own script without this module? More importantly, how does one ensure that the installation will not fail in the absence of such a testing environment?

I'm nearly ready to wrap up with the creation of the module but for details of this nature. Packaging for CPAN is a bit cumbersome--at least for the first time around while learning the ropes.

Blessings,

~Polyglot~


Re^4: Listing out the characters included in a character class [wide character warning]
by eyepopslikeamosquito (Archbishop) on Nov 03, 2023 at 06:45 UTC

    I am no longer using use Test::More;. I discovered that it was the source of all of my errors

    Perl testing is based on the Test Anything Protocol, which is also used to test languages other than Perl ... so there is no requirement for your module to use Test::More.
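
    To illustrate that point, here is a minimal sketch of a test script that emits TAP by hand, with no testing module at all. The helper name tap_lines and the two sample checks are assumptions made up for this example; \p{Thai} is Perl's built-in script property, not one of the module's own classes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: turn [name, passed] pairs into TAP lines.
# TAP is a plan line "1..N" followed by one "ok"/"not ok" line per test.
sub tap_lines {
    my @checks = @_;
    my @lines  = ('1..' . scalar @checks);
    my $n      = 0;
    for my $check (@checks) {
        my ($name, $passed) = @$check;
        $n++;
        push @lines, ($passed ? 'ok' : 'not ok') . " $n - $name";
    }
    return @lines;
}

# scalar() keeps a failed match from vanishing into an empty list.
my @checks = (
    [ 'ก matches \p{Thai}',        scalar('ก' =~ /\p{Thai}/) ],
    [ 'A does not match \p{Thai}', scalar('A' !~ /\p{Thai}/) ],
);

print "$_\n" for tap_lines(@checks);
```

    Anything that understands TAP, such as prove, should be able to consume this output; the testing modules mostly save you from writing the bookkeeping yourself.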

    That said, it would be nice if you could provide us with an SSCCE that clearly illustrates the problems you were experiencing with Test::More.

    See also: hippo's excellent Basic Testing Tutorial

    👁️🍾👍🦟
      Unfortunately, I do not think it is possible to provide a correct SSCCE here for this case. The forum converts all of the characters which are related to the problem to HTML-entities, and I am unaware of a method by which the actual files could be attached.

      Suffice it to say for now that the issue is caused by UTF8 embedded in the code and tested by the Test::More tests. Without a way to paste in actual code containing UTF8 characters, unmangled, I see no point in going to the trouble of forming up an SSCCE for this case. I doubt it would be likely to exhibit the same behaviors, post-transfer/conversion, and would thus prove little.

      Blessings,

      ~Polyglot~

        The forum converts all of the characters which are related to the problem to HTML-entities, and I am unaware of a method by which the actual files could be attached.

        I think you mean, "if I use a <code> block, the forum converts all of the characters to HTML-entities. But if I use a <pre> block, it handles them correctly."

        paragraph: โมดูลนี้เป็นส่วนเสริมคำจำกัดความคลาสอักขระ

        pre: โมดูลนี้เป็นส่วนเสริมคำจำกัดความคลาสอักขระ
        code: &#3650;&#3617;&#3604;&#3641;&#3621;&#3609;&#3637;&#3657; &#3648;&#3611;&#3655;&#3609;&#3626;&#3656;&#3623; &#3609;&#3648;&#3626;&#3619;&#3636;&#3617; &#3588;&#3635;&#3592;&#3635;&#3585;&#3633;&#3604;&#3588; &#3623;&#3634;&#3617;&#3588;&#3621; &#3634;&#3626;&#3629;&#3633;&#3585;&#3586;&#3619;&#3632;

        In fact, kcott already explained the difference between <pre> and <code>, and you even made use of it in one of your earlier posts (which is where, btw, I grabbed my test string from). Thus, I'm not sure why you're now backpedaling and claiming that you cannot figure out how to share code that includes Unicode characters in the source or output: just use <pre> like you did earlier.

Re^4: Listing out the characters included in a character class [wide character warning]
by kcott (Archbishop) on Nov 03, 2023 at 09:58 UTC

    "Wide character in ..." is a warning. See "perldiag: Wide character in %s". Please stop calling it an error.

    You showed this warning when using Test::More:

    Wide character in print at /.../Test2/Formatter/TAP.pm line 125.

    I simulated that warning when using Test::More:

    Wide character in print at /.../Test2/Formatter/TAP.pm line 156.

    The only difference is the line number which, in the absence of other information, I'd guess is due to your using a different version. Test::More and Test2::Formatter::TAP (along with many other modules) are part of the Test-Simple distribution. I'm using:

    $ perl -E 'use Test::More; say $Test::More::VERSION;'
    1.302195
    $ perl -E 'use Test2::Formatter::TAP; say $Test2::Formatter::TAP::VERSION;'
    1.302195

    What version are you using?

    My line 156 looks like this:

    print $io $ok;

    What does your line 125 look like?

    I provided you with a solution to your problem by using:

    use open OUT => qw{:encoding(UTF-8) :std};

    Did you try that? If so, what was the outcome? If not, why not?

    The issue here is in no way specific to Test::More. Consider this code which generates the warning:

    $ perl -e '
        print "\N{DROMEDARY CAMEL}\n";
    '
    Wide character in print at -e line 2.
    🐪
    

    And this code which does not:

    $ perl -e '
        use open OUT => qw{:encoding(UTF-8) :std};
        print "\N{DROMEDARY CAMEL}\n";
    '
    🐪
    

    — Ken

      Note that ordering is important:
      #!/usr/bin/perl
      use warnings;
      use strict;
      use utf8;
      
      use Test::More tests => 1;
      use open OUT => ':encoding(UTF-8)', ':std';
      
      is "kůň", 1, 'same';
      
      gives the warning, while
      #!/usr/bin/perl
      use warnings;
      use strict;
      use utf8;
      
      use open OUT => ':encoding(UTF-8)', ':std';
      use Test::More tests => 1;
      
      is "kůň", 1, 'same';
      
      does not.

      That's why I recommended Test::More::UTF8. You can place it wherever you like and there are no warnings.
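
      For anyone who prefers not to add a dependency, the same effect can be had by setting the layers on the handles Test::More writes to, which, as I understand it, is roughly what Test::More::UTF8 does for you. The sketch below follows the "utf8 / Wide character in print" note in the Test::More documentation; the test itself is only a placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

use Test::More tests => 1;

# Test::More prints through its own copies of STDOUT/STDERR, so the
# encoding layer has to be applied to those copies.
my $builder = Test::More->builder;
binmode $builder->output,         ':encoding(UTF-8)';
binmode $builder->failure_output, ':encoding(UTF-8)';
binmode $builder->todo_output,    ':encoding(UTF-8)';

# Placeholder test: non-ASCII data in both the value and the test name.
is "kůň", "kůň", 'kůň compares equal to itself';
```

      Because this runs after Test::More is loaded, the ordering problem shown above does not arise. One caveat: if the handles already carry a UTF-8 layer (for example via the open pragma placed before Test::More), adding another one here would double-encode the output.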

      Update: <code> to <pre> to fix the non-English characters.

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

        ++ Good to know. Thanks.

        Except when turning lexical pragmata on and off in limited scope, I always list pragmatic modules before non-pragmatic ones; so I never would have encountered that situation.

        — Ken

        I didn't realize that the order of "use" statements would matter. I have sometimes even sorted all of them alphabetically at the top of the script. I was aware that "require" could be sensitive to order of appearance. Always more to learn, thank you. I wonder how many times in the past this may have bitten me, with me oblivious to the cause.

        Blessings,

        ~Polyglot~

      Did you try that? If so, what was the outcome? If not, why not?

      Yes, I copied that into my code, replacing what I was doing (TMTOWTDI) and it made no difference. I had been using these lines already:

      binmode STDERR, ":utf8";
      binmode STDIN, ":utf8";
      binmode STDOUT, ":utf8";

      Honestly, there are so many ways in Perl of dealing with UTF8 that the mind spins--and they are not all created equal. I had not seen the particular method you recommended, but again, it turned out no different from what I already had in place. If it does the same as the three lines, I may prefer it going forward. The three-line version does appear to have the advantage of being more specific, letting one choose which of the three handles to affect. And I have, at various times, used other methods as well, including the Encode module.
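
      For comparison, here is a minimal sketch of the two forms side by side, printing the layers that end up on each standard handle. It is only an illustration, not a statement of which form is right for a given script. Two differences are worth noting: the pragma takes effect at compile time while binmode runs when the statement is reached, and :encoding(UTF-8) validates the data whereas :utf8 does not (which matters mostly for input).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pragma form: applied at compile time. With only OUT given, :std puts
# the layer on STDOUT and STDERR and leaves STDIN alone.
use open OUT => ':encoding(UTF-8)', ':std';

# Per-handle form: applied at run time, one handle at a time.
binmode STDIN, ':encoding(UTF-8)';

# Report the layers now in place on each standard handle.
for my $pair ([STDIN => \*STDIN], [STDOUT => \*STDOUT], [STDERR => \*STDERR]) {
    my ($name, $fh) = @$pair;
    my @layers = PerlIO::get_layers($fh);
    print "$name: @layers\n";
}
```

      The compile-time versus run-time distinction is also why the position of the pragma relative to "use Test::More" can matter, whereas a binmode call on STDOUT always runs after every "use" line has been processed.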

      What does your line 125 look like?

      print $io $msg;
      ...and my line 156 looks identical to yours.

      Regarding the last portion of your post, I know you are well-meaning so I will overlook how it comes across. My username is not without significance. From the very beginning of my Perl programming career, I have dealt with non-ASCII encodings (I was programming for Asian languages from the get-go). The "wide character" message is one I have seen thousands of times--and I well know its typical causes.

      Thank you for your help! (This is genuine, not being sarcastic--I just felt it necessary to clarify owing to the prior paragraph which might color the perception of my tone.)

      Blessings,

      ~Polyglot~

Re^4: Listing out the characters included in a character class [wide character warning]
by pryrt (Abbot) on Nov 03, 2023 at 13:36 UTC
    I discovered that it was the source of all of my errors

    The errors ("Premature end of script headers") are logged because the warnings are being printed before your HTTP headers, because while you took my advice to wrap the headers in a BEGIN block, you skipped the part of my advice where I explained that the BEGIN block might need to go before certain modules were even used. But getting that right is difficult, which is why I also suggested use CGI::Carp qw(fatalsToBrowser); (because that might help with your debug process).

    The warnings ("wide character") are because your script didn't set up the appropriate binmode/open-mode for all the various outputs that are used -- and other monks have given you better advice than I could on that, including Test::More::UTF8, which I had never heard of, but will definitely keep in my arsenal going forward.

    I had planned to use it for the module testing

    I will admit that I haven't tested a CGI script, per se. But using Test::More inside the script that's generating the response to the browser seems weird to me. Normally tests are run from the command line (not on your live webserver), and the test script will call the various functions from the modules you wrote that your CGI script is calling. And, if you end up with a lot of logic/etc inside your CGI script that needs testing, you could even have a test that runs your CGI script (CGI can be run from the command line, without the involvement of the webserver -- even the old CGI.pm documentation explained how to do that). Or you could even have an HTTP client inside your test script, which would connect to the webserver to test the endpoints of your CGI (testing on your live server is probably not the best either, but you could have your test suite launch a private webserver instance on your test machine, without it being on your final webhost yet).

    However, Test::More sends some of its TAP output (such as the test name and ok/not ok) to STDOUT, and other information (such as the failure diagnostics) to STDERR. Trying to handle that output properly inside the CGI environment to generate a valid webpage seems tricky, at best, and I am convinced that this (not a problem in Test::More itself) is the cause of your headaches.
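
    The command-line approach described above might look something like the following sketch. The file name and the InDemoThaiCons property are assumptions invented for this example (a stand-in for whatever the real module exports), so that the script runs on its own.

```perl
#!/usr/bin/perl
# t/01-charclass.t (hypothetical file name); run from the shell, e.g. with prove
use strict;
use warnings;
use utf8;

# Set the output layers before loading Test::More, so that the copies it
# makes of STDOUT/STDERR already know about UTF-8.
use open OUT => ':encoding(UTF-8)', ':std';
use Test::More;

# Stand-in user-defined property (an assumption for this sketch):
# the Thai consonants, U+0E01 through U+0E2E.
sub InDemoThaiCons { return "0E01\t0E2E\n" }

like   'ก', qr/\A\p{InDemoThaiCons}\z/, 'ko kai (ก) is a consonant';
unlike 'ไ', qr/\A\p{InDemoThaiCons}\z/, 'sara ai maimalai (ไ) is not a consonant';

done_testing;
```

    A real test file would load the module under test instead of defining the property itself; the point is only that nothing here needs a webserver or HTTP headers.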

      I had not intended the CGI script to be used for the official package testing--it was for my own testing, for matters of convenience at my end (think editing in TextWrangler instead of via 'nano' on the server). However, UTF8 characters are still UTF8 characters, whether printed from a CGI script or from one run at the command terminal. This should not matter at all.

      Neither had I ever heard of Test::More::UTF8. It's hard to use what is not known.

      Blessings,

      ~Polyglot~

Re^4: Listing out the characters included in a character class [wide character warning]
by choroba (Cardinal) on Nov 03, 2023 at 09:59 UTC
    Test::More is used widely. It's highly improbable it can cause any errors. Regarding the wide characters, maybe all you needed was Test::More::UTF8?

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      If Test::More were already capable of handling UTF8 characters, why the need for Test::More::UTF8? And if, for a UTF8 application, the second option is to be used, why is this not made more prominent? I saw nothing about it in the documentation for Test::More, so it is hard to even know that it exists.

      In any case, it's nice to know that there was something better. At this point, I may just roll my own tests anyhow...we'll see. (I don't need anything overly complex to begin with.)

      Blessings,

      ~Polyglot~

      Well, I finally got around to the troublesome matter of the testing script for the module, and found I was forced to use one of the tools for testing...so, Test::More::UTF8 it was.

      ...until it wasn't.

      Something is different in the coding for the UTF8, it appears, and the script failed to run. Rather than more hours trying to troubleshoot what is unfamiliar to me, I've abandoned the /t folder for testing and gone the other route of having my own test.pl script in the main folder for the module. In that I use "Test::Simple" for a few simple tests, then run my own tests. It seems to work okay, but I wish it were better.

      #!/usr/bin/perl
      
      use strict;
      use warnings;
      use 5.008;
      use utf8;
      use FindBin qw($Bin);
      use lib "$Bin/lib";
      use blib './lib/';
      use Regexp::CharClasses::Thai;
      use Regexp::CharClasses::Thai qw(:all);
      
      
      binmode STDOUT, ':utf8';
      
      use Test::Simple 'no_plan';
      
      my $failure = 0;
      
      #########################
      # TEST ITEMS
      
      # Pass ok() the match result itself: a quoted string is always true, so it could never fail.
      ok( 'ก' =~ /\p{IsThai}/ );
      ok( 'ก' =~ /\p{InThaiCons}/ );
      ok( 'ก' =~ /\p{InThaiMCons}/ );
      
      is( q"'ก' =~ /\p{IsKokai}/",1,' Match for  "ก" =~ /\p{IsKokai}/');
      is( q"'ก' =~ /\p{InThai}/",1,' Match for  "ก" =~ /\p{InThai}/');
      is( q"'ก' =~ /\p{InThaiAlpha}/",1,' Match for  "ก" =~ /\p{InThaiAlpha}/');
      is( q"'ก' =~ /\p{InThaiCons}/",1,' Match for  "ก" =~ /\p{InThaiCons}/');
      isnt( q"'ก' =~ /\p{InThaiHCons}/",0,' No match for  "ก" =~ /\p{InThaiHCons}/');
      is( q"'ก' =~ /\p{InThaiMCons}/",1,' Match for  "ก" =~ /\p{InThaiMCons}/');
      isnt( q"'ก' =~ /\p{InThaiLCons}/",0,' No match for  "ก" =~ /\p{InThaiLCons}/');
      isnt( q"'ก' =~ /\p{InThaiDigit}/",0,' No match for  "ก" =~ /\p{InThaiDigit}/');
      isnt( q"'ก' =~ /\p{InThaiTone}/",0,' No match for  "ก" =~ /\p{InThaiTone}/');
      isnt( q"'ก' =~ /\p{InThaiVowel}/",0,' No match for  "ก" =~ /\p{InThaiVowel}/');
      isnt( q"'ก' =~ /\p{InThaiCompVowel}/",0,' No match for  "ก" =~ /\p{InThaiCompVowel}/');
      isnt( q"'ก' =~ /\p{InThaiPreVowel}/",0,' No match for  "ก" =~ /\p{InThaiPreVowel}/');
      isnt( q"'ก' =~ /\p{InThaiPostVowel}/",0,' No match for  "ก" =~ /\p{InThaiPostVowel}/');
      isnt( q"'ก' =~ /\p{InThaiPunct}/",0,' No match for  "ก" =~ /\p{InThaiPunct}/');
      is( q"'ก' =~ /\p{InThaiFinCons}/",1,' Match for  "ก" =~ /\p{InThaiFinCons}/');
      isnt( q"'ก' =~ /\p{InThaiMute}/",0,' No match for  "ก" =~ /\p{InThaiMute}/');
       
      
      is( q"'ไ' =~ /\p{InThai}/",1,' Match for  "ไ" =~ /\p{InThai}/');
      is( q"'ไ' =~ /\p{InThaiAlpha}/",1,' Match for  "ไ" =~ /\p{InThaiAlpha}/');
      is( q"'ไ' =~ /\p{InThaiWord}/",1,' Match for  "ไ" =~ /\p{InThaiWord}/');
      isnt( q"'ไ' =~ /\p{InThaiCons}/",0,' No match for  "ไ" =~ /\p{InThaiCons}/');
      isnt( q"'ไ' =~ /\p{InThaiHCons}/",0,' No match for  "ไ" =~ /\p{InThaiHCons}/');
      isnt( q"'ไ' =~ /\p{InThaiMCons}/",0,' No match for  "ไ" =~ /\p{InThaiMCons}/');
      isnt( q"'ไ' =~ /\p{InThaiLCons}/",0,' No match for  "ไ" =~ /\p{InThaiLCons}/');
      isnt( q"'ไ' =~ /\p{InThaiDigit}/",0,' No match for  "ไ" =~ /\p{InThaiDigit}/');
      isnt( q"'ไ' =~ /\p{InThaiTone}/",0,' No match for  "ไ" =~ /\p{InThaiTone}/');
      is( q"'ไ' =~ /\p{InThaiVowel}/",1,' Match for  "ไ" =~ /\p{InThaiVowel}/');
      isnt( q"'ไ' =~ /\p{InThaiCompVowel}/",0,' No match for  "ไ" =~ /\p{InThaiCompVowel}/');
      is( q"'ไ' =~ /\p{InThaiPreVowel}/",1,' Match for  "ไ" =~ /\p{InThaiPreVowel}/');
      isnt( q"'ไ' =~ /\p{InThaiPostVowel}/",0,' No match for  "ไ" =~ /\p{InThaiPostVowel}/');
      isnt( q"'ไ' =~ /\p{InThaiPunct}/",0,' No match for  "ไ" =~ /\p{InThaiPunct}/');
      is( q"'ไ' =~ /\p{IsSaraaimaimalai}/",1,' Match for  "ไ" =~ /\p{IsSaraaimaimalai}/');
      
      
          my $pv = 'ข่าวนี้ได้แพร่สะพัดออกไปอย่างรวดเร็ว';
          my $prevowel_syllables = $pv  =~ s/
                  (
                  (?:\p{InThaiPreVowel})
                  (?:
                    (?:\p{InThaiDualC1}\p{InThaiDualC2})
                    |
                    (?:\p{InThaiCons}){1}
                  )
                  (?:\p{InThaiTone}\p{InThaiCompVowel}\p{InThaiPostVowel}){0,3}
                    (?:
                      (?:\p{InThaiFinCons}\p{IsYoyak}\p{IsWowaen}){0,5}
                      (?!\p{InThaiPostVowel})
                    )*
                  (?:\p{InThaiMute})?
                  )           
                  /($1)/gx;
      
          print "Syllables with pre-vowels in 'ข่าวนี้ได้แพร่สะพัดออกไปอย่างรวดเร็ว' --> $pv: $prevowel_syllables\n";  # 4
      
      if ($prevowel_syllables == 4) { print "Syllables test succeeded.\n\n" } else { print "Syllables test FAILED.\n\n"; $failure++};
      
      if ($failure) {
      	print "No success: $failure tests failed.\n";
      	exit $failure;
      } else {
      	print "Success.  All tests passed.\n";
      	exit 0;
      };
      
      exit;
      
      
      sub is {
      my $test = shift @_;
      my $val = shift @_;
      my $say = shift @_;
      	print "TEST: $say\t";
      	if ((eval($test)) == $val) {
      		print "Passed in the affirmative.\n" 
      	} else { 
      		print "FAILED! INCORRECTLY NEGATIVE.\n";
      		$failure++ 
      	};
      };
      
      sub isnt {
      my $test = shift @_;
      my $val = shift @_;
      my $say = shift @_;
      	print "TEST: $say\t";
      	if (eval($test) != $val) { 
      		print "FAILED! INCORRECTLY AFFIRMATIVE.\n";
      		$failure++ 
      	} else { 
      		print "Passed in the negative.\n" 
      	};
      };
      
      
      EDIT: Cleaned it up a bit and changed to "pre" tags, hoping for better readability.

      Maybe someday I'll figure out the Test::More::UTF8. Until then, this approach will hopefully at least get the module installed.

      Blessings,

      ~Polyglot~

        > the script failed to run

        What happened? What errors did you get?

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]