Polyglot has asked for the wisdom of the Perl Monks concerning the following question:

My attempt at launching my module belly-flopped. I've learned that even though Test::More is part of Perl core going back to v5.6.2, Test::More::UTF8 was never part of Perl's core. Furthermore, Test2, said to be unicode compatible, only made it to core in Perl 5.25.1.
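One way to check claims like these locally is the core module Module::CoreList, which reports the first perl release that bundled a given module. This is only a minimal sketch: the helper name `core_since` is made up for illustration, and the answers depend on the version of Module::CoreList installed.

```perl
use strict;
use warnings;
use Module::CoreList;

# Hypothetical helper: describe when (if ever) a module entered core.
sub core_since {
    my ($module) = @_;
    my $release = Module::CoreList->first_release($module);
    return defined $release
        ? "in core since perl $release"
        : 'never in core';
}

print "$_: ", core_since($_), "\n"
    for qw(Test::More Test2::API Test::More::UTF8);
```

The `corelist` command-line tool shipped with the same distribution gives the same information without a script.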

My module does not need to use unicode. If the unicode comments and POD were stripped out of it, it would not even require "use utf8;" as there is no unicode in its actual code. But because it is for utf8 and the tests, to be meaningful, incorporate utf8 characters, the entire module fails to install on perl systems which do not already have the non-core module "Test::More::UTF8."

In order to make the installation as easy and fuss-free as possible, and more space-conserving, too, I don't want to require the installation of non-core modules. If there is not a better way to do this, my next version release will abandon all of those useful tests and put in one or more tests which require no unicode at all--just a "free pass" so to speak. This way, at least, the install would be able to complete.

So the question remains, is it even possible to create, and install, a module designed for Perl 5.8.3 compatibility, that provides UTF8 functions, and that does not require any non-core modules--and do this with reasonable UTF8-based tests?


UPDATE:

I think I've managed to find a hackish solution that should get me by for now. I've once again dropped the "Test::More::UTF8" package (I should not have gone back to it after dropping it the first time). I had earlier thought it was a "solution" for Test::More's lack of UTF8 compatibility, but it wasn't: even when the package was manually installed (it is not part of core), it tended to generate some inexplicable 'wide character' warnings. So I went with a pure-ASCII solution, with no need even for "use utf8" in the testing script. Instead of testing on actual UTF8 characters, the script now tests on the hexadecimal codepoints returned from the module. If the correct codepoint is returned, the function can legitimately be considered to be installed and functioning. It would have been nice to test on real unicode, but oh well...the module will still work just fine, I'm sure.
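For anyone curious what such a pure-ASCII test might look like, here is a rough sketch. The function `first_thai_consonant` is a hypothetical stand-in for whatever the module really exports, and `codepoints` is a helper invented for this example; neither is taken from the actual module.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Hypothetical stand-in for a module function that returns a Thai character.
sub first_thai_consonant { return chr(0x0E01) }    # THAI CHARACTER KO KAI

# Render every character of a string as an ASCII "U+XXXX" label,
# so nothing wide is ever printed by the test harness.
sub codepoints {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $string;
}

is codepoints(first_thai_consonant()), 'U+0E01', 'returns KO KAI';
is codepoints("\x{0E01}\x{0E32}"), 'U+0E01 U+0E32', 'two-character string';
```

Because both the expected values and the test names are plain ASCII, the script needs neither "use utf8" nor any binmode call on the Test::More output handles.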

Blessings,

~Polyglot~

Re: How to create and install a module compatible with both UTF8 and Perl 5.8.3 without using non-core modules?
by pryrt (Abbot) on Dec 02, 2023 at 18:20 UTC
    So the question remains, is it even possible to create, and install, a module designed for Perl 5.8.3 compatibility, that provides UTF8 functions, and that does not require any non-core modules--and do this with reasonable UTF8-based tests?
    As I tried to make clear earlier, Test::More::UTF8 is just a helper, which effectively runs the single line of code
    BEGIN { binmode Test::More->builder->$_, ':utf8' for qw/failure_output todo_output output/; }

    Here is an SSCCE which I ran in Strawberry Perl v5.8.8, so it should likely be compatible even with Perl v5.8.3 (which I don't have readily available for verification). It shows the wide-character warning if you run it with no command-line arguments, but will not warn if you give it a command-line argument of "1" (or any other non-false value). It doesn't use any non-core modules. All you have to do to make the same code work in general, without command-line arguments, is to get rid of the if($ARGV[0]){} wrapper.

    use 5.008;
    use strict;
    use warnings;
    use Test::More tests => 1;
    use utf8;
    use open ':std', ':encoding(UTF-8)';
    use Encode;
    
    BEGIN {
        if($ARGV[0]) {
            binmode Test::More->builder->$_, ':utf8' for qw/failure_output todo_output output/;
        }
    }
    
    $| = 1;
    my $smile = "☺";
    
    diag "This smile $smile will ", ($ARGV[0]?'not ':''), "warn";
    
    is $smile, Encode::decode('UTF-8', "\xE2\x98\xBA"), "equivalent smiles";
    __END__
    
    C:> chcp 65001
    Active code page: 65001
    
    C:> perl pm11156039_testmoreutf8.pl
    1..1
    Wide character in print at C:/usr/local/apps/berrybrew/perls/5.8.8_32/perl/lib/Test/Builder.pm line 1275.
    # This smile ☺ will warn
    ok 1 - equivalent smiles
    
    
    C:> perl pm11156039_testmoreutf8.pl 1
    1..1
    # This smile ☺ will not warn
    ok 1 - equivalent smiles
    

    (used pre instead of code tags because of unicode characters)
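With the if($ARGV[0]){} wrapper removed, the general form might look roughly like the sketch below. It is not from the original reply: it writes the Thai character as a \x{...} escape so the source file stays pure ASCII (an assumption made here so that "use utf8" is not needed), and it has not been verified on 5.8.3 itself.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Same effect as Test::More::UTF8: add a :utf8 layer to the three
# handles that Test::Builder prints to.
BEGIN {
    binmode Test::More->builder->$_, ':utf8'
        for qw/failure_output todo_output output/;
}

my $ko_kai = "\x{0E01}";    # THAI CHARACTER KO KAI, written as an escape

diag "diagnostic containing $ko_kai";
is length($ko_kai), 1, "one character: $ko_kai";
is ord($ko_kai), 0x0E01, 'expected code point';
```

Only core modules are loaded, and the wide character appears both in a diag message and in a test name, which are the two places where the warning would otherwise come from.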

Re: How to create and install a module compatible with both UTF8 and Perl 5.8.3 without using non-core modules?
by SankoR (Prior) on Dec 02, 2023 at 18:52 UTC
    I don't want to require the installation of non-core modules. If there is not a better way to do this, my next version release will abandon all of those useful tests

    You could pull this off with a little monkey patching and finagling but I think you're getting lost in the weeds again... Why on Earth would you avoid a module that has been in the perl core for almost eight years or make your test suite less complete for the arbitrary goal of supporting perl versions that are statistically irrelevant in 2023? Who are you targeting that stopped installing new versions of the interpreter 19 years and 11 months ago with v5.8.3? And if this person exists, why would they avoid updating perl itself-- ignoring two decades of improvements, security fixes, and features --only to install your brand new non-battle-hardened module tomorrow?

    And if a module was non-core 8 years ago (or even today), so what? Your module is non-core. You put that code on CPAN because it serves a practical purpose and you want your work to be found, installed, used, and potentially improved upon by others; not avoided or ignored because it's not a core module. The CPAN itself wouldn't make much sense if we all avoided using each other's non-core modules. Besides, worrying that someone doesn't have a module "new" to the core or an entirely non-core module (or even a given version of a core or non-core module) pre-installed is a problem resolved long ago: properly define your prerequisites in metadata and, better yet, in a cpanfile.
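As a rough illustration of declaring prerequisites, a cpanfile for this situation might look something like the following. The minimum perl version and the choice of test-phase modules are assumptions based on this thread, not the module's actual metadata.

```perl
# cpanfile (sketch; the module list is assumed from the discussion above)
requires 'perl', '5.008003';

# Needed only to run the test suite, not at runtime.
on 'test' => sub {
    requires 'Test::More';
    requires 'Test::More::UTF8';
};
```

CPAN clients that read this metadata install the test-phase modules before running the tests, so the end user never has to fetch them by hand.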

    Finally, the very first line of Test::More reads "STOP! If you're just getting started writing tests, have a look at Test2::Suite first." Emphasis is theirs. I'd take the advice because Test2 is where the effort is going today and in the future.

      My module is for Thai. My next one may well be for Lao. If you do not know the state of Thailand and Laos with respect to computing, you can be forgiven for not understanding my reasons for wanting to do what I am doing. That said, I did not ask for opinions on my rationale--I asked, perhaps rhetorically, if there were even a way to accomplish this in the "politically correct" (properly tested) manner with utf8. With the jury-rigged/hackish ways to do this brought up, it's clear that Perl is a bit behind on the unicode adoption spectrum. It should not be this difficult.

      FYI: As of about six years ago when I did my research on the subject, Laos was estimated to have about 15% internet saturation; i.e. 15% of the population of the country had internet access. Most of that was via smartphones, so consider that far fewer have actual computers. Thailand is more advanced, but not as advanced as one might wish. While Thailand is not on the United Nations' "least developed countries" list as Laos is, it has miles to go in terms of educating people with computing. Thai programmers are few.

      As it happens, only two weeks ago I was in a meeting with a number of Thai people trying to persuade them to convert to using UTF8, across the board, for their translation projects. It was a tough sell. They were quite accustomed to typing in their text using the local ASCII code pages and fonts tailored for them--fonts which, when copied into a text file, disappear, leaving the resultant text looking like gobbledygook. It was only after we showed them superior tools for word-wrapping that they warmed up to the idea of switching to UTF8. Transitions here seem to take longer than they might in other places.

      Blessings,

      ~Polyglot~

        I did not ask for opinions on my rationale
        I thought since you posted this in public that you might want the public to respond. My bad, I guess.

        But get ready because I'm going to make the same mistake again.

        What ratio of Laotian or Thai users are developing new software on Windows XP? Or are they still targeting a linux kernel that predates the first release of Ubuntu? Because that's the era of perl 5.8.3. Today, P5P "officially" covers two stable releases according to perlpolicy; that's currently this year's 5.38.x release and perl 5.36.x from 2022. The perl toolchain folks (those behind core modules like CPAN.pm, ExtUtils::MakeMaker, etc.) have set their support window to 10 years which will put perl 5.20.x at the tail end of targeted support next summer. I imagine they've done their research as well.

        Anyway, do as you please (as I have here) but watering down tests, cribbing snippets of code to avoid installing pure perl prerequisite modules, ignoring all the encoding work done in perl itself since 2004... in general, just making the maintenance and development of your module more complicated for a dev environment that might not exist anywhere in the world... is a choice. มันไม่มีอะไร... ("It's nothing...")