kcott has asked for the wisdom of the Perl Monks concerning the following question:

G'day All,

I was absolutely certain that I'd seen a regex for checking module names somewhere in https://perldoc.perl.org/perl but, after checking through many documentation pages, I cannot locate it.

I came up with:

qr{^[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z0-9_]+)*$}

I'm aware that only has the "::" separator; not the (ancient) "'" separator.

I copied a number of module names from perlmodlib — which I hope covers all cases — and tested with:

#!/usr/bin/env perl

use strict;
use warnings;

use Test::More;

my @modules = qw{
    strict
    autodie::exception::system
    utf8
    B
    App::Prove::State::Result::Test
    CPAN::Meta::History::Meta_1_0
    Encode::KR::2022_KR
};

plan tests => 0+@modules;

my $re = qr{^[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z0-9_]+)*$};

for my $mod (@modules) {
    is $mod =~ $re, !!1, "Testing $mod";
}

Output:

1..7
ok 1 - Testing strict
ok 2 - Testing autodie::exception::system
ok 3 - Testing utf8
ok 4 - Testing B
ok 5 - Testing App::Prove::State::Result::Test
ok 6 - Testing CPAN::Meta::History::Meta_1_0
ok 7 - Testing Encode::KR::2022_KR

If anyone knows about an official regex for this, please let me know.

If I am just imagining that I saw such a regex at some point in the past, I'll use what I have here. In that case, I'd appreciate knowing about any edge cases that I might have missed (or other improvements).

— Ken

Replies are listed 'Best First'.
Re: Is there an official regex for checking module names? (updated x2)
by haukex (Archbishop) on Feb 08, 2022 at 09:01 UTC

    Interesting question! It appears that neither perlmod nor the various documents it references give any such restrictions; there's only The Syntax of Variable Names and Identifier parsing, the latter of which gives / (?[ ( \p{Word} & \p{XID_Start} ) + [_] ]) (?[ ( \p{Word} & \p{XID_Continue} ) ]) * /x for Unicode and / (?aa) (?!\d) \w+ /x for ASCII, and indeed, under use utf8;, package Σäೡↈ is allowed. However, when it comes to .pm files, the issue arises in the filesystem, and that's a can of worms across different OSes. (Update 2: On my Linux system, a file named Σäೡↈ.pm works fine as a regular module, I can use it normally.)

    Over in this node, I mentioned several different modules that parse .pm files: cpanm uses Parse::PMFile, which borrows this code from the PAUSE indexer, and they use ([\w\:\']+) to extract package names and the following to check if module names are bad:

    $package !~ /^\w[\w\:\']*\w?\z/ || $package !~ /\w\z/ || $package =~ /:/ && $package !~ /::/ || $package =~ /\w:\w/ || $package =~ /:::/
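To see what that check accepts and rejects, here is a minimal sketch that wraps the quoted condition in a function. The name looks_bad and the sample names are assumptions made for illustration; only the condition itself comes from the code quoted above.

```perl
use strict;
use warnings;

# Hypothetical wrapper; the condition is the one quoted above.
sub looks_bad {
    my ($package) = @_;
    return 1 if
           $package !~ /^\w[\w\:\']*\w?\z/
        || $package !~ /\w\z/
        || $package =~ /:/ && $package !~ /::/
        || $package =~ /\w:\w/
        || $package =~ /:::/;
    return 0;
}

# Sample names chosen for illustration only.
for my $name ('Foo::Bar', "Foo'Bar", 'Foo:Bar', 'Foo::', 'Foo:::Bar', '123') {
    printf "%-10s %s\n", $name, looks_bad($name) ? 'rejected' : 'accepted';
}
```

With these inputs, Foo::Bar, Foo'Bar and 123 are accepted, while the single-colon, trailing-colon and triple-colon forms are rejected.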

    Module::Info's version and ExtUtils::MM_Unix share the regex /\w[\w\:\']*/.

    But what I suspect is the "most official" answer comes from CPAN::Meta::Validator:

    /^[A-Za-z0-9_]+(::[A-Za-z0-9_]+)*$/

    And indeed, all of the entries in 02packages.details.txt match this pattern, the only exceptions currently being WWW::Scripter'_about_protocol and WWW::Scripter'Location, and changing :: to (?:'|::) in the above regex matches those too. Under versions of Perl that support /aa (5.14+), it can also be written /^\w+(?:(?:'|::)\w+)*$/aa.

    There's an interesting caveat here: This regex allows package names that begin with digits, which are not valid Perl identifiers. Indeed, package 123; is a syntax error - but package Acme::123; is not, and the existence of e.g. Acme::123 and many others on the CPAN confirms that this appears to be allowed. (Side note: as of 5.18, any non-empty string is allowed on the LHS of ->.)

    Update: All packages on the CPAN also match the slightly more restrictive /^(?!\d)\w+(?:(?:'|::)\w+)*$/aa, which also lines up with the blog post by schwern the AM linked to - and your own regex! Additional minor edits. Update 2: And of course Module::Runtime, as mentioned by the AM, agrees too.
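The three patterns in this reply can be compared side by side. The following is only a sketch: the hash keys, the matching helper and the sample names are made up for illustration, and /aa needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# The three patterns quoted in the reply above; the labels are hypothetical.
my %pattern = (
    validator  => qr/^[A-Za-z0-9_]+(::[A-Za-z0-9_]+)*$/,
    with_quote => qr/^\w+(?:(?:'|::)\w+)*$/aa,
    no_digit   => qr/^(?!\d)\w+(?:(?:'|::)\w+)*$/aa,
);

# Hypothetical helper: returns the labels of the patterns a name matches.
sub matching {
    my ($name) = @_;
    return grep { $name =~ $pattern{$_} } sort keys %pattern;
}

for my $name ('Acme::123', '123::Acme', "WWW::Scripter'Location", 'Foo::') {
    my @hits = matching($name);
    printf "%-24s %s\n", $name, @hits ? "@hits" : '(none)';
}
```

One thing worth knowing when reusing any of these: $ also matches just before a trailing newline, so "Foo\n" passes all three. Anchoring with \z instead would reject it.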

        Anonymous mentioned this below (amid a massive pile of links) but the Module::Runtime module gives you all the handy tools you need for this. It also has a discussion of why it doesn't allow unicode names.
Re: Is there an official regex for checking module names?
by LanX (Saint) on Feb 08, 2022 at 01:37 UTC
    Nitpicking alert:

    You seem to parse packages. But modules are files.

    There is a convention to have the whole module included inside an equally named package, but with no guarantee it's respected.

    I'm not sure, but it might be possible to require a file which doesn't match your regex for identifiers.

    update

    more nitpicking:

    use utf8 allows extended identifiers, and my test allows umlauts in use MODULE and package NAME

    use strict;
    use warnings;
    use lib ".";
    use utf8;
    use äöü;

    # FILE äöü.pm
    use utf8;
    package äöü;
    use strict;
    use warnings;
    use Data::Dump qw/pp dd/;
    warn "äöü included";
    1;

    output

    C:/Strawberry/perl/bin\perl.exe -w d:/tmp/pm/mod_uni.pl
    äöü included at äöü.pm line 10.
    Compilation finished at Tue Feb 8 02:46:11

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      G'day Rolf,

      ++ All valid points.

      This was for a $work project and there are certain constraints: one package per module; A::B::C is in .../A/B/C.pm; non-ASCII characters disallowed. I wrote the regex with those constraints in mind.
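Under those constraints, validating a name and deriving its file path go together. A minimal sketch, assuming the pattern from the original question; the helper name module_to_path is hypothetical:

```perl
use strict;
use warnings;

# Pattern from the original question (ASCII only, "::" separator only).
my $re = qr{^[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z0-9_]+)*$};

# Hypothetical helper: returns the relative .pm path for a valid name,
# or undef if the name does not match the pattern.
sub module_to_path {
    my ($name) = @_;
    return undef unless $name =~ $re;
    (my $path = $name) =~ s{::}{/}g;
    return "$path.pm";
}

for my $name ('A::B::C', 'strict', 'A::::B') {
    my $path = module_to_path($name);
    printf "%-8s %s\n", $name, defined $path ? $path : '(invalid)';
}
```

Because the pattern only admits a single "::" between parts, the derived path never contains an empty directory component.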

      — Ken

Re: Is there an official regex for checking module names?
by ikegami (Patriarch) on Feb 08, 2022 at 15:04 UTC

    Technically, you can create packages and variables with pretty much any name. But let's assume you are referring to the ones you could use without taking special measures. Your pattern has at least four problems:

    $ perl -we'package ::' && echo ok
    ok
    $ perl -we'package _a;' && echo ok
    ok
    $ perl -Mutf8 -we'package é;' && echo ok
    ok
    $ perl -we'package aaaa::::::bbbb;' && echo ok
    ok

    The first one is occasionally used. It refers to the root package (for which main is an alias), so %:: is Perl's symbol table, $_ means $::_, etc.

    I haven't seen any instances of the second and third, but there's nothing overly special about them.

    I wouldn't consider the final one valid (even though it was accepted).
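For reference, the four names can be run against the pattern from the original question directly. This is a small sketch; the %verdict hash is just an illustrative way to collect the results.

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

# Pattern from the original question.
my $re = qr{^[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z0-9_]+)*$};

my @names   = ('::', '_a', 'é', 'aaaa::::::bbbb');
my %verdict = map { $_ => ($_ =~ $re ? 1 : 0) } @names;

printf "%-16s %s\n", $_, $verdict{$_} ? 'matches' : 'no match' for @names;
```

Only _a matches; the root package, the non-ASCII name and the run of repeated colons are all rejected by that pattern.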

      G'day ikegami,

      Thanks for the feedback. As mentioned, this is for a $work project, which has various constraints, so some of these "exotic" forms wouldn't come up anyway. Quickly working through the list:

      1. I'm familiar with "::" but it wouldn't pass code review (cleverness reducing readability). It would need to be changed to "package main".
      2. While "_a" probably wouldn't pass code review (non-meaningful name), it does match the pattern: I added it to the original code and got "ok 8 - Testing _a". I do use names with a leading underscore in t/name.t scripts: typically prepending Some::Work::Module with _Test::, _Mock::, or similar.
      3. As already stated, non-ASCII names are disallowed ($work constraint).
      4. Like you, I wouldn't have considered "aaaa::::::bbbb" to be valid. It wouldn't pass code review. Possibly flagged with "too damn weird; are you drunk?". :-)

      — Ken

        I'm familiar with "::" but it wouldn't pass code review (cleverness reducing readability)

        No, it's not being clever.

        Do you write package main::Foo::Bar; or package Foo::Bar;? So why do you expect me to use %main:: instead of %::? If I want the root namespace, that's what I use. Not some alias created so you can say "scripts run in main".

        Also, using Foo::->method instead of Foo->method solves a real problem. Again, not cleverness.
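The problem in question is presumably the bareword ambiguity described in perlobj: if a subroutine named Foo exists in the current package, Foo->method is parsed as Foo()->method. A minimal sketch, with class and sub names invented for illustration (package blocks need Perl 5.14+):

```perl
use strict;
use warnings;

package Foo { sub name { 'Foo' } }
package Bar { sub name { 'Bar' } }

package main;

# A sub that happens to share its name with the class Foo.
sub Foo { return 'Bar' }

my $plain  = Foo->name;      # parsed as Foo()->name, so the method runs on 'Bar'
my $quoted = Foo::->name;    # trailing :: forces the class name Foo

print "Foo->name   gives $plain\n";
print "Foo::->name gives $quoted\n";
```

The trailing :: tells the parser that the bareword is a package name, so the call no longer depends on which subs happen to be in scope.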

        While "_a" probably wouldn't pass code review (non-meaningful name) it does match the pattern

        oops! I saw the pattern for the lead character was shorter, and I somehow imagined that "_" wasn't included.

        As already stated, non-ASCII names are disallowed ($work constraint).

        That was not mentioned in the question. And you're not the only person using PerlMonks.

      > I wouldn't consider the final one valid (even though it was accepted).

      It was accepted as package name, but as a module it would hit a wall, ° since :: are translated to path delimiter /

      DB<83> use a::::b
      Can't locate a//b.pm in @INC

      I'm not sure if there are any file-systems with semantics for an unnamed directory between two '/', win at least gives me a hard time trying to create a directory with an empty string as name.

      But I'm wondering if this might have some relevance as vulnerability, since I remember seeing sequences of // as special syntax for network resources.

      update

      °) It's possible on win at least, because it treats the extra / as no-ops, hence a//b is the same as a/b

      so I created

      C:\tmp\x\y>echo warn 'inside';1 >z.pm

      started the debugger and ran

      DB<5> use lib '.'
      DB<6> use x::::y::::z
      inside at x//y///z.pm line 1.

      It also reveals a trick to force a recompile of a module by adding more :: to the path.

      Like that Perl won't find it cached in %INC and try to require it again.
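The mechanism behind this is that %INC is keyed by the literal path string handed to require. Below is a rough sketch using the string form of require instead of bareword module names; the temporary directory, the file contents and the $loads counter are all made up for illustration, and it assumes a filesystem that treats x//y the same as x/y.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a throwaway x/y/z.pm that counts how often it gets compiled.
my $dir = tempdir(CLEANUP => 1);
make_path("$dir/x/y");
open my $fh, '>', "$dir/x/y/z.pm" or die "Cannot write module: $!";
print {$fh} "\$main::loads++;\n1;\n";
close $fh or die "Cannot close module: $!";

unshift @INC, $dir;
our $loads = 0;

require "x/y/z.pm";     # first load
require "x/y/z.pm";     # already in %INC, skipped
require "x//y/z.pm";    # different %INC key, so the file is compiled again

print "loads: $loads\n";
print "$_\n" for sort grep { m{^x/} } keys %INC;
```

Both x/y/z.pm and x//y/z.pm end up as separate keys in %INC even though they resolve to the same file on disk.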

      DB<7> use x::::y::::z
      DB<8> use x::::y::::::z
      inside at x//y///z.pm line 1.
       at x//y///z.pm line 1.

      DISCLAIMER: Too lazy to test this with linux (shame on me) and the rest of perlport... (no amiga handy...)

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        At least on Linux, for existing modules it works as if just one pair of colons was specified:
        main::(-e:1): 1
        DB<1> use Time::::Piece;

        But it loads the package that's declared in the file under its correct name:

        DB<2> x Time::::Piece::localtime->ymd;
        Can't locate object method "ymd" via package "Time::::Piece::localtime" (perhaps you forgot to load "Time::::Piece::localtime"?) at (eval 7)[/usr/lib/perl5/5.26.1/perl5db.pl:738] line 2.
        DB<3> x Time::Piece::localtime->ymd;
        0  '2022-02-09'

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

        I remember seeing sequences of // as special syntax for network resources.

        In Windows // aka \\ is significant at the start of a path.

        • \\server\share
          • \\localhost\c$ (C:\)
          • \\wsl
          • \\server\pipe\name
        • \\?\C:\foo Long path
        • more

        Too lazy to test this with linux

        // is the same as / there.

Re: Is there an official regex for checking module names?
by Anonymous Monk on Feb 08, 2022 at 01:41 UTC