in reply to Is there an official regex for checking module names?

Interesting question! Neither perlmod nor the various documents it references seem to give any such restrictions; there's only The Syntax of Variable Names and Identifier parsing, the latter of which gives / (?[ ( \p{Word} & \p{XID_Start} ) + [_] ]) (?[ ( \p{Word} & \p{XID_Continue} ) ]) * /x for Unicode and / (?aa) (?!\d) \w+ /x for ASCII, and indeed, under use utf8;, package Σäೡↈ is allowed. However, when it comes to .pm files, the module name also has to map to a filename, and that's a can of worms across different OSes. (Update 2: On my Linux system, a file named Σäೡↈ.pm works fine as a regular module; I can use it normally.)

Over in this node, I mentioned several different modules that parse .pm files: cpanm uses Parse::PMFile, which borrows this code from the PAUSE indexer, and they use ([\w\:\']+) to extract package names and the following to check if module names are bad:

$package !~ /^\w[\w\:\']*\w?\z/ || $package !~ /\w\z/ || $package =~ /:/ && $package !~ /::/ || $package =~ /\w:\w/ || $package =~ /:::/
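
To make it easier to see what each clause of that condition rejects, here is a minimal sketch that wraps it in a sub. The name looks_like_bad_package and the sample names are mine, purely for illustration; only the five regex tests are taken from the condition quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Parse::PMFile or PAUSE function): returns true
# if any clause of the condition quoted above flags the name as bad.
sub looks_like_bad_package {
    my ($package) = @_;
    # must start with a word char; only word chars, colons, apostrophes allowed
    return 1 if $package !~ /^\w[\w\:\']*\w?\z/;
    # must end with a word character
    return 1 if $package !~ /\w\z/;
    # contains a colon, but no "::" anywhere
    return 1 if $package =~ /:/ && $package !~ /::/;
    # a single colon directly between two word characters
    return 1 if $package =~ /\w:\w/;
    # three colons in a row
    return 1 if $package =~ /:::/;
    return 0;
}

for my $name (qw( Foo::Bar Foo'Bar Acme::123 Foo:Bar Foo:: Foo:::Bar )) {
    printf "%-10s %s\n", $name, looks_like_bad_package($name) ? "bad" : "ok";
}
```

Note that this check is fairly permissive: it accepts the old ' separator and names with leading digits.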

Module::Info's version and ExtUtils::MM_Unix share the regex /\w[\w\:\']*/.

But what I suspect is the "most official" answer comes from CPAN::Meta::Validator:

/^[A-Za-z0-9_]+(::[A-Za-z0-9_]+)*$/

And indeed, all but two of the entries in 02packages.details.txt match this pattern, the only exceptions currently being WWW::Scripter'_about_protocol and WWW::Scripter'Location, and changing :: to (?:'|::) in the above regex matches those too. Under versions of Perl that support /aa (5.14+), it can also be written /^\w+(?:(?:'|::)\w+)*$/aa.
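
A quick sketch comparing the two patterns side by side. The variable names $validator_re and $legacy_re are my own; the regexes themselves are the ones given above, and the sample names are just for illustration.

```perl
use 5.014;    # /aa needs Perl 5.14 or later
use warnings;

# Pattern quoted above from CPAN::Meta::Validator.
my $validator_re = qr/^[A-Za-z0-9_]+(::[A-Za-z0-9_]+)*$/;

# Variant that also accepts the old ' package separator.
my $legacy_re = qr/^\w+(?:(?:'|::)\w+)*$/aa;

# Note: $ also matches before a trailing newline; use \z if that matters.
for my $name ("Foo::Bar", "WWW::Scripter'Location", "Foo:Bar", "Foo::") {
    printf "%-24s validator=%d legacy=%d\n", $name,
        ( $name =~ $validator_re ? 1 : 0 ),
        ( $name =~ $legacy_re    ? 1 : 0 );
}
```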

There's an interesting caveat here: This regex allows package names that begin with a digit, which are not valid Perl identifiers. Indeed, package 123; is a syntax error - but package Acme::123; is not, and the existence of e.g. Acme::123 and many others on the CPAN suggests that this is allowed. (Side note: as of 5.18, any non-empty string is allowed on the LHS of ->.)

Update: All packages on the CPAN also match the slightly more restrictive /^(?!\d)\w+(?:(?:'|::)\w+)*$/aa, which also lines up with the blog post by schwern the AM linked to - and your own regex! Additional minor edits. Update 2: And of course Module::Runtime, as mentioned by the AM, agrees too.
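
To illustrate the difference the (?!\d) lookahead makes, here is a small sketch. The names $loose_re and $strict_re and the sample inputs are mine; the patterns are the two /aa regexes from this post.

```perl
use 5.014;    # /aa needs Perl 5.14 or later
use warnings;

# Without the lookahead: a leading digit in the first component is accepted.
my $loose_re  = qr/^\w+(?:(?:'|::)\w+)*$/aa;

# With the lookahead: the name as a whole must not start with a digit,
# but later components (e.g. the 123 in Acme::123) still may.
my $strict_re = qr/^(?!\d)\w+(?:(?:'|::)\w+)*$/aa;

for my $name (qw( Acme::123 123 1st::Thing _private Foo'Bar )) {
    printf "%-12s loose=%d strict=%d\n", $name,
        ( $name =~ $loose_re  ? 1 : 0 ),
        ( $name =~ $strict_re ? 1 : 0 );
}
```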

Re^2: Is there an official regex for checking module names?
by kcott (Archbishop) on Feb 08, 2022 at 21:21 UTC
      Anonymous mentioned this below (amid a massive pile of links) but the Module::Runtime module gives you all the handy tools you need for this. It also has a discussion of why it doesn't allow unicode names.
        > It also has a discussion of why it doesn't allow unicode names.

        Come on ... the world needs more modules like 💩::🤔::🤷🏽!

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery