Find installed Perl modules matching a regular expression

Here is a handy command-line tool to quickly view installed Perl modules whose name matches a specified regular expression.

Features

Perl regular expression syntax, with an optional case-sensitive switch.
Optional initialization file for faster look-ups.
Option to print the module name or the full directory path to the module file.
Option to display duplicate modules and other statistics.
Uses only core modules.
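The first feature comes down to compiling the user's pattern with or without the /i modifier. Below is a minimal sketch of that step. build_regex() is a hypothetical helper that mirrors what the script's parse_args does with -sens and with the :: to / conversion; it is not code from the script itself.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the idea behind findpm's options:
# "::" is rewritten to "/" so "File::Find" can match the relative path
# "File/Find", and matching is case-insensitive unless asked otherwise.
sub build_regex {
    my ($pat, $case_sensitive) = @_;
    $pat = '.' unless defined $pat;    # no pattern given: match everything
    $pat =~ s{::}{/}g;
    return $case_sensitive ? qr/$pat/ : qr/$pat/i;
}

for my $path ('File/Find', 'FILE/FIND', 'File/Spec') {
    print "$path\n" if $path =~ build_regex('file::find', 0);
}
# prints File/Find and FILE/FIND
```

Compiling with qr// once and reusing the result avoids recompiling the pattern for every file that is tested.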

Other well-known methods

From perlfaq3: How do I find which modules are installed on my system?
The pminst script from the pmtools CPAN bundle
The CPAN module: HTML::Perlinfo

So... why another way to do it?

Simply put: I could not easily convince these other tools to Do What I Want, as quickly as I want, in (what I consider) a bug-free manner. Obviously, this is not a new idea; it is merely a different implementation. There are many threads here at the Monastery, as a Super Search would reveal, and I have probably read every node of every thread on the topic. I believe HTML::Perlinfo does everything this script does (and much, much more), except that I could not easily figure out how to generate output as simple text rather than HTML. I consider HTML::Perlinfo to be a valuable companion to this script. I run a daily cron job to dump out fresh copies of both the HTML and the text output.

Impatience

In my opinion, the biggest advantage here is the fast look-up capability. No matter how you slice it, you have to search through the @INC directories via some variant of find, which can take a whole minute or so -- I just do not have the patience to wait that long! Maintaining the initialization file avoids that nonsense.
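Here is a minimal sketch of that trade-off. pm_files() is a hypothetical helper that reads a cache file (one .pm path per line) if the file exists and is non-empty. Otherwise it walks the given directories with the core File::Find module. Unlike the script, it does no @INC cleanup and gives no staleness warning.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: prefer a cache file (one .pm path per line) and
# fall back to the slow File::Find walk of @dirs when there is no cache.
sub pm_files {
    my ($cache_file, @dirs) = @_;

    # Fast path: the cache exists and is non-empty.
    if (defined $cache_file and -s $cache_file) {
        open my $fh, '<', $cache_file or die "Cannot open $cache_file: $!";
        chomp(my @files = <$fh>);
        close $fh;
        return @files;
    }

    # Slow path: visit every file under the existing directories.
    my @files;
    my @existing = grep { -d } @dirs;
    find(
        {
            wanted   => sub { push @files, $_ if -f $_ and /\.pm\z/ },
            no_chdir => 1,    # $_ holds the full path, not just the basename
        },
        @existing
    ) if @existing;
    return @files;
}
```

For example, pm_files("$ENV{HOME}/.findpm", grep { $_ ne '.' } @INC) would roughly mimic the script: it returns instantly when the cache is present and performs the full walk when it is not.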

The code

use warnings;
use strict;
use Getopt::Long;
use Pod::Usage;
use File::Find;

my $print_path;
my $report;
my $re;

parse_args();

# Clean up @INC
my @dirs;
for my $dirname (@INC) {
    if (-d $dirname) {
        next if $dirname eq '.';
        $dirname =~ s{/+}{/}g;
        $dirname =~ s{/$}{};
        push @dirs, $dirname;
    }
}
@dirs = uniq(@dirs);

# For quicker operation, use init file, if it exists
my @files;
my $use_find = 1;
my $message;
my $init_file = exists $ENV{HOME} ? "$ENV{HOME}/.findpm" : '';
if (-e $init_file) {
    if (open my $fh, '<', $init_file) {
        @files = <$fh>;
        close $fh;
        chomp @files;
        my $days = 1;
        if (-M $init_file > $days) {
            $message = "Warning: $init_file is older than $days day\n";
        }
        die "Error: $init_file is empty" if -z $init_file;
        $use_find = 0;
    }
    else {
        $message = "Warning: $init_file exists, but cannot be opened: $!";
    }
}

# Otherwise, use the slower find command
if ($use_find) {
    # Find all .pm files under @INC dirs
    my @find_dirs = reduce_dirs(@dirs);
    find(
        {
            wanted => sub { push @files, $_ if -f $_ and /\.pm$/ },
            no_chdir => 1,
        },
        @find_dirs
    );
    @files = uniq(@files);
}

# Print those modules/files which match the regex
my %mods;
for my $file (@files) {

    my @ds;
    for my $dir (@dirs) {
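        # $file must lie under $dir; the trailing "/" keeps
        # /usr/lib/perl5 from matching /usr/lib/perl5x/Foo.pm
        if (index($file, "$dir/") == 0) {
            push @ds, $dir;
        }
    }
    next unless @ds;    # skip init-file entries not under any current @INC dir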
    my $d = (sort {length($b) <=> length($a)} @ds)[0];
    my $rel = substr($file, (length($d)+1));
    my $name = $rel;
    $name =~ s/\.pm$//;

    next unless $name =~ /$re/;

    push @{ $mods{$rel} }, $d;
    if ($print_path) {
        print "$file\n";
    }
    else {
        $rel =~ s/\.pm$//;
        $rel =~ s{/}{::}g;
        print "$rel\n";
    }
}

if ($report) {
    my $num_dups = 0;
    for (keys %mods) {
        $num_dups++ if (scalar(@{$mods{$_}}) > 1);
    }

    if ($num_dups) {
        print "\nDUPLICATES\n";
        for my $rel (sort keys %mods) {
            if (scalar(@{$mods{$rel}}) > 1) {
                print "$rel\n";
                for my $dir (@{$mods{$rel}}) {
                    print "    $dir/$rel\n";
                }
            }
        }
    }

    print "\nSUMMARY\n";
    print "    regex = $re\n";
    print "    Used '$init_file' init file instead of 'find'\n" unless $use_find;
    print "    INC dirs:\n";
    print "        $_\n" for @dirs;
    print '    Total              ".pm" files = ', scalar @files, "\n";
    print '    Matching unique    ".pm" files = ', scalar keys %mods, "\n";
    print '    Matching duplicate ".pm" files = ', $num_dups, "\n";
}

warn $message if $message;
exit;

sub reduce_dirs {
    # Reduce a list of directory names by eliminating
    # names which contain other names.  For example,
    # if the input array contains (/a/b/c/d /a/b/c /a/b),
    # return an array containing (/a/b).
    my @dirs = @_;
    my %substring_count = map { $_ => 0 } @dirs;

    for my $x (@dirs) {
        for my $y (@dirs) {
            next if $x eq $y;
            if (index($x, "$y/") == 0) {
                # $y is a parent directory of $x, so searching $y covers $x
                $substring_count{$x}++;
            }
        }
    }

    my @dsubs;
    for (keys %substring_count) {
        push @dsubs, $_ if $substring_count{$_} == 0;
    }
    return @dsubs;
}

sub uniq {
    # From List::MoreUtils, $VERSION = '0.22'
    my %h;
    map { $h{$_}++ == 0 ? $_ : () } @_;
}

sub parse_args {
    my ($help, $sens);
    GetOptions(
        'sens'      => \$sens,
        'path'      => \$print_path,
        'report'    => \$report,
        'help'      => \$help
    ) or pod2usage();

    $help and pod2usage(-verbose => 2);

    my $pat = (@ARGV) ? shift @ARGV : '.';
    $pat =~ s{::}{/}g;
    $re = ($sens) ? qr/$pat/ : qr/$pat/i;

    @ARGV and pod2usage("Error: unexpected args: @ARGV");
}


=head1 NAME

B<findpm> - Find installed Perl modules

=head1 SYNOPSIS

findpm [options] [regex]

    Options:
    -help       verbose help
    -path       print full file paths instead of module names
    -report     print out detailed report
    -sens       case-sensitive [default is case-insensitive]

=head1 DESCRIPTION

Search through the directories in the Perl C<@INC> variable
for Perl module files (all files with a C<.pm> extension) matching
a specified regular expression.
The names of all the modules which match will be printed to STDOUT.

Any directories listed in C<@INC> which do not exist will be silently ignored.
The current directory (.) is also excluded.

If you are impatient (like I am), you can optionally use an initialization
file instead of letting the script search through all the C<@INC>
directories every time you run the script.  The file must be in your home
directory and must be named C<.findpm>.  You must create this file yourself
(see EXAMPLES below), and you should keep it up to date.  Since you will
get a warning if the init file is more than a day old, I recommend
creating the file using a cron job that runs once a day.  If the init file
does not exist, the script will proceed to search C<@INC>.

=head1 ARGUMENTS

=over 4

=item regex

An optional regular expression may be given.  The regex may be a simple
string, such as C<foo>, or it may be a more complicated expression, such as
C<^foo.*bar\d>. The regex syntax is Perl; it should not be confused
with shell wildcard syntax or the syntax for other common Unix utilities,
such as I<sed> or I<grep>. It is best to quote the regex to prevent
interaction with the shell. Do not include the C<.pm> extension as part of
the regex.  If no regex is given, all modules are listed.

=back

=head1 OPTIONS

All options can be abbreviated.

=over 4

=item sens

By default, the regular expression is case-insensitive. So, if the input
regex is C<foo>, it will match C<foo> as well as C<FOO> and C<Foo>, etc.
For case-sensitive matching, use the C<-sens> option.

    findpm -sens foo

=item path

By default, only the module name is printed. To instead print the full
directory path to the module file, use the C<-path> option.

    findpm -path foo

=item report

To print out additional statistics, use the C<-report> option.
This will show the total number of matching modules, duplicate modules, etc.

    findpm -report

=item help

Show verbose usage information.

=back

=head1 EXAMPLES

Find xml modules:

    findpm xml

Find modules with case-sensitive "Ext":

    findpm -sens Ext

Find modules like File::Find.  The following are equivalent because
C<::> will be converted to C</> (similar to I<perldoc>):

    findpm 'file::find'
    findpm 'file/find'

Find all modules in all C<@INC> directories:

    findpm

Create init file:

    rm -f ~/.findpm; findpm -path > /tmp/.findpm; mv /tmp/.findpm ~/.findpm

=head1 CONFIGURATION AND ENVIRONMENT

Searches for an optional initialization file in the directory specified
by the C<HOME> environment variable:

    ${HOME}/.findpm

=head1 LIMITATIONS

The initialization file is only supported for Unix-type operating systems.

=cut
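The main loop above turns each file path into a module name by stripping the longest @INC directory that prefixes the path. A bare index($file, $dir) == 0 test also accepts a sibling directory that merely shares a leading string; for example, it treats /usr/lib/perl5x/Foo.pm as lying under /usr/lib/perl5. Below is a minimal sketch of the same conversion with a directory-boundary check. module_name() is a hypothetical helper, and it assumes "/" separators, as in the paths the script collects.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a .pm file path into a module name by
# stripping the longest directory in @dirs that contains the file.
# Appending "/" before the prefix test enforces a directory boundary.
sub module_name {
    my ($file, @dirs) = @_;
    my ($best) = sort { length($b) <=> length($a) }
                 grep { index($file, "$_/") == 0 } @dirs;
    return unless defined $best;             # not under any directory

    my $rel = substr($file, length($best) + 1);
    return unless $rel =~ s/\.pm\z//;        # only .pm files are modules
    $rel =~ s{/}{::}g;
    return $rel;
}

print module_name('/usr/lib/perl5/File/Find.pm', '/usr/lib', '/usr/lib/perl5'), "\n";
# prints File::Find
```

Choosing the longest match matters when one @INC directory is nested inside another. With only /usr/lib available, the same file would come out as perl5::File::Find.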

Constructive criticism, suggestions for improvements and bug reports are welcome.

Update: Now only uses core modules.
Update: Avoid potential warning; small change to POD.
Update: find is more portable.


Re: Find installed Perl modules matching a regular expression by Anonymous Monk on Sep 16, 2009 at 12:06 UTC
To get rid of the unix limitation you could die without $ENV{HOME}, or use File::HomeDir. Here is my caching version; it works on ALL systems :)

    echo pml is module names list
    pminst >pml
    echo pmlf is module filenames list
    pminst -l >pmlf
    echo pmlfl is name tabspace filename
    paste pml pmlf >pmlfl
    grep "^CGI::S[^:]$" pml
    grep "CGI/S[^/]$" pmlf
    grep -P "^CGI::S\w+$" pml
    grep -P "^CGI::S\w+\t" pmlfl
    perl -lne "print $_ if /^CGI(::\w+)$/" pml
    perl -lne "print $_ if m!CGI/S[^/]$!" pmlf
    perl -lne "print $_ if /^CGI(::\w+)\t/" pmlfl

Update: Whoops, I just realized pminst is broken in 2 ways. This doesn't match:

    D:\>pminst Wx$

And this one matches, but prints MSWin32-x86-multi-thread:

    D:\>pminst Wx.pm$
    Wx
    MSWin32-x86-multi-thread::Wx

    D:\>perl -le"print for @INC"
    C:/perl/5.10.1/lib/MSWin32-x86-multi-thread
    C:/perl/5.10.1/lib
    C:/perl/site/5.10.1/lib/MSWin32-x86-multi-thread
    C:/perl/site/5.10.1/lib
    .
Re^2: Find installed Perl modules matching a regular expression by toolic (Bishop) on Sep 16, 2009 at 14:11 UTC
I appreciate the feedback ++

"To get rid of the unix limitation you could die without $ENV{HOME}, or use File::HomeDir."

You are correct: the reason for my self-imposed unix limitation is that I was unaware of how to handle $ENV{HOME} in a portable way. Thanks for bringing the File::HomeDir module to my attention. For my purposes, I have come to realize that it is important to only use core modules in this script. The original version of the script used the non-core List::MoreUtils. I ran into problems on one system configuration here @work which, unbelievably, did not have it installed. So I could not even analyze what modules were installed, because my script died when it could not load a module! I will take a look at the File::HomeDir source code to see if I can incorporate its techniques for making findpm portable.

"Whoops, I just realized pminst is broken in 2 ways"

I am also aware of 2 bugs in pminst:

It completely misses some modules.
It unnecessarily duplicates some modules in its output. I believe this is the same as the MSWin32-x86-multi-thread issue you mentioned.

It does not seem to handle all of the @INC paths gracefully. At first, I was willing to concede that my sysadmins set @INC in an unconventional manner... until you mentioned that it was also an issue for your system. I should file a bug report on CPAN. Unfortunately, it is not obvious to me how to patch the code. I guess this is the reason I created the findpm script in the first place.

Update: Someone has reported a bug: https://rt.cpan.org/Public/Bug/Display.html?id=50644
Re: Find installed Perl modules matching a regular expression by toolic (Bishop) on Sep 22, 2009 at 20:55 UTC
I have updated the code to be more portable to other operating systems. The restriction now is that the initialization file is only supported for Unix-type operating systems. But even that could be fixed by changing a single line in the source code.
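Below is a sketch of one possible core-only way to locate the init file on more systems. It is not necessarily the single-line change mentioned above. It falls back from HOME to USERPROFILE, an environment variable commonly set on Windows, and builds the path with the core File::Spec module. init_file_path() is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: locate ~/.findpm using only core modules.
# Tries HOME first, then USERPROFILE (commonly set on Windows).
# Returns '' when neither is usable, so a later -e test simply fails
# and the caller falls back to searching @INC.
sub init_file_path {
    for my $var (qw(HOME USERPROFILE)) {
        return File::Spec->catfile($ENV{$var}, '.findpm')
            if defined $ENV{$var} and length $ENV{$var};
    }
    return '';
}

print init_file_path(), "\n";
```

Returning '' rather than dying keeps the existing -e $init_file logic working unchanged: with no home directory, the script silently uses find.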
Re^2: Find installed Perl modules matching a regular expression by Anonymous Monk on Sep 25, 2009 at 21:58 UTC
Here is another idea. I would replace

    if (-M $init_file > $days) {
        $message = "Warning: $init_file is older than $days day\n"

with a check to see whether any directory in @INC has been modified more recently than $init_file, like

    for my $init_file ( '.', '..' ) {
        my $mod = ( stat $init_file )[9];
        if ( my @mod = grep { ( stat $_ )[9] > $mod } @INC ) {
            warn "Warning: $init_file is older than (", join( ' , ', @mod ), ") ";
        }
    }
    __END__
    Warning: .. is older than (C:/perl/5.10.1/lib/MSWin32-x86-multi-thread , C:/perl/5.10.1/lib , C:/perl/site/5.10.1/lib/MSWin32-x86-multi-thread , C:/perl/site/5.10.1/lib , .) at - line 4.
Re^3: Find installed Perl modules matching a regular expression by toolic (Bishop) on Sep 26, 2009 at 00:49 UTC
Nice idea. However, it does not seem to work if a sub-directory of a directory in @INC has been modified. Since directories under @INC can be arbitrarily deep, it would be necessary to perform a find on all directories, which is what the init file was designed to avoid. Perhaps there is a more efficient way to check if any directory has been modified throughout a tree.
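Below is a minimal sketch of such a full-tree check. newer_dir() is a hypothetical helper that uses only core File::Find. It reports the first directory under the given roots whose mtime is newer than the init file and stops the walk there. It does not remove the cost described above: when nothing has changed, every directory is still visited. It may also miss a module file rewritten in place, since that does not change the mtime of the directory holding it.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical staleness check: return the first directory under @dirs
# whose mtime is newer than $cache_file, or undef if there is none.
# A directory's mtime changes when entries are added, removed, or renamed
# in it (not in its subdirectories), so every level has to be visited.
sub newer_dir {
    my ($cache_file, @dirs) = @_;
    my $cache_mtime = (stat $cache_file)[9];
    die "Cannot stat $cache_file: $!\n" unless defined $cache_mtime;

    my @existing = grep { -d } @dirs;
    return unless @existing;

    my $hit;
    eval {
        find(
            {
                wanted => sub {
                    return unless -d $_;
                    if ((stat _)[9] > $cache_mtime) {   # reuse -d's stat
                        $hit = $_;
                        die "found\n";    # abort the walk at the first hit
                    }
                },
                no_chdir => 1,
            },
            @existing
        );
        1;
    } or do {
        die $@ unless $@ eq "found\n";    # re-throw unexpected errors
    };
    return $hit;
}
```

Called as newer_dir($init_file, @dirs), a defined result could trigger the "init file is stale" warning in place of the fixed one-day threshold.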
Re^4: Find installed Perl modules matching a regular expression by Anonymous Monk on Sep 26, 2009 at 01:08 UTC
Re^5: Find installed Perl modules matching a regular expression by toolic (Bishop) on Sep 27, 2009 at 00:17 UTC

