PerlMonks  

How do I detect what modules are not being used?

by FatDog (Beadle)
on Apr 07, 2005 at 00:43 UTC

FatDog has asked for the wisdom of the Perl Monks concerning the following question:

I have inherited about 900 perl and shell scripts. A lot of code was cut/pasted over the last 5 years and my boss has asked me to try and clean things up.

One issue is this:

    use Getopt::Long;
    use Net::SSH qw(sshopen2);
    use Net::Ping;
    use Net::FTP;
    ...
This script DOES use sshopen2, but not Ping or FTP.

What could I do (from another perl script scanning through *.pl files) to report that this file includes but does not use Ping or FTP?

I have tried using perl -MO=Xref,-d test.pl to generate a cross-reference list of variables, but this does not identify packages that are included but not used.

I thought about taking test.pl and creating copies like "a.pl", "b.pl", "c.pl", "d.pl".. with the difference:

a.pl - comments out the "use Getopt::Long" line

b.pl - comments out the "use Net::SSH qw(sshopen2)" line ..

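Then doing a perl -c on each copy to see whether it still compiles, which would identify the useless "use" statements. But this does not reliably catch them: perl -c still passes when an imported function is called with parentheses, because the missing subroutine is only noticed at run time.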

I tried using

perl -MO=Deparse test.pl
Hoping that this would expand:

&sshopen2() into

&Net::SSH::sshopen2()

And I could grep the expanded output to find the packages that ARE used and do my own bookkeeping, but Deparse does not fully-qualify function calls.

So I am kind of stumped. Any thoughts?

Replies are listed 'Best First'.
Re: How do I detect what modules are not being used?
by Fletch (Bishop) on Apr 07, 2005 at 02:49 UTC

    I'd be very wary about doing this completely automatically. There's nothing stopping someone from calling an arbitrary method $somePkg->$someMethod( "blah" ), with either of those variables set at runtime. You'll need to look very closely at all of the code; otherwise you can't be sure that there's not some code path that sets $somePkg to something you've removed because you never saw it called.

    Update: And a similar caveat applies to plain subroutine calls if there isn't a use strict; there could be &{"${somePkg}::$sub"}() calls to who knows what. Granted, this is probably unlikely (I wouldn't write code that way), but be aware of the possibility.

    Update: You should probably get your boss to spring for a copy of Refactoring (ISBN 0201485672) as well.
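To make the run-time dispatch described above concrete, here is a minimal sketch. The variable names and the choice of File::Spec are illustrative assumptions only; the point is that the module name never appears next to a "use", so a static scan has nothing to find.

```perl
use strict;
use warnings;

# Hypothetical values: imagine these come from a config file or the command line.
my ($some_pkg, $some_method) = ('File::Spec', 'curdir');

# Load the module by its computed file name, then call the method by name.
(my $file = "$some_pkg.pm") =~ s{::}{/}g;
require $file;
my $result = $some_pkg->$some_method();

printf "%s->%s returned '%s'\n", $some_pkg, $some_method, $result;
```

Nothing in this script says "use File::Spec", yet removing File::Spec from the machine would break it.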

Re: How do I detect what modules are not being used?
by blahblahblah (Priest) on Apr 07, 2005 at 01:14 UTC
    How about writing a small script that starts like this:
    package moduleTest; eval "$ARGV[0]";
    (The argument would be the "use Whatever qw(umm)" line.) Then the script could go on to check the package's symbol table for things that have been exported by the module, and print the results.

    I know this is terribly inelegant, but it seems like a simple way to get what you want.

    -Joe
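A rough sketch of the idea above, under a few assumptions: the scratch package names (ModuleTest0, ModuleTest1, ...) and the helper imported_subs are made up for illustration, and the "use" line is passed to a string eval, so it should only be fed lines you trust.

```perl
use strict;
use warnings;

my $scratch_count = 0;

# Evaluate one "use" line inside a fresh scratch package and return the
# names of the subs that ended up in that package's symbol table.
sub imported_subs {
    my ($use_line) = @_;
    my $pkg = 'ModuleTest' . $scratch_count++;   # hypothetical package name
    eval "package $pkg; $use_line; 1"
        or die "Could not evaluate '$use_line': $@";
    no strict 'refs';
    my @names = sort grep { !/::\z/ && defined &{"${pkg}::$_"} }
                keys %{"${pkg}::"};
    return @names;
}

# Default demo line uses a core module; pass your own line as the first argument.
my $line = @ARGV ? $ARGV[0] : 'use List::Util qw(first sum);';
print "$_\n" for imported_subs($line);
```

Modules that export nothing, or that are used purely through method calls, will produce an empty list here, so an empty result is not evidence that the module is unused.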

      blahblahblah,
      I know this is terribly inelegant, but it seems like a simple way to get what you want

      I have been staring at the original question and the answers in this thread and am failing to see how most of them are relevant. The question is not which modules are being imported; it is which of the many imported modules are really being used. This question is presumably being asked so the ones not being used can be removed (cleaned up per the boss).

      Cheers - L~R

        I should have been more clear, but I was really only replying to the last part of the original post:

        ...
        &Net::SSH::sshopen2()

        And I could grep the expanded output to find the packages that ARE used and do my own bookkeeping, but Deparse does not fully-qualify function calls.

        The problem here was that you can easily grep your own code for Foo::x() to find uses of Foo's x method, but if your code just calls x(), you need to know that x is a method of Foo. If you can find out what all the methods of Foo are, then you can grep for non-fully-qualified calls to those methods in your code and flag those parts of your code for review.

        -Joe

        Update: I just read some more of the replies, and of course I agree that there's no way to automate this task and get it 100% right. But I think that a simple solution like this could save some effort by categorizing each script/module as "definitely used" or "probably not used". Then you'd want to manually review the "probably not used" cases.
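The "flag for review" step could look something like this sketch. The helper name find_unqualified_calls is hypothetical, and the matching is deliberately crude: it works line by line and will also hit names that appear inside strings.

```perl
use strict;
use warnings;

# Return [line_number, text] pairs where one of @names appears as a bare
# call, i.e. not written as Pkg::name, ->name, $name, @name or %name.
sub find_unqualified_calls {
    my ($source, @names) = @_;
    return unless @names;
    my $alt = join '|', map { quotemeta } @names;
    my @hits;
    my $n = 0;
    for my $line (split /\n/, $source) {
        $n++;
        next if $line =~ /^\s*#/;        # skip whole-line comments
        next if $line =~ /^\s*use\s/;    # skip the import itself
        push @hits, [$n, $line]
            if $line =~ /(?<![\w:>\$\@%])(?:$alt)\b/;
    }
    return @hits;
}

my $demo = join "\n",
    'use Net::SSH qw(sshopen2);',
    '# sshopen2 is mentioned in this comment',
    'sshopen2($host, $reader, $writer, "ls");';
printf "line %d: %s\n", @$_ for find_unqualified_calls($demo, 'sshopen2');
```

Feeding it the names exported by each module gives the "definitely used" pile; whatever is left over is the "probably not used" pile for manual review.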

Re: How do I detect what modules are not being used?
by Anonymous Monk on Apr 07, 2005 at 08:48 UTC
    Unless you can solve the halting problem, you cannot write a program that determines whether a used module is used or not. Note also that (directly) calling subs or methods in the module isn't a good test - modules like Exporter, strict or Memoize would then be classified as "not in use", even if they are.

    If you have a good test suite, one approach you could take is to comment out a use Module; that you think is not in use, and run the test suite. But that only works if your test suite covers all paths. And, due to the way the modules work, commenting out use strict; and use Memoize; won't show any failures.

Re: How do I detect what modules are not being used?
by cheshirecat (Sexton) on Apr 07, 2005 at 13:15 UTC
    How about running the scripts under the profiler

    perl -d:DProf mycode.pl

    To find out which modules/subroutines are actually called/used

    I know that this may not catch all the edge cases, but it might be worth a try?

    Cheers

    The Cat

      Hi,

      You could maybe just mess with @INC or programmatically rename each module directory and then perl -c each script and look for the "Undefined subroutine &main:: at line" errors

        hakkr,
        This is similar to the approach I outlined (commenting out each module). This is certainly better than nothing, but there are still plenty of edge cases that would make this less than a 100% solution. For instance, if a module is required inside an eval block of a conditional, then only under the right circumstances will making the module unavailable show up. A comprehensive test suite is needed to ensure removing the module will have no ill effect.

        Cheers - L~R
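A small sketch of that edge case, with a made-up module name standing in for one that has been hidden or removed: the script compiles and runs cleanly until the one branch that needs the module is taken.

```perl
use strict;
use warnings;

# Hypothetical helper: the module is only required on one code path.
# No::Such::Module::Installed is a deliberately nonexistent name.
sub fetch {
    my ($via_ftp) = @_;
    if ($via_ftp) {
        eval { require No::Such::Module::Installed; 1 }
            or return "module unavailable: $@";
        return 'would fetch over FTP';
    }
    return 'nothing to fetch';
}

print fetch(0), "\n";   # the common path never notices the missing module
print fetch(1);         # only this path reveals the problem
```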

Re: How do I detect what modules are not being used?
by Limbic~Region (Chancellor) on Apr 07, 2005 at 13:11 UTC
    FatDog,
    As some of the replies indicate, this is a terribly hard problem to solve with an automatic one-size-fits-all approach. Since you are doing cleanup, it is unlikely that the code has an extensive test suite. If that were the case, it would be possible through trial and error to comment out each module and see if the comprehensive test suite still passed.

    Depending on the size and the complexity of the code and the amount of time your boss is willing to let you spend refactoring, you might just want to start from scratch with each one in turn. Examine the prior code, write tests, write code, check tests, bugfix as necessary, document - wash-rinse-repeat.

    Cheers - L~R

Re: How do I detect what modules are not being used?
by adamk (Chaplain) on Apr 07, 2005 at 07:50 UTC
    To start with, I'd grep the entire file for /\buse\s+([\w:]+)\b/ to find all the modules being used (at least via 'use' in any case).

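    Then for each module, search for /(?<!use\s)$module\b/ to find all the OTHER uses of it in the file. (Perl's lookbehind cannot contain an unbounded quantifier such as \s+, so this form assumes a single whitespace character after "use".)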

    If the name of the module never appears anywhere else in the file, then it's almost certainly not used. :) Adam K
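A minimal sketch of this scan, with some assumptions of my own: the helper name unmentioned_modules is hypothetical, lowercase names are treated as pragmas and skipped, and instead of a lookbehind the "use" statements are deleted before searching for the module name.

```perl
use strict;
use warnings;

# Return the modules that are loaded with "use" but never named again.
sub unmentioned_modules {
    my ($source) = @_;
    my %seen;
    # A "use" counts when it starts a line or follows a semicolon.
    my @modules = grep { !$seen{$_}++ }
        $source =~ /(?:^|;)\s*use\s+([A-Za-z_][\w:]*)/mg;
    # Strip the use statements so they do not count as mentions.
    (my $rest = $source) =~ s/(^|;)\s*use\s+[A-Za-z_][\w:]*[^;]*/$1/mg;
    # Assumption: lowercase names (strict, warnings, ...) are pragmas.
    return grep { /^[A-Z]/ && $rest !~ /\b\Q$_\E\b/ } @modules;
}

my $demo = <<'END';
use strict;
use CGI;
use Net::FTP;
my $q = CGI->new;
END
print "$_\n" for unmentioned_modules($demo);
```

Comments and strings that happen to contain the module name still count as mentions, and modules used only through exported functions will be reported as unmentioned, so the output is a list of candidates rather than a verdict.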
      adamk,
      If the name of the module never appears anywhere else in the file, then it's almost certainly not used

      I assume you are thinking that if I do:

      use CGI; my $q = CGI->new();
      Then you will be able to tell if I have actually used that module? A huge portion of modules are not OO and export functions that do not need the module name to be invoked. Additionally, the module may be exporting constants.

      The best this method can do is attempt to confirm actual use. Even then it is subject to break, since the module name may appear in comments or code as a false positive. Even removing the comments and POD with perl -MO=Deparse will not be foolproof. Your method will not be a good indicator of modules not actually being used and will be a poor indicator of modules that are being used.

      Cheers - L~R

        I certainly don't consider it a complete method for handling all cases, but modification of large groups of modules tends to go like this. Run some wide scans, fix the obvious cases, and then work your way down the curve of diminishing returns.

        A minor change to check for the use statement having params, or the module being called having an @EXPORT (compulsory export) would largely resolve your issues.

        The poster is not going to automagically modify the files, he's just looking for clues. If a report can identify the 25-75% of obvious cases, then it's a good high yield starting point.
Re: How do I detect what modules are not being used?
by redlemon (Hermit) on Apr 07, 2005 at 11:47 UTC

    This is a relatively naive solution, but one I'd probably give a try.

    I don't know if you're running a UNIX variant, but an option would be to set the atime of all modules to a known value, run the application stack and then check which atimes have changed.

    You'd have to know of course nothing else is touching the modules, like backups, slocate, finds, etc.

      redlemon,
      See my earlier reply. This question is not about which modules are being imported, it is about which of those that are being imported are actually being used. This method will not solve that problem.

      Cheers - L~R

        Oh dear. You're right. That's going to make it a bit harder.

        I wonder if B::Xref would be of help. Although it would only give you an intermediate output format (namely, what is defined where and what is used where), that may be parseable into something more useful.

Thank you for the thoughts
by FatDog (Beadle) on Apr 07, 2005 at 16:24 UTC
    I am grateful that you guys did not come up with an easy solution. This means I am not overlooking something obvious (a problem shared...)

    Because these scripts reach out and touch production files/databases, a solution that requires running the script is not a happy one.

    The ultimate goal would be a "lint" type of check for each perl script that can be run when the script is submitted for production release. There is also the problem of having 5 different Linux servers without identical perl modules installed. My boss wants the ability to know whether a script working on machine A can be moved to machine C. This means identifying the modules that are really required and perhaps documenting them, but not trusting the "use" statements at the top of each script.

    I hoped I could comment out all "use xxxx" statements and run "perl -c" or "perl -Mdiagnostics" and get a list of all the unresolved function calls. This would at least get me started.

    But.. unresolved function calls generate run-time errors, not compile-time errors. (grrr). Any thoughts?
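A small self-contained demonstration of that behaviour, as a sketch. The sub name not_imported_anywhere is made up, and a string eval stands in for the compile-only check that "perl -c" performs.

```perl
use strict;
use warnings;

# Hypothetical script body: calls a sub that nothing defines or imports.
my $script = q{
    use strict;
    sub run { return not_imported_anywhere(1, 2) }
    1;
};

# Compilation succeeds: a call written with parentheses is resolved late.
my $compiled = eval $script;
print "compiled ok\n" if $compiled;

# The failure only appears when the call is actually executed.
my $ok = eval { run(); 1 };
print "run-time error: $@" unless $ok;
```

This is why commenting out the "use" lines and compiling does not yield a list of unresolved calls: Perl only looks the sub up when the call is reached.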

      Take a look at PAR; they are solving a similar problem, namely figuring out which modules a given script might need (OTOH, when you say 'use STH' they just assume it's because you need it).

      OTOH, PAR might solve your problem of ensuring that a given script will run on another machine - you can either pack your script, together with all its requirements, into a .par archive, or even pack it into an ELF executable (but that tends to create 2-5 megabyte executables).

Re: How do I detect what modules are not being used?
by gam3 (Curate) on Apr 07, 2005 at 15:26 UTC
    Here is some code that you can run that will certainly help you decide if a module is being used.

    There are many bugs in this code. You will notice if you run it on itself: it does not think that Data::Dumper is being used. This is because Dumper is not being explicitly imported. And if the code has "C"."G"."I"->new() in it this program will not find it.

    I assume that for most of the programs that you have this little program will say that all of the modules are being used, and then you can spend your time looking at the programs that have modules that may not be being used.

    False positives are a much bigger problem, but including a module that is not used is not horrible.
    -- gam3
    A picture is worth a thousand words, but takes 200K.
Re: How do I detect what modules are not being used?
by gam3 (Curate) on Apr 07, 2005 at 15:29 UTC
    Or it might be impossible.

    -- gam3
    A picture is worth a thousand words, but takes 200K.
Re: How do I detect what modules are not being used?
by exussum0 (Vicar) on Apr 07, 2005 at 18:22 UTC
    The best you will ever do is estimate. You can't tell if a particular line will ever run until you run that program. You can do some heuristics, but then you'll miss the hidden ones, where you "use" modules while the program is running. It's reducible to the halting problem - will a program stop running.

    ----
    Give me strength for today.. I will not talk it away..
    Just for a moment.. It will burn through the clouds.. and shine down on me.
