http://qs1969.pair.com?node_id=11152665

misterperl has asked for the wisdom of the Perl Monks concerning the following question:

We have several thousand pm, pl, or cgi files in one mega directory. The Pareto principle suggests that only about twenty percent of them are used, or at least used frequently, but we don't know, and it would be useful to find out. My idea is to have a code repository and an active code dir: if a file is needed, it can be pulled from the repository.

A smaller set of files in the active dir would be much easier to manage, especially for newbies.

I'm thinking of an SQL table with one row per file access, recording the access type: edited the file, ran the file, "required" or "used" the file, chmodded it, etc. Once a day, a job could read the access info for each file and move it to or from the active code directory depending on when it was last used. Or maybe based on frequency of use, rather than last use.
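
A minimal sketch of the decision step of such a daily job, assuming the last-use time of each file has already been read from the access table into a hash. The function name, the file names, and the 90-day threshold are all made up for illustration; moving the files is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: split files into "active" and "inactive" lists
# according to how long ago each one was last used.
#   $last_used     : hashref of filename => epoch seconds of last recorded use
#   $now           : current time in epoch seconds
#   $max_idle_days : files idle for longer than this count as inactive
sub classify_by_last_use {
    my ( $last_used, $now, $max_idle_days ) = @_;
    my $cutoff = $now - $max_idle_days * 24 * 60 * 60;

    my ( @active, @inactive );
    for my $file ( sort keys %$last_used ) {
        if ( $last_used->{$file} >= $cutoff ) {
            push @active, $file;
        }
        else {
            push @inactive, $file;
        }
    }
    return ( \@active, \@inactive );
}

# Made-up data; in practice the hash would come from the access table.
my $now       = time;
my %last_used = (
    'report.pl'  => $now - 2 * 86400,      # used two days ago
    'Invoice.pm' => $now - 200 * 86400,    # used 200 days ago
);
my ( $active, $inactive ) = classify_by_last_use( \%last_used, $now, 90 );
print "active:   @$active\n";
print "inactive: @$inactive\n";
```

Ranking by frequency instead of last use would only change the comparison inside the loop.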

So I'm wondering: are there CPAN modules to assist with this? I can have the repository as part of @INC, and as files are used, a MySQL row is added. Starting with 100% in the repository, after weeks of running, the source files we need would be in active. The rest would be inactive and maybe, after a long enough time, even removed. And since it sounds like this table has the potential to get large, we might have to cull old rows. A thousand files invoked a thousand times a month could be a BIG table!
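
To put a number on the table size: if each of a thousand files were invoked a thousand times a month, one row per access would be 1,000 x 1,000 = 1,000,000 rows a month. One way to bound that, sketched below under the assumption that per-day counts are enough detail, is to roll raw events up into one counter per file, access type, and day, which caps the month at roughly 1,000 x 31 rows per access type. The event layout and the function name are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical roll-up: collapse raw access events into one counter per
# (file, access type, UTC day), so the number of stored rows is bounded by
# files x types x days rather than by the number of invocations.
#   $events : arrayref of [ $epoch, $file, $type ]
sub rollup_daily {
    my ($events) = @_;
    my %count;
    for my $event (@$events) {
        my ( $epoch, $file, $type ) = @$event;
        my $day = strftime( '%Y-%m-%d', gmtime($epoch) );
        $count{$file}{$type}{$day}++;
    }
    return \%count;
}

# Three made-up "run" events: two on the first day, one on the next.
my $counts = rollup_daily( [
    [ 0,     'report.pl', 'run' ],
    [ 3600,  'report.pl', 'run' ],
    [ 90000, 'report.pl', 'run' ],
] );
my $per_day = $counts->{'report.pl'}{run};
for my $day ( sort keys %$per_day ) {
    printf "%s %d\n", $day, $per_day->{$day};
}
```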

I'm thinking of a module that I could add to the top of every source file that signals a write to the access table. And maybe modules that help me tell which classes (pms) are actually invoked or resourced, rather than just "used" with no real use. Or a CPAN module that can generate an array of all used or required files in the hierarchy under the initial pl or cgi.
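
For the last wish, perl already keeps such a list: the built-in %INC hash maps every file loaded through use or require to the path it was loaded from. A minimal sketch that reports it when the program exits follows; the helper name and the output format are assumptions. It shows what was loaded, not what was actually called. For a static list without running the program, Module::ScanDeps on CPAN is aimed at that job.

```perl
use strict;
use warnings;

# Return [ name, path ] pairs for every file this process has loaded
# via use/require so far, taken from the built-in %INC hash.
sub loaded_files {
    return map { [ $_, $INC{$_} ] } sort keys %INC;
}

# Report at program exit, once everything that will be loaded has been.
END {
    for my $entry ( loaded_files() ) {
        my ( $name, $path ) = @$entry;
        # $path can be undefined if loading the file failed
        printf STDERR "%s\t%s\t%s\n", $0, $name, $path // '(no path)';
    }
}
```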

I'm pretty sure I can write all the functionality from scratch, but if CPAN modules exist that facilitate this, I'm all in. I guess in summary I want something like a meta library that tells me a bunch of things about what is running, what it's using, where it came from, maybe how many milliseconds it ran, what user ran it, etc. I see potential concurrency issues with a running program that has one of its "pm's" moved mid-run; not sure how to deal with that since we operate 24x7. And other concerns. I'm still in the dreaming stage.

Best Regards Monks!

Re: Are there CPAN modules that can help write realtime software catalogs
by Corion (Patriarch) on Jun 06, 2023 at 20:58 UTC

    I would put a subroutine into @INC that simply logs all the filenames/module names that get loaded, likely into a text file named after $0 (the currently running program). I'd use a text file to avoid any hassle with the database, just for the case when it is unavailable.

    If you can set up $ENV{PERL5OPT} for every user, you can install the hook for every user:

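    package LogLoadedModules;
    use strict;
    use warnings;

    # Placeholder logger (an assumption; the post leaves write_to_log_file()
    # undefined). Appends epoch, program and file; /tmp is only an example.
    sub write_to_log_file {
        my( $program, $file ) = @_;
        ( my $name = $program ) =~ s{[^\w.-]+}{_}g;
        open my $fh, '>>', "/tmp/loaded-modules-$name.log" or return;
        print {$fh} join( "\t", time, $program, $file ), "\n";
        close $fh;
    }

    sub log_loaded_module {
        my( $self, $file_to_load ) = @_;
        write_to_log_file( $0, $file_to_load );
        return; # we didn't find anything
    }

    # Automatically install the hook in @INC when this
    # module is loaded
    sub import {
        unshift @INC, \&log_loaded_module;
    }

    1;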

    Then you can set up @INC for every user by having

    export PERL5OPT=-MLogLoadedModules

    Update: Fixed unshift line, as corrected by Perlbotics

Re: Are there CPAN modules that can help write realtime software catalogs
by cavac (Parson) on Jun 13, 2023 at 08:32 UTC

    Beware of the dangers of optimizing/removing infrequently run code! Believe me, that can be the most critical stuff, and it hardly ever gets tested.

    A prime example is code that runs at end-of-year to, let's say, restart invoice numbering with the new year's prefix, e.g. switching from 2209543 to 2300001. This particular code runs once a year for *maybe* a tenth of a second. But if it's missing or broken, bad financial things can happen.

    That specific example is rather fresh in my mind. Guess what I spent the first week of January fixing while shuffling around on my knees begging the customer for forgiveness.

Re: Are there CPAN modules that can help write realtime software catalogs
by stevieb (Canon) on Jun 07, 2023 at 05:29 UTC

    I had a similar problem years ago, and I solved it by writing an injection routine into Devel::Examine::Subs.

    Effectively, it uses PPI to inspect Perl files to find out all manner of information from them. The addition allowed me to inject code at certain points in the code for tracing flow.

    This software can inject a routine to write usage statistics to a DB in every single Perl file you have. It will be consistent and reliable. With one command, you can inject as few or as many lines of code to all files across your platform.

    I use it for new clients to baseline which files (and down to the subroutine) are actually used, and how often.

    Feel free to contact me privately to discuss specific details, then if you go ahead, we can post the details here.

      I agree that logging sub usage makes sense. After all, use-ing a module/file does not mean it is needed. Isn't a profiler useful? I am away and can't test.

      bw, bliako
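
A minimal sketch of counting calls per subroutine with nothing but the symbol table, to illustrate the difference between a module being loaded and its subs being called. This is not how Devel::Examine::Subs or a profiler such as Devel::NYTProf works; the function name, the demo package, and the in-memory counter are assumptions for illustration. It also wraps any subs the package has imported.

```perl
use strict;
use warnings;

my %call_count;    # "Package::name" => number of calls observed

# Wrap every subroutine currently defined in $package with a counting shim.
sub count_calls_in {
    my ($package) = @_;
    no strict 'refs';
    no warnings 'redefine';
    for my $name ( sort keys %{"${package}::"} ) {
        next if $name =~ /::\z/;          # nested package, not a sub
        my $full = "${package}::$name";
        next unless defined &{$full};     # no code behind this name
        my $orig = \&{$full};
        *{$full} = sub {
            $call_count{$full}++;
            goto &$orig;                  # call through, keeping @_ and context
        };
    }
    return;
}

# Made-up package with one sub that gets called and one that never does.
{
    package My::Demo;
    sub used   { return 'used' }
    sub unused { return 'unused' }
}

count_calls_in('My::Demo');
My::Demo::used() for 1 .. 3;

for my $sub (qw(My::Demo::used My::Demo::unused)) {
    printf "%s: %d\n", $sub, $call_count{$sub} // 0;
}
```

In a real setup the counts would be written to a log or table at exit rather than printed.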

      node ownership assigned to bliako by erzuuli

      yes, thanks erzuuli

Re: Are there CPAN modules that can help write realtime software catalogs
by Anonymous Monk on Jun 07, 2023 at 21:03 UTC
    Don't forget the Unix atime! Most people turn it off to avoid the performance hit, but I'm sure that hit will be smaller than that of custom Perl code logging these things into a database. Just remount your filesystems with atimes fully enabled, and later run a "find" command looking for all files with atimes before or after a date of interest.
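
The same check can be done from Perl with the core File::Find module and stat. The sketch below lists files whose atime is older than a given number of days, roughly what `find DIR -type f -atime +N` reports. The function name is made up, and the result is only meaningful if the filesystem really records access times.

```perl
use strict;
use warnings;
use File::Find;

# List files under $dir whose last access time (atime) is more than
# $days days before $now (defaults to the current time).
sub files_not_accessed_since {
    my ( $dir, $days, $now ) = @_;
    $now //= time;
    my $cutoff = $now - $days * 86400;
    my @stale;
    find(
        {
            no_chdir => 1,    # $_ is then the full path of each entry
            wanted   => sub {
                return unless -f $_;
                my $atime = ( stat $_ )[8];    # field 8 of stat is atime
                push @stale, $_ if defined $atime && $atime < $cutoff;
            },
        },
        $dir
    );
    my @sorted = sort @stale;
    return @sorted;
}

# Usage: perl stale.pl /path/to/code 90
if ( @ARGV == 2 ) {
    print "$_\n" for files_not_accessed_since(@ARGV);
}
```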
Re: Are there CPAN modules that can help write realtime software catalogs
by Anonymous Monk on Jun 09, 2023 at 05:45 UTC

    There should be a way to do this at the filesystem level. 'inotifywait -e open' on Linux, for example, shows it can be done. If you look into inotify, I think all you have to do is watch for file open events on the directory, and it will fire every single time anything reads a file in it, including any perl interpreter.
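
A minimal sketch of turning that event stream into per-file counts. It assumes inotifywait (from inotify-tools) is started with `--format '%w%f'` so that each output line is just the path that was opened; the function name, the directory, and the sample paths are placeholders.

```perl
use strict;
use warnings;

# Tally "open" events per Perl source file from inotifywait output,
# one path per line. Other files and directories are ignored.
sub tally_opens {
    my ($fh) = @_;
    my %opens;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /\.(?:pm|pl|cgi)\z/;
        $opens{$line}++;
    }
    return \%opens;
}

# Live use on Linux (runs until inotifywait exits):
#   open my $fh, '-|', 'inotifywait', '-m', '-q', '-e', 'open',
#       '--format', '%w%f', '/path/to/code' or die "inotifywait: $!";
#   my $opens = tally_opens($fh);

# Self-contained demonstration on canned input:
my $sample = "/code/Foo.pm\n/code/run.cgi\n/code/Foo.pm\n/code/notes.txt\n";
open my $in, '<', \$sample or die "cannot open in-memory handle: $!";
my $opens = tally_opens($in);
printf "%s %d\n", $_, $opens->{$_} for sort keys %$opens;
```

A long-running watcher would need to flush its counts periodically instead of waiting for end of input.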