PerlMonks  

use lib "."

by Bod (Parson)
on Aug 12, 2023 at 11:49 UTC ( [id://11153844]=perlquestion )

Bod has asked for the wisdom of the Perl Monks concerning the following question:

I've come across some unexpected behaviour. I thought the answer was obvious, but it seems not, so your input would be welcome...

On a webserver I have a maintenance script that runs off CRON every morning. The filesystem looks like this:

/home/username/website/prod/lib      <- modules
/home/username/website/prod/template <- templates
/home/username/website/prod/www      <- HTTP root
There is also a test environment as well as prod
To set @INC correctly, this line is at the top of every script that needs to run in CGI context
use lib "$ENV{'DOCUMENT_ROOT'}/../lib";
But, when run by CRON, the maintenance script doesn't get passed $ENV{'DOCUMENT_ROOT'}.

However, the maintenance script is located at:

/home/username/website/prod/lib/maintain.pl
It needs to
use Site::Utils;
where the module is located at
/home/username/website/prod/lib/Site/Utils.pm

I thought I could simply use lib "."; to add the current directory to @INC or even leave it out altogether and Perl would find the module by searching the filepath relative to the running script.

But both these approaches result in Perl reporting that it cannot find the module...

What am I overlooking here?

Replies are listed 'Best First'.
Re: use lib "."
by philipbailey (Curate) on Aug 12, 2023 at 15:51 UTC

    I'm a bit surprised that nobody has so far mentioned FindBin, a core module, which is intended to solve your problem.

    For your case, you can simply do:

    use FindBin;
    use lib $FindBin::Bin;

    The nice thing about FindBin is that as long as the directory structure of your code is constant, you can run scripts from wherever the directory tree exists. So it will work just as well from your local Git checkout, on the Jenkins server or the production location, which may well all be different.

    A more typical directory structure would have a bin directory for scripts and lib for modules, so in that case the code would instead look like:

    use FindBin;
    use lib "$FindBin::Bin/../lib";

        Yes, FindBin has some global behaviours that don't work well in persistent environments where it could be use'd multiple times. But for simple cases like production or test scripts, there's no problem.

      This is the way, except that you should use $RealBin instead of $Bin. $Bin is buggy since it breaks when a symlink to the executable is used.

      In this case:

      use FindBin qw( $RealBin );
      use lib $RealBin;

      When you have bin/ and lib/:

      use FindBin qw( $RealBin );
      use lib "$RealBin/../lib";
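To see the difference for yourself, here is a minimal sketch that just prints both variables. Save it somewhere, run it once directly and once through a symlink, and compare the output. Nothing here is specific to the original poster's site.

```perl
use strict;
use warnings;
use FindBin qw($Bin $RealBin);

# $Bin:     directory of the path that was used to invoke the script
# $RealBin: the same directory with all symlinks resolved
print "Bin:     $Bin\n";
print "RealBin: $RealBin\n";
print "The two differ, so the script was reached through a symlink\n"
    if $Bin ne $RealBin;
```

When the two values differ, a `use lib` built on $Bin points at the symlink's directory rather than at the directory holding the real script and its modules.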
      I'm a bit surprised that nobody has so far mentioned FindBin, a core module, which is intended to solve your problem.

      ** slaps self round the face **

      Yes - I knew that I was missing something obvious and FindBin is that thing!!!

      I was aware of it but had completely forgotten about this simple solution!

      I've actually solved the issue by feeding lib an absolute path on the basis that the script cannot move without updating cron which has an absolute path. Given that the modules won't move anywhere else, this is probably a reasonable solution.

Re: use lib "."
by marto (Cardinal) on Aug 12, 2023 at 12:01 UTC

    cron has a virtually empty environment. Environment variables can be set in crontab or kept in a separate file and imported (via env), or you could just not use an environment variable, and programmatically determine that path within your code.
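One way to determine the path within the code is sketched below, as an illustration only. It assumes the layout from the question, where the script sits in the same directory as the modules. The helper name lib_dir is made up for this example.

```perl
use strict;
use warnings;
use FindBin qw($RealBin);
use File::Spec;

# Hypothetical helper: choose the module directory.
# Under CGI, DOCUMENT_ROOT is set, so use "<DOCUMENT_ROOT>/../lib".
# Under cron it is not set, so fall back to the script's own directory.
sub lib_dir {
    my ($document_root, $script_dir) = @_;
    return File::Spec->catdir($document_root, File::Spec->updir, 'lib')
        if defined $document_root && length $document_root;
    return $script_dir;
}

# "use lib" runs at compile time, so the path must be computed in BEGIN.
my $lib;
BEGIN { $lib = lib_dir($ENV{DOCUMENT_ROOT}, $RealBin) }
use lib $lib;

print "module directory: $lib\n";
```

If the script lived in a bin directory next to lib instead, the fallback would be "$RealBin/../lib".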

      Correct. It also doesn't run the .bashrc and stuff. For this reason, i usually have cron call a bash script that THEN starts the perl script in question. Excerpt from my user crontab:

      20 1 * * * /bin/bash /home/cavac/src/pagecamel_cavac/devscripts/geoip/updategeoip.sh >> /home/cavac/src/pagecamel_cavac/devscripts/geoip/crontab.log 2>&1 &

      Note that this specifies the FULL path to the script. It also reroutes all STDOUT and STDERR into a logfile. Here's the script:

      #!/usr/bin/env bash
      . ~/.bashrc_cavac
      cd /home/cavac/src/pagecamel_cavac/devscripts/geoip
      date
      perl updategeoip.pl

      What this does is source the bashrc script that sets all the proper environments, change directory to the one the perl script expects, run date (so the logfile contains a line with the timestamp of the run) and then run the perl code.

      Just for reference, here's the bashrc-script. I removed parts that are not relevant for this answer; it actually does a lot more, depending on which system and for which user it's running:

      # CavacPerl
      export PATH=/home/cavac/bin/CavacPerl-5.36.1/bin:$PATH
      export MANPATH=/home/cavac/bin/CavacPerl-5.36.1/man:$MANPATH

      # Scripts
      export PATH=/home/cavac/bin/scripts:$PATH

      toilet -f smblock --filter metal:border virgo.cavac.at

      (The "toilet" command runs on every computer/account of mine that allows ssh login, to show a banner of the system i logged in. Mostly to prevent me debugging on the wrong machine and wondering why it doesn't work.)

      As a side note, running bash scripts to run your perl code also allows you to make sure you are running the most recent version of the script. On some of my raspberries i just run the appropriate mercurial SCM commands before running the perl script. Another win for total laziness, now i don't even have to update those ancient things manually...

Re: use lib "."
by hippo (Bishop) on Aug 12, 2023 at 15:16 UTC
    What am I overlooking here?

    You are overlooking what the current directory is when running the cron script. Set up a trivial cron job which just prints the current working directory and then you will know. It's usually $HOME unless you change it.
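A trivial script for that check might look like the sketch below. It uses only core modules, and the sub names are just for illustration. Run it from cron with output redirected to a file, then compare with a run from your shell.

```perl
use strict;
use warnings;
use Cwd qw(getcwd abs_path);
use File::Basename qw(dirname);
use File::Spec;

# Directory the process was started in. A relative @INC entry such as "."
# is resolved against this, not against the script's location.
sub working_dir { return getcwd() }

# Directory that contains this script file.
sub script_dir { return dirname(File::Spec->rel2abs(__FILE__)) }

printf "cwd:          %s\n", working_dir();
printf "script dir:   %s\n", script_dir();
printf "'.' resolves: %s\n", abs_path('.');
```

Whenever the first two lines differ, use lib "." is adding a directory other than the one the script lives in.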

    There are, as usual, so many ways to tackle this. My usual approach is to have cron run a shell script which does all the set-up (change to the right directory, populate the environment etc.) and then at the last have that script exec the perl script. I've found this to be robust and secure (in that it doesn't leak any info). You can have the same shell script (or 2 copies of the same script) for both test and prod and just pass it a different argument to choose which path to run. eg:

    #!/bin/bash
    if [ "$1" == 'prod' ]; then
        cd $HOME/website/prod
    else
        cd $HOME/website/test
    fi
    export DOCUMENT_ROOT=./www
    exec ./lib/maintain.pl

    And in your crontab call it like

    30 5 * * * $HOME/website/prod/maintain.sh prod

    I do find it slightly odd that you have scripts in your lib dir. I tend to keep all my cron jobs in a cron dir for clarity.


    🦛

      I do find it slightly odd that you have scripts in your lib dir. I tend to keep all my cron jobs in a cron dir for clarity.

      On sites where I have more than one cron script I keep them in a scripts directory.

      In this case, all the modules have their own directories and this is the only script outside of the www directory. So it is the only thing in the lib directory other than more directories.

      Would you still use a cron directory even if you can be pretty sure there will only ever be this one script?

      You can have the same shell script (or 2 copies of the same script) for both test and prod

      No need - only production has the cron maintenance script...

        Would you still use a cron directory even if you can be pretty sure there will only ever be this one script?

        I would, yes. The only things I expect to see in the lib tree should be modules or POD files.


        🦛

Re: use lib "."
by kcott (Archbishop) on Aug 12, 2023 at 15:43 UTC

    G'day Bod,

    Adding "." to @INC has security implications. It used to be a default but was removed in Perl v5.26.0. There's a long discussion in "perl5260delta: Removal of the current directory (".") from @INC" which I recommend you read.
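A quick way to check a given perl is sketched below. The sub name has_dot is only for illustration.

```perl
use strict;
use warnings;

# True if the given list of search directories contains the current directory.
sub has_dot { return scalar grep { $_ eq '.' } @_ }

# %vd formats the version object in $^V as a dotted number, e.g. 5.32.1
printf "perl %vd: '.' %s in \@INC\n", $^V, has_dot(@INC) ? 'is' : 'is not';
```

On v5.26.0 and later this should report that "." is absent, unless something has put it back (for example use lib "." or the PERL_USE_UNSAFE_INC environment variable described in perl5260delta).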

    As already pointed out, cron provides a very limited environment. This not only includes the environment variables available, but also the content of those variables: compare the value of $ENV{PATH} from the command line and from a cron script.

    To avoid problems, I typically use absolute paths everywhere in a cron script. You may find code like this in my cron scripts:

    ...
    PERL=/usr/bin/perl
    ...
    $PERL /full/path/to/some_script.pl
    ...
    $PERL /other/full/path/to/other_script.pl
    ...

    A lib directory is fairly standard for Perl modules; it is not standard for scripts. A bin directory is more usual for *.sh, *.pl, and so on. I'll leave you to decide if you wish to make a change.

    I knocked up a skeleton directory structure to mirror what you presented. My /home/ken/tmp/pm_11153844_cron_paths/ is intended as an equivalent to your /home/username/website/prod/.

    ken@titan ~/tmp/pm_11153844_cron_paths $ pwd
    /home/ken/tmp/pm_11153844_cron_paths

    ken@titan ~/tmp/pm_11153844_cron_paths $ ls -lR *
    lib:
    total 1
    -rw-r--r-- 1 ken None 633 Aug 13 00:38 maintain.pl
    drwxr-xr-x 1 ken None   0 Aug 13 00:36 Site

    lib/Site:
    total 1
    -rw-r--r-- 1 ken None 78 Aug 13 00:36 Utils.pm

    template:
    total 0

    www:
    total 0

    An extremely minimal Site::Utils module looks like this:

    ken@titan ~ $ cat /home/ken/tmp/pm_11153844_cron_paths/lib/Site/Utils.pm
    package Site::Utils;

    use strict;
    use warnings;

    our $VERSION = '1.2.3';

    1;

    Here's a maintain.pl which shows how to determine various directories and tests them:

    ken@titan ~ $ cat /home/ken/tmp/pm_11153844_cron_paths/lib/maintain.pl
    # standard pragmata
    use strict;
    use warnings;

    # code to get absolute paths
    use Cwd 'abs_path';
    use File::Basename 'dirname';

    my ($BIN_DIR, $LIB_DIR, $TEMPLATE_DIR, $WWW_DIR);

    BEGIN {
        $BIN_DIR = dirname abs_path __FILE__;
        $LIB_DIR = abs_path "$BIN_DIR/../lib";
        $TEMPLATE_DIR = abs_path "$BIN_DIR/../template";
        $WWW_DIR = abs_path "$BIN_DIR/../www";
    }

    # test paths
    print "\$BIN_DIR[$BIN_DIR]\n";
    print "\$LIB_DIR[$LIB_DIR]\n";
    print "\$TEMPLATE_DIR[$TEMPLATE_DIR]\n";
    print "\$WWW_DIR[$WWW_DIR]\n";

    # test 'Site::Utils' found
    use lib $LIB_DIR;
    use Site::Utils;

    print "Site::Utils version: $Site::Utils::VERSION\n";

    Here's that script run from my home directory:

    ken@titan ~ $ pwd
    /home/ken

    ken@titan ~ $ /usr/bin/perl /home/ken/tmp/pm_11153844_cron_paths/lib/maintain.pl
    $BIN_DIR[/home/ken/tmp/pm_11153844_cron_paths/lib]
    $LIB_DIR[/home/ken/tmp/pm_11153844_cron_paths/lib]
    $TEMPLATE_DIR[/home/ken/tmp/pm_11153844_cron_paths/template]
    $WWW_DIR[/home/ken/tmp/pm_11153844_cron_paths/www]
    Site::Utils version: 1.2.3

    If you do decide to use a bin directory, that script should still work as is. The first line of output would be:

    $BIN_DIR[/home/ken/tmp/pm_11153844_cron_paths/bin]

    The other output should be unchanged.

    — Ken

      As already pointed out, cron provides a very limited environment

      That explains the odd behaviour I discovered after posting the question...
      It worked as expected when executed over SSH where the CGI environment variables are still missing.

      To avoid problems, I typically use absolute paths everywhere in a cron script.

      That's how I've solved it - at least I think I've solved it. I won't know for sure until the cron job runs tomorrow morning. Now the maintenance script has an absolute path passed to lib so there are no relative issues to deal with.

      I'm not sure whether this is a better approach than using FindBin?

      A lib directory is fairly standard for Perl modules; it is not standard for scripts. A bin directory is more usual for *.sh, *.pl, and so on. I'll leave you to decide if you wish to make a change.

      See my reply to hippo at Re^2: use lib "."

      I'd be interested in your view of having a bin or cron directory when there is only a single script outside the www directory...

        "I'd be interested in your view of having a bin or cron directory when there is only a single script outside the www directory..."

        I'd pick a standard location for scripts (bin, cron, scripts, whatever) and use it consistently. I wouldn't move them around based on how many there were at any given time.

        — Ken

      Adding "." to @INC has security implications. It used to be a default but was removed in Perl v5.26.0.

      Thanks for the information. I wasn't aware of the change...I thought it was still there for all versions of Perl. I am using v5.16.3 in this environment so "." is still included as is shown from perl -V

      @INC:
        /home/account/perl5/lib/perl5/5.16.3/x86_64-linux-thread-multi
        /home/account/perl5/lib/perl5/5.16.3
        /home/account/perl5/lib/perl5/x86_64-linux-thread-multi
        /home/account/perl5/lib/perl5
        /usr/local/lib64/perl5
        /usr/local/share/perl5
        /usr/lib64/perl5/vendor_perl
        /usr/share/perl5/vendor_perl
        /usr/lib64/perl5
        /usr/share/perl5
        .
      But this could potentially bite me locally where I have v5.32.1. The difference is definitely there...
      @INC:
        C:/Strawberry/perl/site/lib/MSWin32-x64-multi-thread
        C:/Strawberry/perl/site/lib
        C:/Strawberry/perl/vendor/lib
        C:/Strawberry/perl/lib

Re: use lib "."
by duelafn (Parson) on Aug 12, 2023 at 12:35 UTC

    use lib '.'; will use the current working directory (possibly /) and not the script path. I'd suggest setting DOCUMENT_ROOT in your crontab.

    Good Day,
        Dean

      DOCUMENT_ROOT in crontab has a huge astonishment potential. Better just set PERL5LIB there if relying on environment variables is completely inevitable...
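To illustrate what setting PERL5LIB does, here is a rough sketch that starts a child perl with the variable set and prints the resulting @INC. The sub name inc_with_perl5lib is made up, the path is the module directory from the question, and a Unix-like system is assumed for the list-form pipe open.

```perl
use strict;
use warnings;

# Run a child perl with PERL5LIB set and return its @INC entries.
sub inc_with_perl5lib {
    my ($dir) = @_;
    local $ENV{PERL5LIB} = $dir;    # visible to the child process only
    open my $fh, '-|', $^X, '-e', 'print "$_\n" for @INC'
        or die "cannot run $^X: $!";
    chomp(my @inc = <$fh>);
    close $fh;
    return @inc;
}

# Module directory taken from the question.
print "$_\n" for inc_with_perl5lib('/home/username/website/prod/lib');
```

The directory shows up in the child's @INC without any use lib line in the script, which is the effect a PERL5LIB setting in the cron environment would have on the maintenance script.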
Re: use lib "."
by Dallaylaen (Chaplain) on Aug 14, 2023 at 09:59 UTC

    I have just recently discovered lib::relative and it seems to do exactly what you want. Or you can go with FindBin, as another reply suggests.

      Thanks...lib::relative seems to be a useful module for some circumstances.

      In this instance I'm sticking with an absolute path for lib.
