nimdokk has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a script that will monitor a set of directories for "stale" files, files that have been waiting to process for more than a specified time (typically about an hour). The way I have it currently set up, there is a script and a config file that contains the list of directories to be scanned. This file would get re-read each time the script runs (it would run every 15 minutes) so that if new directories are added, they can be added to the file without having to restart the program (ideally - there are still a few other tweaks and things I want to work out before I ask the admins to put this into production on one of our servers).
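For anyone following along, the staleness check described above might look roughly like this. It is only a sketch, not the actual script: the helper name stale_files, the use of modification time as the measure of "waiting", and taking directories from the command line are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the plain files in $dir whose last
# modification is more than $max_age seconds before $now.
sub stale_files {
    my ($dir, $max_age, $now) = @_;
    $now = time unless defined $now;

    opendir(my $dh, $dir) or do { warn "Cannot open $dir: $!\n"; return };
    my @names = readdir $dh;
    closedir $dh;

    my @stale;
    for my $name (sort @names) {
        my $path = "$dir/$name";
        next unless -f $path;              # skips '.', '..' and subdirectories
        my $mtime = (stat $path)[9];       # element 9 of stat is the mtime
        next unless defined $mtime;
        push @stale, $path if $now - $mtime > $max_age;
    }
    return @stale;
}

# Assumed usage: directories on the command line, one-hour threshold.
print "$_\n" for map { stale_files($_, 60 * 60) } @ARGV;
```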

That's the background. I'm wondering if it is better to keep the list of directories in a separate file or to keep it in the script under __DATA__ and read that in each time the script scans the directories. Are there any advantages that anyone can see? The only one I can see at this point would be that everything is contained in one file. On the other hand, keeping the list in a separate file means that someone (not me) would not have to dig through the program to add a new directory (or take one away). Any other thoughts pro or con to keeping the list in a __DATA__ versus using a separate config file?

Replies are listed 'Best First'.
Re: DATA versus config file
by matija (Priest) on May 18, 2004 at 13:23 UTC
    Your thoughts are valid. Just one point on the side of an external file: with an external file you could have a list of files (or directories) that have to be checked every hour, and another list of files that have to be checked every 15 minutes - and they would both be used by the same program. If you put the data into the DATA you can only use the program for one purpose. It goes against the concept of small, reusable tools.
      That's a good point. Right now it is configured to use only one config file, but it wouldn't be too difficult to add in some added functionality that could account for other config files based on time or something like that. I'm just trying to keep this reasonably simple but flexible enough to be changed in the future as unforeseen needs come up.
Re: DATA versus config file
by muntfish (Chaplain) on May 18, 2004 at 13:37 UTC

    If you want any level of maintainability or reusability - go with a config file.

    For example if your script will end up running in more than one environment, you'll probably want to scan a different set of directories in each environment. And you don't want to be maintaining multiple versions of the script for that.

    Also, if you're working on a team with multiple developers and use a version control product, separating the script and its config into two files lets you track and review changes to each independently.

Re: DATA versus config file
by Solo (Deacon) on May 18, 2004 at 13:41 UTC
    There are a lot of config-handling modules out there. It's not that much more work to use them, and the benefits include:

    • the last modified time of the perl source will accurately show the last time you mucked with its inner workings
    • some of the modules can automatically re-read the file if you ever daemonize the thing
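
    To illustrate the second point without tying it to any particular module, here is a hand-rolled sketch of the same idea using only core Perl. The helper name dirs_from, the cache layout, and the one-directory-per-line format with # comments are assumptions for illustration, not any module's API.

```perl
use strict;
use warnings;

my %cache;   # path => { mtime => ..., dirs => [ ... ] }

# Hypothetical helper: return the directory list stored in $path,
# re-reading the file only when its modification time has changed.
sub dirs_from {
    my ($path) = @_;

    my $mtime = (stat $path)[9];
    die "Cannot stat $path: $!" unless defined $mtime;

    my $entry = $cache{$path};
    if (!$entry || $entry->{mtime} != $mtime) {
        open my $fh, '<', $path or die "Cannot read $path: $!";
        chomp(my @lines = <$fh>);
        close $fh;

        # Keep everything except blank lines and # comments.
        $entry = $cache{$path} = {
            mtime => $mtime,
            dirs  => [ grep { !/^\s*(?:#|$)/ } @lines ],
        };
    }
    return @{ $entry->{dirs} };
}
```

    A long-running process would call dirs_from on each pass of its main loop. Because the mtime used here has one-second resolution, two edits within the same second may not trigger a reload.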

    --Solo
    --
    You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.
Re: DATA versus config file
by davido (Cardinal) on May 19, 2004 at 06:24 UTC
    I have occasionally used __DATA__ to hold defaults that will be used in the absence of a config file. It worked out great.

    Another strategy is to detect the existence of the config file, and if it doesn't exist, use the contents of __DATA__ to write your default config file. That could be used to create the basic config file in such a way that it is easy to text-edit (and thus to modify) by the person maintaining your script's installation. The following steps would be taken:

    • Check for config file.
    • If file exists, open and use it.
    • If file doesn't exist, read __DATA__ and write out a config file.
    • Use the newly written config file.

    This will leave the config file as an artifact of the script's first run. The user can then customize it without having to sift through your source code, possibly ruining something important. ;)
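
    The four steps above can be sketched as follows. The file name stale_dirs.conf, the directories under __DATA__, and the one-directory-per-line format are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical config path; pass a different one as the first argument.
my $config = @ARGV ? $ARGV[0] : 'stale_dirs.conf';

# First run: no config file yet, so write out the defaults kept under __DATA__.
unless (-e $config) {
    open my $out, '>', $config or die "Cannot write $config: $!";
    print {$out} $_ while <DATA>;
    close $out or die "Cannot close $config: $!";
}

# From here on, the config file is the only thing that gets read.
open my $in, '<', $config or die "Cannot read $config: $!";
my @dirs;
while (my $line = <$in>) {
    chomp $line;
    next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
    push @dirs, $line;
}
close $in;

print "Monitoring: $_\n" for @dirs;

__DATA__
# One directory per line; lines starting with # are ignored.
/var/spool/incoming
/var/spool/outgoing
```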


    Dave

Re: DATA versus config file
by stvn (Monsignor) on May 18, 2004 at 14:02 UTC

    muntfish and Solo both make excellent points. To add to what Solo said re: daemonizing your script: it is important to note that once the DATA handle is read, it cannot (easily) be re-read (seeking to the beginning of the file puts you at the top of your source, not the top of the DATA handle). So if you were to convert it to a daemon, you would need to watch out for this. But then if you daemonize it, you would be better off with an external config anyway; that way your config can change without restarting your daemon.

    -stvn
      It is important to note that once the DATA handle is read, it cannot (easily) be re-read (seeking to the beginning of the file puts you at the top of your source, not the top of the DATA handle).

      Erm, no, it's very easy to do this.

      #!/usr/bin/perl
      use Fcntl qw( :seek );
      my $top_o_data = tell( DATA );
      for( 1 .. 2 ) {
          print "read $_\n---\n";
          seek( DATA, $top_o_data, SEEK_SET );
          print while <DATA>;
      }
      __DATA__
      Wubba.
      Zoikes.
      Jinkies.

      Update: The problem of course is that you may not see updates (at least not on OS X or Linux that I've checked).

Re: DATA versus config file
by zude (Scribe) on May 18, 2004 at 17:23 UTC
    How about both? If the config file exists, use it. Otherwise, get defaults from __DATA__. You could avoid dynamic reload issues by just running from cron every */15.
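
    A rough sketch of that fallback, assuming one directory per line. The helper read_dir_list, the file name stale_dirs.conf, and the default directory are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one directory per line from an open filehandle,
# ignoring blank lines and # comments.
sub read_dir_list {
    my ($fh) = @_;
    my @dirs;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;
        push @dirs, $line;
    }
    return @dirs;
}

# Hypothetical config path; pass a different one as the first argument.
my $config = @ARGV ? $ARGV[0] : 'stale_dirs.conf';

my @dirs;
if (open my $fh, '<', $config) {
    @dirs = read_dir_list($fh);       # the config file wins when it is readable
    close $fh;
}
else {
    @dirs = read_dir_list(\*DATA);    # otherwise use the built-in defaults
}

print "$_\n" for @dirs;

__DATA__
# Defaults used only when no config file is found.
/var/spool/incoming
```

    Run from cron, each invocation is a fresh process, so both the config file and __DATA__ are read anew every time and nothing has to be reloaded.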

    +++++++++ In theory, theory describes reality, but in reality it doesn't.

Re: DATA versus config file
by nimdokk (Vicar) on May 20, 2004 at 15:30 UTC
    Thanks, everyone, for the comments on this; they've been helpful. After looking at the comments and further talking with my co-worker, we are going to use a config file, but I'm going to add in functionality so that some directories can be scanned and monitored for files with shorter intervals than 1 hour.