punkish has asked for the wisdom of the Perl Monks concerning the following question:

In my continuing quest for light (see Automating data loading into Oracle), I am trying to figure out how to lay out my program. I have to load several different datasets into a database; they have to be loaded periodically, and the loading frequency differs from one dataset to another. I am envisioning a physical layout like so --

    app_root/
        app_conf/
        app_lib/
        app_docs/
        all.pl*
        data/
            sales/
                conf/
                lib/
                docs/
                sales.pl*
            marketing/
                conf/
                lib/
                docs/
                marketing.pl*
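To make the layout concrete, here is a minimal Perl sketch of how a script might compute the shared and per-dataset directories. It assumes each dataset directory sits under data/ (the original listing is slightly ambiguous on this point), and the helper names dataset_dirs and app_dirs are hypothetical. The comment shows how the core FindBin and lib modules could put both lib/ directories on @INC.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# In data/sales/sales.pl, FindBin (a core module) could put both the
# dataset's own lib/ and the shared app_lib/ on @INC:
#   use FindBin qw($Bin);
#   use lib "$Bin/lib", "$Bin/../../app_lib";

# Hypothetical helper: directories of one dataset, assuming the layout
# app_root/data/<dataset>/{conf,lib,docs}/<dataset>.pl
sub dataset_dirs {
    my ($app_root, $dataset) = @_;
    my $base = File::Spec->catdir($app_root, 'data', $dataset);
    return {
        conf   => File::Spec->catdir($base, 'conf'),
        lib    => File::Spec->catdir($base, 'lib'),
        docs   => File::Spec->catdir($base, 'docs'),
        script => File::Spec->catfile($base, "$dataset.pl"),
    };
}

# Hypothetical helper: shared directories directly under app_root.
sub app_dirs {
    my ($app_root) = @_;
    return {
        conf => File::Spec->catdir($app_root, 'app_conf'),
        lib  => File::Spec->catdir($app_root, 'app_lib'),
        docs => File::Spec->catdir($app_root, 'app_docs'),
    };
}

my $dirs = dataset_dirs('/opt/app_root', 'sales');
print "$_: $dirs->{$_}\n" for sort keys %$dirs;
```

Keeping the path logic in one place means a dataset script never hard-codes where app_lib lives relative to itself.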

Wrt running my program, I have the following choices --

Any suggestions on the above or anything related are welcome.
--

when small people start casting long shadows, it is time to go to bed

Re: designing layout of a program
by gaal (Parson) on Mar 29, 2006 at 18:41 UTC
    Oops, somehow missed the scheduling question.

    Scheduling is an operating system service; that is, the OS handles it so that you don't have to. That's usually a good thing :-) Again, it takes something that is essentially data (the periodicity of your runs) out of your code, so that no coding is needed to change that aspect of your system's behavior. That is a good thing.

    Another advantage of having the OS do scheduling for you is that if one of your runs fails due to a transient error, the next scheduled run simply starts afresh, so you don't need to write a watchdog to restart the process.

    A possible disadvantage of system scheduling is that the admins might not want to use it: perhaps it makes deployment more difficult unless you invest in having the installer set it up automatically, or maybe they need to edit a config file anyway and want THAT parameter to be in the same place as the other parameters.
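    If the period should live in the same config file as everything else, one possible compromise (an assumption here, not something proposed in this reply) is to have the OS scheduler start the script often, say hourly, and let the script decide which datasets are due. A minimal sketch; the periods are hypothetical and hard-coded, and storing the last-run times (in a file or a table) is left out.

```perl
use strict;
use warnings;

# Hypothetical per-dataset periods in seconds; in a real setup these would
# be read from app_conf/ together with the other parameters.
my %period = (
    sales     => 24 * 60 * 60,        # daily
    marketing => 7 * 24 * 60 * 60,    # weekly
);

# A dataset is due when its period has elapsed since its last successful
# run. $last_run is an epoch time, or undef if it has never run.
sub is_due {
    my ($dataset, $last_run, $now) = @_;
    my $p = $period{$dataset}
        or die "No period configured for dataset '$dataset'\n";
    return 1 unless defined $last_run;
    return ($now - $last_run) >= $p ? 1 : 0;
}

my $now = time;
for my $ds (sort keys %period) {
    # Pretend every dataset last ran two days ago.
    printf "%-10s %s\n", $ds,
        is_due($ds, $now - 2 * 86400, $now) ? 'due' : 'not due';
}
```

The trade-off is that the script now partly reimplements what the scheduler already does, which is the price of keeping every parameter in one file.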

    Another possible aspect is permissions: who is allowed to change the period of runs? Is this something to worry about?

Re: designing layout of a program
by gaal (Parson) on Mar 29, 2006 at 18:32 UTC
    How different are the "sales" and "marketing" loads in programming logic, not in data? How much human interaction do you expect?

    Move as much as you can out to configuration. It makes no sense at all to maintain scripts that do precisely the same thing except that each hardcodes a different set of constants. If your users cannot possibly stomach

        load.pl --dataset marketing
        load.pl --dataset sales

    Then write load.pl that way anyway, but just provide them with two wrapper scripts (or aliases, or desktop shortcuts, whatever).
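    A minimal sketch of such a load.pl, using the core Getopt::Long module. The dataset, table and file names are placeholders; real settings would presumably be read from each dataset's conf/ directory rather than hard-coded.

```perl
#!/usr/bin/perl
# load.pl: one script for every dataset; the dataset is chosen on the command line.
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical per-dataset settings (placeholders for values from conf/).
my %settings = (
    sales     => { table => 'SALES_STAGE',     source => 'sales.csv' },
    marketing => { table => 'MARKETING_STAGE', source => 'marketing.csv' },
);

my $usage = "usage: load.pl --dataset <name>\n";

# Parse an argument list and return the dataset name and its settings.
sub parse_args {
    my @args = @_;
    my $dataset;
    GetOptionsFromArray(\@args, 'dataset=s' => \$dataset) or die $usage;
    die $usage unless defined $dataset;
    my $conf = $settings{$dataset}
        or die "unknown dataset '$dataset'\n";
    return ($dataset, $conf);
}

sub main {
    my ($dataset, $conf) = parse_args(@_);
    print "Loading $conf->{source} into $conf->{table} ($dataset)\n";
    return 0;
}

# Run only when given arguments; otherwise just show the usage line,
# which also lets the subs above be exercised directly.
exit main(@ARGV) if @ARGV;
print STDERR $usage;
```

Cron, or a wrapper such as sales.pl that just runs load.pl --dataset sales, then needs one line per dataset, and adding a dataset means adding configuration rather than another copy of the script. A missing or unknown --dataset makes the script die with a non-zero exit status that the scheduler can report.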

    If logic does differ, but there is some non-trivial overlap, factor out the common behavior into a module. See perlmod and, for example, Module::Starter if you aren't sure how to do that.
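    One way the shared part could look: a module in app_lib that owns the common steps and takes the dataset-specific behaviour as code references. This is a sketch; the package name MyApp::Loader and the run_load interface are hypothetical, and the package is declared inline so the example runs as one file. As a real app_lib/MyApp/Loader.pm it would also end with a true value (1;).

```perl
use strict;
use warnings;

# Hypothetical shared module (would live in app_lib/MyApp/Loader.pm).
package MyApp::Loader;

# The common steps (walk the records, transform each one, hand it to a
# loader) live here; only the dataset-specific parts are passed in.
sub run_load {
    my (%args) = @_;
    my $transform = $args{transform} or die "transform callback required\n";
    my $load      = $args{load}      or die "load callback required\n";
    my $count = 0;
    for my $record (@{ $args{records} || [] }) {
        my $row = $transform->($record);
        next unless defined $row;    # the transform may reject a record
        $load->($row);
        $count++;
    }
    return $count;
}

package main;

# A dataset script (e.g. sales.pl) supplies only what differs.
my @loaded;
my $n = MyApp::Loader::run_load(
    records   => [ 'north,100', 'south,250', 'bad line' ],
    transform => sub {
        my ($line) = @_;
        my ($region, $amount) = split /,/, $line;
        return defined $amount ? { region => $region, amount => $amount } : undef;
    },
    load => sub { push @loaded, $_[0] },    # stand-in for the database insert
);
print "loaded $n records\n";
```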

      If logic does differ, but there is some non-trivial overlap, factor out the common behavior into a module.
      Logic does differ, and there is a lot of non-trivial overlap, which will be factored out into common modules moved up the ladder to app_lib. "sales" and "marketing" are just examples -- I will eventually have more than a dozen datasets. So, I could use load.pl --dataset <datasetname> or I could use <datasetname>.pl. Six of one, square root of 36 of the other. Or is it? Any gotchas that I need to plan for? Especially when using the OS scheduler (currently, my preference).
        If they perform conceptually the same task, I like to have them in one executable. Because it's cheaper to refactor things that way, the code tends to end up closer to where it should be. Can't say more without knowing more about your application domain.
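        If the datasets share one executable but keep their dataset-specific logic in separate modules, load.pl can map the --dataset value to a class name. A hedged sketch, assuming a hypothetical MyApp::Dataset::<Name> naming convention; the name is validated first because it comes straight from the command line.

```perl
use strict;
use warnings;

# Hypothetical convention: dataset "sales" is handled by the class
# MyApp::Dataset::Sales, found in app_lib/MyApp/Dataset/Sales.pm.
sub dataset_class {
    my ($name) = @_;
    # Allow only simple lowercase names, so the value cannot smuggle
    # "::" or path characters into the module name.
    die "invalid dataset name\n"
        unless defined $name && $name =~ /\A[a-z][a-z0-9_]*\z/;
    return 'MyApp::Dataset::' . ucfirst($name);
}

# Turn a class name into the relative file name that require() expects.
sub class_file {
    my ($class) = @_;
    (my $file = "$class.pm") =~ s{::}{/}g;
    return $file;
}

# In load.pl the handler could then be loaded and run with something like:
#   my $class = dataset_class($dataset);
#   my $file  = class_file($class);
#   require $file;
#   $class->run(%options);    # run() is a hypothetical method name
print class_file(dataset_class('sales')), "\n";
```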

        How comfortable are you with your OS scheduler, and how refined are the scheduling requirements? I know how to do certain things with unix cron, but not others. If, for example, the exact timing of the next task depends on something that happened in the current one, I'd avoid the OS scheduler (although I also know how to use it if I wanted to). If all you need is "run daily at HH:MM", the system scheduler can certainly do it. It's more a delivery/deployment/field engineering question than a deep architectural one. But you need to consider:

        - what happens when the system is restarted?
        - what happens when a job is killed?
        - who changes scheduling (the process/administrator/you)? how often? where from?
        - how is the system installed?
        - how portable does this need to be?
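        On the "job is killed" question, and on overlapping runs of the same dataset, one common safeguard under cron is a per-dataset lock file taken with flock in non-blocking mode. The OS releases a flock lock when the process exits, even if it is killed, so a dead run leaves no stale lock behind (unlike a leftover PID file). A minimal sketch; the lock file location is an assumption, and flock may behave differently on network filesystems.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Spec;

# Try to take an exclusive, non-blocking lock. Returns the handle on
# success (keep it open for the whole run) or undef if another run
# still holds the lock.
sub acquire_lock {
    my ($lockfile) = @_;
    open my $fh, '>>', $lockfile or die "Cannot open $lockfile: $!\n";
    return flock($fh, LOCK_EX | LOCK_NB) ? $fh : undef;
}

# Hypothetical lock location: one lock file per dataset.
my $lockfile = File::Spec->catfile(File::Spec->tmpdir, 'load-sales.lock');
my $lock = acquire_lock($lockfile);
if ($lock) {
    print "got lock, loading sales...\n";
    # ... do the load; the lock goes away when $lock is closed or the
    # process exits, however it exits ...
}
else {
    print "previous sales run still active, skipping\n";
}
```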