laknarc has asked for the wisdom of the Perl Monks concerning the following question:

We are using an Oracle database as a job control repository. We have thousands of ETL jobs to execute based on their dependencies. So far I have built some control tables and a stored procedure that pulls the jobs based on those dependencies. To start the batch, I need to update "BATCH_STATUS" to "START" and trigger the batch jobs in dependency order. I'm trying to build a Perl script that starts the batch, keeps calling the stored procedure to pull the jobs that are ready, and executes them. When a job starts, I need to log its start time, and when it completes, update the end time and elapsed time. The script will run until the batch completes. After that we stop the batch by updating the status to "STOP" in the control table, so that the otherwise endless process terminates.

START_STOP_CTL (BATCH_STATUS, NO_OF_JOBS_CAN_RUN_PARALLEL)
JOBS - all the jobs with status (READY, RUNNING, SUCCESS, FAILED)
JOBPARAMS - all the job parameters
JOBDEPENDENCIES - all job dependency information
JOB_LOG - logs each job execution

I need your help to run this script in a loop and execute the jobs. My common script name is "run_etl_jobs.pl". The same script will be called to execute each and every ETL job with different parameters. Please share your ideas for building this script.

Replies are listed 'Best First'.
Re: Job scheduling?
by KurtSchwind (Chaplain) on Jul 09, 2015 at 19:40 UTC

    Do you have a current copy of run_etl_jobs.pl for us to look at? Or are you asking someone to write it from scratch for you?

    --
    “For the Present is the point at which time touches eternity.” - CS Lewis

      I have this Perl script, "run_etl_jobs.pl". When I start a batch by updating the batch status field in a table to "START", the script should run until I stop the batch. Within the loop, I need to get the job names and parameters returned by the stored procedure and execute the jobs in parallel. Once the jobs are triggered, I need to update the job status to "RUNNING" and log the start time to the JOB_LOG table. Once the jobs complete, I need to update the job status to "SUCCESS" and log the end time. Also, when 5 jobs are running, no other jobs should start (we can run at most 5 jobs at a time); I have handled this in the stored procedure. My question is: I need a starter script from which to build this master script for these scenarios, or, if CPAN has any modules for scheduling this kind of job with a database repository, please point me to them. Basically I'm building a scheduler that I can use for my project.
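
The per-job bookkeeping described above (mark RUNNING and log the start time, then mark SUCCESS or FAILED and log the end and elapsed time) could look roughly like the sketch below. It is only an illustration under assumptions: run_and_log and the $record callback are hypothetical names, and $record stands in for whatever UPDATE/INSERT statements against JOBS and JOB_LOG the real script would issue (for example through DBI). An exit status of 0 is assumed to mean success.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run one job command and report its start, end and elapsed time.
# $record is a hypothetical callback standing in for the database
# updates to JOBS (status) and JOB_LOG (timestamps).
sub run_and_log {
    my ($job_name, $cmd, $record) = @_;

    my $start = time;
    $record->($job_name, 'RUNNING', { start => $start });

    # $cmd is an array ref: program followed by its arguments.
    # Assumption: exit status 0 means the job succeeded.
    my $rc     = system(@$cmd);
    my $end    = time;
    my $status = $rc == 0 ? 'SUCCESS' : 'FAILED';

    $record->($job_name, $status,
        { start => $start, end => $end, elapsed => $end - $start });
    return $status;
}
```

A call might look like run_and_log('load_customers', ['perl', 'run_etl_jobs.pl', 'load_customers'], \&update_job_tables), where both the job name and update_job_tables are made up for the example.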

        I don't think any of what you are trying to do is difficult... I do it quite a lot in my own ETL projects... I like the idea of building configuration files for ETL or any scraping work to make everything easy to control... you can use anything to create configuration files... I like Excel a lot, but mostly use Perl because it's a habit for me now

Re: Job scheduling?
by GotToBTru (Prior) on Jul 10, 2015 at 14:10 UTC

    Is this pseudo-code correct?

    update batch_status to START
    update log file
    create pool of 5 threads
    while batch not complete {
        if open thread available {
            get next job from stored procedure and execute in that thread
        }
        handle any completed thread(s)
    }
    update batch_status to STOP
    update log file
    Dum Spiro Spero
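
For what it's worth, the loop in that pseudo-code could be sketched in Perl roughly as follows. Note that this swaps the thread pool for child processes managed with fork and waitpid, which is a different technique from the one the pseudo-code names. Everything database-related is left to hypothetical callbacks: $next_job stands in for the stored procedure call (returning undef when nothing is ready), $on_done for the status and log updates, and $batch_done for the check that the batch is complete. The names run_batch, name and cmd are likewise made up for the example.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);
use Time::HiRes qw(sleep);

# Keep at most $max jobs running as child processes until the batch is done.
# Returns the highest number of jobs that were running at the same time.
sub run_batch {
    my ($max, $next_job, $on_done, $batch_done) = @_;
    my %running;    # pid => job name
    my $peak = 0;

    while (1) {
        # Fill free slots with jobs that are ready to run.
        while (keys(%running) < $max) {
            my $job = $next_job->() or last;    # undef: nothing ready now
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {
                # Child: replace this process with the job command.
                exec @{ $job->{cmd} } or POSIX::_exit(127);
            }
            $running{$pid} = $job->{name};
            $peak = keys(%running) if keys(%running) > $peak;
        }

        # Reap finished children without blocking.
        while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
            my $name = delete $running{$pid};
            $on_done->($name, $? == 0 ? 'SUCCESS' : 'FAILED')
                if defined $name;
        }

        last if !%running && $batch_done->();
        sleep 0.1;    # short pause before polling again
    }
    return $peak;
}
```

Using processes rather than threads keeps each ETL job isolated from the scheduler, and the parent only needs the exit status to decide between SUCCESS and FAILED. Updating batch_status to START and STOP would happen before and after the call to run_batch.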