Re: Perl pipeline builder

I assume that by "pipeline" you mean a sequence of multiple transformations that you want to apply to your data. Ideally, you want to be able to run parts of these transformations in parallel.

I see various degrees of hairiness that you can apply here.

The easiest approach, if your data model fits it, is to use make and a Makefile. This buys you easy restartability and trivial parallelization. On the downside, all your data must reside in files, and the rules to get from one set of files to another (set of) file(s) are fairly restricted. Especially, I think, because a plain make rule maps a set of input files to one output file and not to a set of output files (GNU make can express the latter with pattern rules or, since version 4.3, grouped targets, but it is less convenient). One aspect of make is that it requires you to think backwards, from the result you want, through the intermediate results, until you get to a point where you can start from.
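To illustrate the backwards thinking that make imposes, here is a minimal sketch in plain Perl. The file names and the %rules table are made up for the example, and unlike real make it does not compare timestamps; it only walks from the wanted result back to the sources and returns the order in which things would have to be built.

```perl
use strict;
use warnings;

# Hypothetical rules: each target lists the inputs it is built from.
my %rules = (
    'report.txt'  => ['summary.csv'],
    'summary.csv' => ['clean.csv'],
    'clean.csv'   => ['raw.csv'],
);

# Walk backwards from the wanted target and return the build order.
sub plan {
    my ($target, $seen) = @_;
    $seen ||= {};
    return () if $seen->{$target}++;            # already planned
    my $deps = $rules{$target} or return ();    # no rule: a source file
    return ((map { plan($_, $seen) } @$deps), $target);
}

my @order = plan('report.txt');
print join(' -> ', @order), "\n";
```

You ask for 'report.txt' and the plan comes out starting with 'clean.csv', which is exactly the "start at the end" way of thinking a Makefile asks of you.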

The next best approach is any of the various job queues into which you stuff the steps of your pipelines. I know of Minion, Queue::Dir and Directory::Queue and various modules in the Job namespace. These all model each job (step) as a separate program or module. The advantage is that you get fairly simple restartability and parallelization/distribution, even across machines. The downside is that you will have to adapt your existing programs/modules to whatever mechanism you choose. You will also have to decide whether you push all steps for a given job into the job queue at once, or whether each job should know about the next step to be taken after it has finished. The approach of a job queue is more forward-thinking, as you know what you start out with and likely already know the next step to be taken in each case.
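As a rough sketch of the second variant, where each step knows which step comes next, something like the following could work. This is a plain in-memory array and not the API of Minion or any of the other modules; the step names and the job hash layout are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical steps; each returns the name of the next step,
# or nothing when the job is finished.
my %steps = (
    fetch  => sub { my ($job) = @_; $job->{data} = [3, 1, 2]; return 'order' },
    order  => sub {
        my ($job) = @_;
        $job->{data} = [sort { $a <=> $b } @{ $job->{data} }];
        return 'report';
    },
    report => sub { my ($job) = @_; $job->{result} = join(',', @{ $job->{data} }); return },
);

my @queue = ({ step => 'fetch' });   # we only know where we start
my @done;

while (my $job = shift @queue) {
    my $next = $steps{ $job->{step} }->($job);
    if (defined $next) {
        $job->{step} = $next;        # re-queue the job for its next step
        push @queue, $job;
    }
    else {
        push @done, $job;
    }
}

print "$_->{result}\n" for @done;
```

With a real job queue the array would be replaced by the queue's storage, and the worker loop could run in several processes or on several machines.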

Depending on how far you can/want/need to go when implementing your jobs, maybe Workflow is something that you can use. This could allow you to organise the sequence of steps in a central location. Also, Workflow (or anything like it) allows you to have loops and retries, things that are hard to model in make.
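To show what I mean by retries, here is a small hand-rolled sketch. It does not use Workflow and says nothing about how Workflow does it; run_with_retries and the flaky step are names I made up for the example.

```perl
use strict;
use warnings;

# Run a step, retrying up to $max times when it dies.
sub run_with_retries {
    my ($step, $max) = @_;
    for my $attempt (1 .. $max) {
        my $result;
        my $ok = eval { $result = $step->($attempt); 1 };
        return $result if $ok;
        warn "attempt $attempt of $max failed: $@";
    }
    die "step failed after $max attempts\n";
}

# Hypothetical step that fails on its first two calls.
my $calls = 0;
my $flaky = sub {
    $calls++;
    die "not ready\n" if $calls < 3;
    return "ok after $calls calls";
};

print run_with_retries($flaky, 5), "\n";
```

A loop like this is easy to write in code, but there is no natural place for it in a Makefile, where a failed rule simply stops the build.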

Personally, I have only used/written hard-coded sequences and nothing that was nice to generate configuration from. The most sophisticated idea I toyed with, and never used, was something like make that could also look at SQL tables and run SQL statements to see whether rules are satisfied, but it quickly grew too complex.
