(Sorry for the uninformative title; I really couldn't think of anything better.)
Here's yet another design question of the kind I find inordinately difficult.
Suppose I need to fetch and process, on a regular basis, some data for about 100 different "entities", say the Fortune 100 companies. The goal is to generate an output file for each company, with a uniform format for all the companies.
Now, here's the thing. Assume as a given that the data I'm interested in is available only in formats that vary radically from one company to the next. E.g. in some cases I can scrape the data from the company's static website; in other cases it's easier to tweak a CGI page with WWW::Mechanize; in still others I'd fetch flat files via FTP, or access an RDBMS directly, etc. I'll need anywhere between 25 and 500 lines of code to gather and process the data for each company.
My first thought was to create 100 modules:

    Crunch::WalMart
    Crunch::ExxonMobil
    Crunch::GeneralMotors
    Crunch::FordMotorCompany
    Crunch::GeneralElectric
    . . .
    . . .
    Crunch::SupervaluInc
    Crunch::CiscoSystemsInc

all implementing the same simple API, say the method do_it(). Each module knows how to fetch the required data and what to do with it. I can then create a subdirectory Companies, containing subdirectories WalMart, ExxonMobil, GeneralMotors, ..., CiscoSystemsInc. These directories serve both to store the raw input files and the processed data files, and as a way to list the companies of interest (namely, all those that have a subdirectory under Companies). With this set-up, I could then have a master update function, to be run periodically, that would look like this:

    use File::Spec::Functions 'catdir';

    sub do_em {
      my $path_to_companies = shift;
      opendir my $dh, $path_to_companies
        or die "Can't opendir $path_to_companies: $!\n";
      while ( my $company = readdir( $dh ) ) {
        next if $company =~ /^\./;   # skip ., .. and hidden entries
        my $dir = catdir( $path_to_companies, $company );
        next unless -d $dir;
        my $module = 'Crunch::' . $company;
        eval "require $module; $module\::do_it( '$dir' ); 1" or die;
      }
    }

To me, this reeks to high heaven, though I can't quite say why. Perhaps it's an aversion to using eval, or because it's too reminiscent of the newbie-ish tendency to want to use symbolic refs.
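For comparison, here is a minimal sketch of the same loop without the string eval. It assumes each company module lives at Crunch/<Company>.pm somewhere in @INC; the \w+ name check and the list of processed companies returned by do_em() are assumptions added for illustration, not part of the scheme described above.

```perl
use strict;
use warnings;
use File::Spec::Functions 'catdir';

# Load Crunch::<Company> for every subdirectory and call its do_it().
# Assumption: each module is installed as Crunch/<Company>.pm in @INC.
sub do_em {
    my $path_to_companies = shift;
    opendir my $dh, $path_to_companies
        or die "Can't opendir $path_to_companies: $!\n";
    my @done;
    while ( defined( my $company = readdir $dh ) ) {
        next if $company =~ /^\./;
        # Only accept names that are safe as a package name component.
        next unless $company =~ /^\w+$/;
        my $dir = catdir( $path_to_companies, $company );
        next unless -d $dir;

        my $module = "Crunch::$company";
        # File-path form of require: searched in @INC, no string eval needed.
        require "Crunch/$company.pm";
        my $do_it = $module->can('do_it')
            or die "$module does not define do_it()\n";
        $do_it->($dir);
        push @done, $company;
    }
    closedir $dh;
    return @done;
}
```

This still treats the directory listing as the list of companies; it only swaps the eval for require with a file name plus can(), so a missing module or a missing do_it() dies with a specific message.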
An alternative approach would be to create an array of coderefs, one per company:

    use File::Spec::Functions 'catdir';

    {
      my @do_it = (
        \&do_WalMart,
        \&do_ExxonMobil,
        . . .
        \&do_CiscoSystemsInc,
      );

      sub do_em {
        my $path_to_companies = shift;
        $_->( $path_to_companies ) for @do_it;
      }
    }

    sub do_WalMart {
      my $name = 'WalMart';
      my $dir = catdir( shift, $name );
      # blah blah blah
    }

...but this entails having a huge file with many disparate functions, having little to do with one another.
I can think of a trillion other schemes, but not a single one presents itself as a clear winner somehow. What's your opinion?
the lowliest monk
In reply to Thorny design problem by tlm