(Sorry for the uninformative title; I really couldn't think of anything better.)

Here's yet another design question of the kind I find inordinately difficult.

Suppose I need to fetch and process, on a regular basis, some data for about 100 different "entities", say the Fortune 100 companies. The goal is to generate an output file for each company, with a uniform format for all the companies.

Now, here's the thing. Assume as a given that the data I'm interested in is available only in formats that vary radically from one company to the next. E.g. in some cases I can scrape the data from the company's static website; in other cases it's easier to tweak a CGI page with WWW::Mechanize; in others I'd fetch flat files via FTP, or access an RDBMS directly, etc. I expect to need anywhere between 25 and 500 lines of code to gather and process the data for each company.

My first thought was to create 100 modules:

    Crunch::WalMart
    Crunch::ExxonMobil
    Crunch::GeneralMotors
    Crunch::FordMotorCompany
    Crunch::GeneralElectric
    . . .
    Crunch::SupervaluInc
    Crunch::CiscoSystemsInc

all implementing the same simple API, say a do_it() function. Each module knows how to fetch the required data and what to do with it. I can then create a subdirectory Companies, containing subdirectories WalMart, ExxonMobil, GeneralMotors, ..., CiscoSystemsInc. These directories serve two purposes: they store the raw input files and the processed data files, and they define the list of companies of interest (namely, every company that has a subdirectory under Companies). With this set-up, I could then have a master update function, to be run periodically, that would look like this:

    use File::Spec::Functions 'catdir';

    sub do_em {
        my $path_to_companies = shift;
        opendir my $dh, $path_to_companies
            or die "Can't opendir $path_to_companies: $!\n";
        while ( defined( my $company = readdir( $dh ) ) ) {
            next if $company =~ /^\./;    # skip ., .. and hidden entries
            my $dir = catdir( $path_to_companies, $company );
            next unless -d $dir;
            my $module = 'Crunch::' . $company;
            # load the module named after the directory and call its do_it()
            eval "require $module; $module\::do_it( '$dir' ); 1" or die $@;
        }
        closedir $dh;
    }

To me, this reeks to high heaven, though I can't quite say why. Perhaps it's an aversion to using eval, or because it's too reminiscent of the newbie-ish tendency to want to use symbolic refs.
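
For comparison, the string eval is not strictly necessary for this scheme. Here is a minimal sketch, assuming each Crunch::* module exposes a plain do_it() function that takes the company directory: require is given a file name computed from the company name, and can() looks up the function. The helper name handler_for and the inline Crunch::WalMart stub are made up purely for illustration.

```perl
use strict;
use warnings;
use File::Spec::Functions 'catdir';

# Hypothetical stand-in for a real Crunch::* module, defined inline so the
# sketch runs on its own. Registering it in %INC makes require() treat it
# as already loaded.
{
    package Crunch::WalMart;
    sub do_it { my ($dir) = @_; return "processed $dir" }
}
BEGIN { $INC{'Crunch/WalMart.pm'} = __FILE__ }

# Load Crunch::<company> by file name and return its do_it() coderef,
# with no string eval and no symbolic references.
sub handler_for {
    my ($company) = @_;
    die "Bad company name '$company'\n" unless $company =~ /\A\w+\z/;
    my $module = "Crunch::$company";
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    require $file;    # dies if the module cannot be found
    my $code = $module->can('do_it')
        or die "$module does not implement do_it()\n";
    return $code;
}

# Same idea as do_em above: one subdirectory per company, one module each.
sub do_em {
    my ($path_to_companies) = @_;
    opendir my $dh, $path_to_companies
        or die "Can't opendir $path_to_companies: $!\n";
    my @companies = sort
        grep { !/^\./ && -d catdir( $path_to_companies, $_ ) } readdir $dh;
    closedir $dh;
    return map { handler_for($_)->( catdir( $path_to_companies, $_ ) ) }
        @companies;
}
```

This keeps the one-module-per-company layout; the only change is how the module is loaded and called, and whether that removes the smell is a separate question.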

An alternative approach would be to create an array of coderefs, one per company:

    use File::Spec::Functions 'catdir';

    {
        my @do_it = (
            \&do_WalMart,
            \&do_ExxonMobil,
            . . .
            \&do_CiscoSystemsInc,
        );

        sub do_em {
            my $path_to_companies = shift;
            $_->( $path_to_companies ) for @do_it;
        }
    }

    sub do_WalMart {
        my $name = 'WalMart';
        my $dir = catdir( shift, $name );
        # blah blah blah
    }

...but this entails having one huge file full of disparate functions that have little to do with one another.

I can think of a trillion other schemes, but not a single one presents itself as a clear winner somehow. What's your opinion?

the lowliest monk


In reply to Thorny design problem by tlm
