overview
When you have to solve a complex problem, you break it down into
component parts. The most obvious way to do this is to create a set
of Perl modules/classes and simply call them as appropriate. However,
this obvious approach has some drawbacks. For one, it is not
possible to reclaim the memory used by a module once it is loaded. So, should a module
need to dynamically load code, it lacks the luxury of dynamically
unloading it later, and a long-running application may grow in size
even if certain modules had limited use.
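One way around the memory problem is to do the rarely needed work in a separate process, so the operating system takes the memory back when that process exits. What follows is only a minimal sketch, assuming a Unix-like system with a real fork. The helper name run_in_child is made up for illustration, and Digest::MD5 merely stands in for whatever module you would rather not keep loaded.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run a code ref in a forked child and return whatever
# string it produces. Modules loaded inside the child vanish when it exits.
sub run_in_child {
    my ($work) = @_;

    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the work, send the result up the pipe, then leave
        # without running the parent's END blocks.
        close $reader;
        print {$writer} $work->();
        close $writer;
        POSIX::_exit(0);
    }

    # Parent: read everything the child wrote, then reap the child.
    close $writer;
    my $result = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);
    return $result;
}

my $digest = run_in_child(sub {
    require Digest::MD5;    # loaded only inside the child process
    return Digest::MD5::md5_hex('hello');
});

print "digest from child: $digest\n";
print "Digest::MD5 loaded in parent? ",
    (exists $INC{'Digest/MD5.pm'} ? "yes" : "no"), "\n";
```

The price is that results have to come back through a pipe as a string, which is the same tradeoff the tiered and piped approaches make.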
This document attempts to catalog the various means by which you might
consider creating functional components.
This table lists the various means of componentizing
an application:
| Method
| Description
| Advantages/Disadvantages
|
| Tiered
| This is how enterprise systems are built. Well-defined layers
(independent programs) take
well-defined input and convert it to well-defined output. When one
tier is done, it knows which tier to call next.
| The biggest win with this type of componentization is the ability
to use operating system signals to "pop the execution stack". For
example, if tier A calls tier B, which calls tier C, and you want tier C to
stop, you can simply kill its process and control will return to tier
B. This would be very difficult to do with Perl objects because signal
handlers are global to an application, not to a conceptual tier of
execution.
|
| Piped
| A piped system leverages the wealth of tools available in the Unix
shell. One of the most amazing shell-level complex applications is
NOSQL, a truly relational database which makes use of a number of
shell utilities on tab-delimited files to relate and compute on tables
without SQL. In fact, a related commercial utility, /RDB, makes a
strong case for The Unix Shell as a 4th Generation Language.
| The great advantage is that no part of the stream stays in memory longer than it is used. It is also very easy to study steps of the process by simply chopping off a few members of the pipeline.
The downside is that dataflow is unidirectional. Also, all data must be passed from process to process instead of being memory-resident. Another downside is that data tends to be accessed by position instead of by name, so changes in data format must be accommodated manually in each script dependent on the old format. However, should a naming convention exist, as it does for JDB, then each element of the pipeline must acquaint itself with the format before beginning its work.
|
| Client-Server
| In this case, you can again reduce the memory footprint of your
application by creating a large number of "remote procedures". Perl
support for this is available in POE, STEM, SOAP::Lite, and RPC::XML::Client, as well as the Net::*
hierarchy on CPAN.
| The great win with this sort of system is that each
daemon will have well-defined input and output, thus reducing the
number of errors resulting from package variables hanging around to
snatch the rug out from under you. You also don't have to keep the whole program on one machine, taxing its memory.
|
| Perl Objects
| This approach should be obvious: ordinary modules/classes called as appropriate within a single program.
|
|
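To make the Tiered row's "pop the execution stack" point concrete, here is a rough sketch of one tier running the next tier as a separate process and killing it if it takes too long, after which control is back in the calling tier. It assumes a Unix-like system. The function name run_tier and the one-second timeout are illustrative assumptions only, and the lower tier is faked with one-line perl commands.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run @command as a separate process (the lower tier)
# and kill it if it runs longer than $timeout seconds.
# Returns 'finished' or 'killed'.
sub run_tier {
    my ($timeout, @command) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: become the lower tier. If exec fails, leave immediately
        # so the child never continues running the parent's code.
        exec { $command[0] } @command
            or do { warn "exec failed: $!\n"; POSIX::_exit(127) };
    }

    # Parent: wait for the child, but let SIGALRM interrupt the wait.
    my $status = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        waitpid($pid, 0);
        alarm 0;
        'finished';
    };

    unless (defined $status) {
        die $@ unless $@ eq "timeout\n";    # rethrow anything unexpected
        kill 'TERM', $pid;                  # stop the lower tier...
        waitpid($pid, 0);                   # ...and reap it
        $status = 'killed';
    }
    return $status;
}

# $^X is the path of the running perl, used here to fake a lower tier.
my $slow = run_tier(1, $^X, '-e', 'sleep 30');
my $fast = run_tier(5, $^X, '-e', '1');

print "slow tier: $slow\n";
print "fast tier: $fast\n";
print "the calling tier carries on either way\n";
```

The signal handler here only guards the wait in the calling tier. The lower tier's own state dies with its process, which is exactly what is hard to arrange with objects inside one program.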
project description
I am in the process of finishing up work on a rather large set of
tiered programs. Basically, I am in charge of taking website content
written in Excel spreadsheets and loading it into a content management
system. The content describes government agencies, and each agency
will have from 1 to 5 files describing it:
- organization - the divisions within the agency
- service - the services of each division
- faq - questions that are asked about the agency (each question is
categorized by division and service)
- contact - the people who work in the division
- literature - any .pdf or .doc files or other literature associated
with the agency
What I now wish to describe is our workflow:
- sanity checking - the Excel files are converted to HTML, and then
the table in each HTML file is checked cell by cell for "sanity",
meaning a cell has no offending characters and, in some cases, is not
blank. The decision to convert to HTML instead of using Spreadsheet::ParseExcel was made before I got here.
- bulkload.pl - The inbox will have a ton of files, from 1 to 5 for each
agency. The task of bulkload.pl is to take the inbox and group the
files by agency. So this Perl program is basically a glorified glob.
- process-agency.pl - This level takes an "agency set", meaning a
set of (1-5) files for an agency, and does various housekeeping
tasks such as preparing a log directory. Then it calls the lowest tier
to process a single file of an agency. Once all files for an agency
have been sent to the lowest level and processed, the logs for
each independent run of the lowest level are collated and the "agency
set" is moved to a processed directory named for the current
datetime. Also, if the word ERROR showed up in any of the collated
logs, then the log file has ERROR in its name.
- html-to-dcrs-engine.pl - This is the workhorse, taking a single
HTML file and converting
each row to data records for the content management system and also
possibly populating an XML database which stores a list of all
divisions and services for an agency.
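The grouping step that bulkload.pl performs might look something like the sketch below. This is not the real program. The file naming scheme (agency name, underscore, file type, then .html) is purely an assumption made for the example, as are the function name group_by_agency and the sample file names. In real use the list would presumably come from a glob over the inbox rather than a hard-coded array.

```perl
use strict;
use warnings;

# The five file types an agency may supply.
my %KNOWN_TYPE = map { $_ => 1 }
    qw(organization service faq contact literature);

# Hypothetical helper: group file names into agency sets.
# Assumes names look like "<agency>_<type>.html" (an invented convention).
# Returns a hash ref of agency => { type => file } plus an array ref of
# files that did not fit the convention.
sub group_by_agency {
    my (@files) = @_;
    my (%sets, @unrecognized);

    for my $file (@files) {
        if ($file =~ m{(?:\A|/)([^/]+)_([a-z]+)\.html\z} && $KNOWN_TYPE{$2}) {
            $sets{$1}{$2} = $file;
        }
        else {
            push @unrecognized, $file;
        }
    }
    return (\%sets, \@unrecognized);
}

# Sample inbox contents, hard-coded so the sketch runs anywhere.
my @inbox = qw(
    inbox/dmv_organization.html
    inbox/dmv_service.html
    inbox/parks_faq.html
    inbox/notes.txt
);
my ($sets, $unrecognized) = group_by_agency(@inbox);

for my $agency (sort keys %$sets) {
    my @types = sort keys %{ $sets->{$agency} };
    print "$agency: @types\n";
}
print "unrecognized: @$unrecognized\n";
```

Each agency set could then be handed to process-agency.pl as its own process, which keeps the tiers independent in the way described above.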
Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality
Edit by tye to add READMORE tag