overview

When one has to solve a complex problem, one breaks it down into component parts. The most obvious way to do this in Perl is to write a set of modules/classes and simply call them as appropriate. However, this obvious approach has some drawbacks. For one, Perl provides no way to reclaim the memory used by a loaded module. Thus, should a module need to dynamically load code, it lacks the luxury of dynamically unloading it later, and a long-running application may grow in size even if certain modules had limited use.
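To make the point concrete, here is a minimal sketch of loading a module at runtime; the module name is just an example, and the comment about unloading reflects the limitation described above rather than a supported API:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load a module by name at runtime rather than with a compile-time "use".
# (List::Util here is only an example -- substitute your own module.)
my $module = 'List::Util';
(my $file = "$module.pm") =~ s{::}{/}g;
require $file;

# The module is now resident; %INC records the load.
print "loaded from $INC{$file}\n" if exists $INC{$file};

# There is no clean inverse of require: deleting the %INC entry and the
# package's symbols (see Symbol::delete_package) lets the interpreter
# reuse that memory internally, but the process footprint does not shrink.
```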

This document attempts to catalog the various means by which you might create functional components. The table below lists the main ways of componentizing an application:

Method / Description / Advantages and Disadvantages
Tiered: This is how enterprise systems are built. Well-defined layers (independent programs) take well-defined input and convert it to well-defined output. When one tier is done, it knows which tier to call next. The biggest win with this type of componentization is the ability to use operating-system signals to "pop the execution stack". For example, if tier A calls tier B, which calls tier C, and you want tier C to stop, you can simply kill that process and you will return to tier B. This would be very difficult to do with Perl objects, because signal handlers are global to an application, not scoped to a conceptual tier of execution.
Piped: A piped system leverages the wealth of tools available in the Unix shell. One of the most amazing shell-level complex applications is NOSQL, a truly relational database which makes use of a number of shell utilities on tab-delimited files to relate and compute on tables without SQL. In fact, a related commercial utility, /RDB, makes a strong case for the Unix shell as a 4th-generation language. The great advantage is that no part of the stream stays in memory longer than it is used. It is also very easy to study steps of the process by simply chopping off a few members of the pipeline. The downfall is that dataflow is unidirectional. Also, all data must be passed from process to process instead of being memory-resident. Another downfall is that data tends to be accessed by position instead of name, so changes in data format must be accommodated manually in each script dependent on the old format. And even where a naming convention exists, as it does for JDB, each element of the pipeline must acquaint itself with the format before beginning its work.
Client-Server: In this case, you can again reduce the memory footprint of your application by creating a large number of "remote procedures". Perl support for this is available in POE, STEM, SOAP::Lite, and RPC::XML::Client, as well as the Net::* hierarchy on CPAN. The great win with this sort of system is that each daemon has well-defined input and output, reducing the number of errors resulting from package variables hanging around to snatch the rug out from under you. You also don't have to have the whole program on one machine taxing its memory.
Perl Objects: This approach should be obvious.

project description

I am in the process of finishing up work on a rather large set of tiered programs. Basically I am in charge of taking content for a website written in Excel spreadsheets and loading it into a content management system. The content describes government agencies, and each agency will have from 1 to 5 files describing it:
  1. organization - the divisions within the agency
  2. service - the services of each division
  3. faq - questions that are asked about the agency (each question is categorized by division and service)
  4. contact - the people who work in each division
  5. literature - any .pdf or .doc files or other literature associated with the agency

So what I wish to describe now is our workflow.

  1. sanity checking - the Excel files are converted to HTML, and then the table in the HTML file is checked cell by cell for "sanity", meaning each cell contains no offending characters and, in some cases, is not blank. The decision to convert to HTML instead of using Spreadsheet::ParseExcel was made before I got here.
  2. bulkload.pl - The inbox will have a ton of files, from 1 to 5 for each agency. The task of bulkload.pl is to take the inbox and group the files by agency, so this Perl program is basically a glorified glob.
  3. process-agency.pl - This level takes an "agency set", meaning the set of (1-5) files for an agency, and does various housekeeping tasks such as preparing a log directory. Then it calls the lowest tier to process a single file of an agency. Once all files for an agency have been sent to the lowest level and processed, the logs for each independent run of the lowest level are collated, and the "agency set" is moved to a processed directory named for the current datetime. Also, if the word ERROR showed up in any of the collated logs, the collated log file has ERROR in its name.
  4. html-to-dcrs-engine.pl - This is the workhorse, taking a single HTML file and converting each row to data records for the content management system and also possibly populating an XML database which stores a list of all divisions and services for an agency.
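The "glorified glob" in step 2 might look roughly like this sketch; the file names and the agency-name-then-hyphen naming convention are invented for illustration and may not match the real format:

```perl
#!/usr/bin/perl
# Sketch of grouping an inbox of files into per-agency sets.
# Assumes (hypothetically) names like "treasury-faq.html".
use strict;
use warnings;

sub group_by_agency {
    my @files = @_;
    my %sets;
    for my $file (@files) {
        # Agency name = everything before the first hyphen (assumed).
        my ($agency) = $file =~ /^([^-]+)-/ or next;
        push @{ $sets{$agency} }, $file;
    }
    return \%sets;
}

my $sets = group_by_agency(qw(
    treasury-faq.html treasury-contact.html commerce-service.html
));
for my $agency (sort keys %$sets) {
    printf "%s: %d file(s)\n", $agency, scalar @{ $sets->{$agency} };
}
```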

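The collate-and-flag behavior of step 3 could be sketched like this; the directory layout, log names, and log contents are all made up for the example:

```perl
#!/usr/bin/perl
# Sketch: concatenate per-run logs into one agency log, and put ERROR
# in the collated log's name if any line contains the word ERROR.
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Fake a couple of per-run logs from the lowest tier.
open my $fh1, '>', "$dir/run1.log" or die $!;
print $fh1 "processed 10 rows\n";
close $fh1;
open my $fh2, '>', "$dir/run2.log" or die $!;
print $fh2 "ERROR: blank cell in row 4\n";
close $fh2;

# Collate, remembering whether ERROR ever appeared.
my ($saw_error, @lines) = (0);
for my $log (sort glob "$dir/run*.log") {
    open my $in, '<', $log or die $!;
    while (<$in>) {
        $saw_error = 1 if /\bERROR\b/;
        push @lines, $_;
    }
    close $in;
}

my $name = $saw_error ? "$dir/agency-ERROR.log" : "$dir/agency.log";
open my $out, '>', $name or die $!;
print $out @lines;
close $out;
print "wrote $name\n";
```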
Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality


Re: [swengr] Componentizing Complex Apps
by djantzen (Priest) on Nov 27, 2002 at 02:53 UTC

    A couple of points I don't understand:

    1. Why is using object-oriented Perl an entirely separate manner of organizing a large project? Any one of the three preceding categories could be implemented using OO Perl, or non-OO Perl, or C, or lisp, or ... .

    2. Are tiered and client/server really that distinct from one another? It is common to speak of n-tier client/server applications (I'm working on one now). It seems you mean to say that a tiered application must reside on a single machine, whereas only client/server apps are allowed to communicate over the network. To me, a tiered system is one that horizontally abstracts functionality into, for example, a database, a database access layer, a business layer, and a presentation layer. A client/server system utilizes interprocess communication possibly over a network to enable one program to invoke commands in another. But these are certainly compatible ways of organizing an application, and in fact together they provide a solution for your signal-handling worry.

      Why is using object-oriented Perl an entirely separate manner of organizing a large project? Any one of the three preceding categories could be implemented using OO Perl, or non-OO Perl, or C, or lisp, or ... .
      Hmmm, good point.

      It is common to speak of n-tier client/server applications (I'm working on one now).
      Hmm, so what I have is a type of tiered application: an inward-spiraling tier, where program A calls B, which calls C, which does the inner work; then we pop back to B, then back to A.

      Thanks for your comments. I had hoped to generate a lot of discussion.

      Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality