drewbert has asked for the wisdom of the Perl Monks concerning the following question:

A while back I wrote a script to take care of an events schedule on a web site. It reads data from an XML file (title, date, description, URL, and so on), puts the data in a hash, and rearranges it into HTML for output. Of course, it also watches the current date, and makes sure past events don't show up, and other tricks like that. It has been a real time-saver.

Lately I've been finding other potential uses for the script on the site. The problem is, I've got code in the script that is very specific to that particular XML file. For instance, the routines that turn the data into HTML expect very specific input:

sub basic_list {
    my (%entry) = @_;
    if ($entry{'subtitle'} ne "") {
        $entry{'subtitle'} = "<h4>$entry{'subtitle'}</h4>";
    }
    print <<END;
<li><h3>$entry{'title'}</h3>$entry{'subtitle'}$entry{'long_date'}<br/>$entry{'description'}</li>
END
}

Right now, if I want to handle a different XML file with different elements, I have to make a copy of my script and retrofit it.

But I don't want to go making a dozen scripts which are 99% the same and 1% different. (I also don't want to muck about in my lovingly crafted script every time I want to change the way my HTML output looks.) I know there's got to be a better way to do this, to separate out the details and make one program that works with all the XML files I throw at it. It's probably something really simple that I just haven't learned yet.

What's my best strategy here? I toyed with the idea of putting specific subroutines like the one above in the XML file itself, letting my script read and execute them, but I'm not sure if that's really the way to go. Is that ever done?

Replies are listed 'Best First'.
Re: one script, not twelve nearly identical ones!
by davido (Cardinal) on Aug 25, 2004 at 16:42 UTC

    That sounds like a classic example of what modules can be used for. However, there are two ways to approach factoring your code into modules. The first way is to put all of the usage-specific stuff into a module, and then decide at compile time (inside a BEGIN{...} block) which module to load, possibly based on command-line parameters. The second way is to put all of the common code (the 99% that never changes) into a module, and the usage-specific stuff into the main script. You could have a dozen different main scripts, each of them very short, each of which invokes the same module of common tools.

    I happen to like the latter method. But don't stop there; there's one more solution: use object-oriented modules. Put all the common code into a base class, and then create a dozen classes that inherit from (or just use) that base class, each to perform its own usage-specific task. Again, at compile time, based on a command-line arg, you can decide which subclass to load in a BEGIN{...} block. That subclass will already know how to load the base class it inherits from (or uses). And the actual package main code will simply be a minimal framework to initialize your subclass's object and set things in motion.

    Then you'll have a tool (or suite of them) that is easy to maintain and update. If something changes on a particular XML feed, all you have to do is change the subclass that deals with that particular XML feed. The base class stays the same, and package main stays the same. New subclasses can be developed without breaking the base class or package main.
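
A minimal sketch of that base class/subclass layout, kept in one file so it runs as-is. The package names (EventList::Base, EventList::Schedule), the method names, and the sample entry are all made up for illustration; in a real layout each package would probably live in its own .pm file and be loaded with require once the command-line argument is known.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Base class: the common code (hypothetical name EventList::Base).
package EventList::Base;

sub new {
    my ($class, %args) = @_;
    return bless { entries => $args{entries} || [] }, $class;
}

# Each subclass is expected to override this to format one entry.
sub render_entry {
    my ($self, $entry) = @_;
    die ref($self) . " must implement render_entry\n";
}

# Shared logic: wrap every rendered entry in a list.
sub render {
    my ($self) = @_;
    my $html = "<ul>\n";
    $html .= $self->render_entry($_) . "\n" for @{ $self->{entries} };
    return $html . "</ul>\n";
}

# One subclass per XML "dialect" (hypothetical name EventList::Schedule).
package EventList::Schedule;
use parent -norequire, 'EventList::Base';

sub render_entry {
    my ($self, $entry) = @_;
    my $subtitle = defined $entry->{subtitle} && length $entry->{subtitle}
        ? "<h4>$entry->{subtitle}</h4>"
        : '';
    return "<li><h3>$entry->{title}</h3>$subtitle$entry->{long_date}<br/>"
         . "$entry->{description}</li>";
}

package main;

# Map a short name (which could come from $ARGV[0]) to a subclass.
my %known = (schedule => 'EventList::Schedule');
my $kind  = 'schedule';
my $class = $known{$kind} or die "Unknown list type '$kind'\n";

my $list = $class->new(entries => [
    { title => 'Picnic', subtitle => '', long_date => 'August 28, 2004',
      description => 'Bring a blanket.' },
]);
my $html = $list->render;
print $html;
```

Supporting another XML file then means adding one more small subclass and one more entry in %known; the base class and package main stay as they are.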


    Dave

Re: one script, not twelve nearly identical ones!
by bronto (Priest) on Aug 25, 2004 at 17:12 UTC

    If I understand your needs correctly, and you like perl (I believe so :-), you could consider using XML::XPathScript. XPathScript is a stylesheet language based on Perl and (guess what!) XPath. You can write different declarative templates à la XSLT for each XML "dialect" you use, or do more complex processing with perl subroutines, and let the xpathscript program do the job.

    For (stupid) example, if you have an XML file like this:

    <?xml version="1.0"?>
    <a>
      <b>Something</b>
    </a>

    and this stylesheet:

    <%
      $t->{a}{pre}  = '<html><head><title>my page</title></head>';
      $t->{a}{post} = '</html>';
      $t->{b}{pre}  = '<body><h1>';
      $t->{b}{post} = '</h1></body>';
    %>
    <%= apply_templates() %>

    you simply run xpathscript temp.xml temp.xps to get:

    <html><head><title>my page</title></head>
    <body><h1>Something</h1></body>
    </html>

    Anyway, you can do much, much more with XPathScript. I recommend taking a peek at the documentation on CPAN and giving it a try!

    PS: Of course, XPathScript stylesheets are modular, so you could import common HTML parts (like headers and footers) from subsheets into all the ones that are specific to a certain job.

    Ciao!
    --bronto


    The very nature of Perl to be like natural language--inconsistent and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
    --John M. Dlugosz
Re: one script, not twelve nearly identical ones!
by perlfan (Parson) on Aug 25, 2004 at 16:28 UTC
    You should consider using one of the many XML modules available on CPAN. From there, you can create some sort of data structure that contains your tags and the specific details that make each case different from the generic implementation.
Re: one script, not twelve nearly identical ones!
by Aristotle (Chancellor) on Aug 25, 2004 at 21:23 UTC

    It sounds like what you're looking for is a templating system; a template is a way to describe how your output looks outside the code of the program. By changing the template, you can adapt the output for any other situation. Personally, I'm a big fan of Template Toolkit, which is generally considered the most powerful and complete one, but there are many others with merit as well. See perrin's templating system comparison.
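
To make the idea concrete, here is a deliberately tiny stand-in written with core Perl only. It is not Template Toolkit and does not use its API; it only borrows the `[% name %]` placeholder look. The helper name fill_template and the sample data are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each [% name %] placeholder with $vars->{name}.
# Unknown names are replaced with an empty string.
sub fill_template {
    my ($template, $vars) = @_;
    $template =~ s{\[%\s*(\w+)\s*%\]}{ $vars->{$1} // '' }ge;
    return $template;
}

# The template would normally live in its own file, outside the program.
my $template =
    "<li><h3>[% title %]</h3>[% long_date %]<br/>[% description %]</li>\n";

my $out = fill_template($template, {
    title       => 'Picnic',
    long_date   => 'August 28, 2004',
    description => 'Bring a blanket.',
});
print $out;
```

Changing how the HTML looks now means editing the template text and leaving the script alone. A real templating system adds loops, conditionals, includes, and escaping on top of this.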

    That covers the output side. If the format of your input XML file also differs from application to application, things get trickier. If the differences are trivial enough, it will probably suffice to stick some description into a configuration file; for a script processing an XML file, such a configuration might list XPath expressions, for example.
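
A rough sketch of such a configuration, assuming XML::LibXML is installed. The element names, the %config layout, and the read_entries helper are hypothetical; the configuration is shown as a Perl hash here but could just as well be read from a file.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical per-application configuration: which nodes are records,
# and which XPath expression (relative to a record) yields each field.
my %config = (
    record => '/events/event',
    fields => {
        title       => 'title',
        long_date   => 'date',
        description => 'description',
    },
);

# Turn an XML string into a list of hashes, driven entirely by the config.
sub read_entries {
    my ($xml, $config) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my @entries;
    for my $node ($doc->findnodes($config->{record})) {
        my %entry;
        for my $field (keys %{ $config->{fields} }) {
            $entry{$field} = $node->findvalue($config->{fields}{$field});
        }
        push @entries, \%entry;
    }
    return @entries;
}

my $xml = <<'XML';
<events>
  <event>
    <title>Picnic</title>
    <date>August 28, 2004</date>
    <description>Bring a blanket.</description>
  </event>
</events>
XML

my @entries = read_entries($xml, \%config);
print "$_->{title} ($_->{long_date})\n" for @entries;
```

A differently shaped XML file then needs only a different %config; the rest of the script works on hashes with the same keys.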

    If your needs vary wildly from case to case, you should instead pull common functionality out of the program, collect it in a module, and use that module in your various scripts.

    Makeshifts last the longest.

Re: one script, not twelve nearly identical ones!
by idsfa (Vicar) on Aug 25, 2004 at 16:36 UTC

    The proper tool for transforming XML into HTML is XSLT. Alternatively, you may want to look at HTML::Template. In general, it is better for maintainability if you try to separate the content (data) from the presentation -- as you are now finding with your script.


    If anyone needs me I'll be in the Angry Dome.
      The proper tool for transforming XML into HTML is XSLT.

      There is no such thing as "the proper tool" without context. In many (possibly most) situations, XSLT may be the best way to go. However, in several easy cases I can think of, converting the XML straight to HTML would be disastrous. You're assuming that the XML contains all the information needed for the webpage and simply needs to be transformed. If that is the case, you are probably correct.

      However, if the data needs to be merged with another datasource, such as another XML file, then you would be horribly off base. XSLT may be able to do that, but it certainly would not be the proper tool for that endeavor.

      I would have answered that XSLT is one option, depending on your situation. And, then, answered the question, which was "How do I improve the maintainability of twelve scripts that are nearly identical, save for the XML specification used?"

      ------
      We are the carpenters and bricklayers of the Information Age.

      Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      I shouldn't have to say this, but any code, unless otherwise stated, is untested