Part of a website includes an online edition of a weekly newspaper. I have a website that includes just the articles, indexed by year/issue and with a simple keyword search. The editors of the newspaper use it for their research. In the future it may be adopted by the main site.

I have an app that keeps my site in sync and has worked well for a couple of years. There have been some changes on the main site so I’ve taken the opportunity to do a review.

It is _one big script_ containing the mother of all data structures, so breaking it down was the order of the day.

I thought I’d give the objects containing objects idea a run for its money. An outline sketch showing the public methods:

The script looks like:

    my $paper = Paper->new($cnf_file);
    for my $issue ($paper->issues) {
        while (my $article = $issue->article) {    # an iterator
            $paper->process($article);
        }
    }
    $paper->post_process;

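The `while (my $article = $issue->article)` loop relies on `article` returning a true value for each article and something false once the issue is exhausted. A minimal sketch of how such an iterator might look follows. The constructor arguments and the `urls`/`pos` attributes are assumptions made up for illustration, not the real internals of the `Issue` module:

```perl
use strict;
use warnings;

package Issue;

# Hypothetical constructor: takes article URLs that were already
# scraped from the issue's index page.
sub new {
    my ($class, @urls) = @_;
    return bless { urls => [@urls], pos => 0 }, $class;
}

# Iterator: returns the next article (a hash ref) on each call
# and undef once every article has been handed out.
sub article {
    my $self = shift;
    return if $self->{pos} >= @{ $self->{urls} };
    return { url => $self->{urls}[ $self->{pos}++ ] };
}

package main;

# Placeholder URLs, purely for illustration.
my $issue = Issue->new('/2007/01/a.html', '/2007/01/b.html');
while (my $article = $issue->article) {
    print "$article->{url}\n";
}
```

Because the caller only ever sees `article`, the list of URLs could later be replaced by something lazier (fetching one page at a time, say) without touching the script.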
Each specific task has its own module with its own specific methods and attributes and its own API. The detailed work (e.g. scraping an issue index page to get the url for each article) is easily tested in isolation. With judicious use of special configuration files for debugging I’ve avoided messing with the ‘production’ db or whacking the web server during development (I must admit I wish I had thought of that earlier – one unfortunate infinite loop later…).

It took a while to get the ‘model’ straight (and it may still be far from perfect) but I feel the development time thereafter was much reduced. You can tinker with the innards of an object to your heart’s content and, as long as you abide by the ‘published API’, no harm is done. I’m sure that with a script that relies heavily on LWP any overhead introduced by using Perl objects will hardly be noticed.

I’m pleased that I’ve finally got the Word module out on its own. It takes a string and returns words. I can use that again.
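For anyone wondering how small such a module can be, here is a rough sketch of the "string in, words out" idea. The method name `words`, the lower-casing and the `\w+` definition of a word are all assumptions for the example; the real Word module may well split differently:

```perl
use strict;
use warnings;

package Word;

# Hypothetical constructor: no state needed for this sketch.
sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Takes a string and returns its words, lower-cased.
# A "word" here is simply a run of \w characters (an assumption).
sub words {
    my ($self, $string) = @_;
    return map { lc $_ } $string =~ /(\w+)/g;
}

package main;

my $w = Word->new;
print join(' ', $w->words('The quick, brown fox.')), "\n";
# prints "the quick brown fox"
```

Keeping the splitting rule behind one method means the keyword search and anything else that needs words all agree on what a word is.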

I like the black box concept: I found I could concentrate on, and test, one thing at a time. I started this about a month ago and I’ve been harbouring a meditation along these lines for some time. Long before the most recent ruin and destruction wrought by IconoclastUK. :-)

Sure, more typing, more things to go wrong but I look on it as “Perl between handrails”. :-)


In reply to Make everything an object? by wfsp
