I have a script that reads a structured file, looks for various elements, and does a few search/replace operations, writes out a translated version of the file, and also records various bits of useful information and writes them out to separate files.

The file in question is an export from IBM DataStage, which is a graphical data processing tool. The file contains the "jobs" that we have created, most of which consist of reading and writing CSV files and hashed files.

The design for the script is rather haphazard at the moment, built up over time from a very simple single search-replace operation into something of a nine-headed hydra of a script.

I'm after some tips about how to go about rewriting this script. It needs to be able to identify various elements in the file based on context. For instance, a chunk might look like this:

BEGIN DSJOB Identifier "AP_CDBS_Vendor_Summary"
This is an easy one - any line starting with three spaces and the word "Identifier" is the Job Name.

Here's a more complex example:

BEGIN DSRECORD Identifier "ROOT" DateModified "1899-12-30" TimeModified "00.00.01" OLEType "CJobDefn" Readonly "0" Name "AP_CDBS_Vendor_Summary" Description "Collates all of the data for the customer master mi +gration." NextID "194" Container "V0" FullDescription "The first part of the routine gathers data from + the ABAP which extracts the necessary data from the SAP tables KNA1 +and KNB1 (NB the keys of the link between KNA1 and KNB1 will form the + basis of all the ABAP queries)." JobVersion "50.0.0" ControlAfterSubr "0" Parameters "CParameters" BEGIN DSSUBRECORD Name "ROOT" Prompt "/home/migration/Dev root " Default "/home/migration/Dev" ParamType "0" ParamLength "0" ParamScale "0" END DSSUBRECORD BEGIN DSSUBRECORD Name "SITE" Prompt "Business Unit, ie WHUB" Default "CDBS" ParamType "0" ParamLength "0" ParamScale "0" END DSSUBRECORD BEGIN DSSUBRECORD Name "AOW" Prompt "Area of Work ie AP" Default "AP" ParamType "0" ParamLength "0" ParamScale "0" END DSSUBRECORD BEGIN DSSUBRECORD Name "DMR" Prompt "DMR/Spec ie Vendors" Default "Vendors" ParamType "0" ParamLength "0" ParamScale "0" END DSSUBRECORD MetaBag "CMetaProperty" BEGIN DSSUBRECORD Owner "APT" Name "AdvancedRuntimeOptions" Value "#DSProjectARTOptions#" END DSSUBRECORD NULLIndicatorPosition "0" IsTemplate "0" NLSLocale ",,,," JobType "0" Category "1.FSS\\2.AP\\6.CDBS\\1.Vendors\\3.Reports" CenturyBreakYear "30" ...another 20 lines of stuff I don't need... END DSRECORD

From here, I need to pick up the Category "1.FSS\\2.AP\\6.CDBS\\1.Vendors\\3.Reports", and all of the parameter Name and Default values, "ROOT" = "/home/migration/Dev", "SITE" = "CDBS", etc.

There are some more complex cases, where I need to pick up one or two paths from a DSRECORD section, and then one or more pairs of filenames from DSSUBRECORD sections.

Any hints on how I should structure this program, so that it is clear and maintainable to someone who is new to Perl? Any modules that would make this easier?

By way of an apology, it's been a long time since I did any serious procedural or OO programming, 10 years ago I'd have been able to do this with my eyes closed.


In reply to Design hints for a file processor by PhilHibbs

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.