PerlMonks  


The programmer at wit's end for lack of space can often do best by disentangling himself from his code, rearing back, and contemplating his data. Representation is the essence of programming.

-- from The Mythical Man Month by Fred Brooks

Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

-- Rob Pike

As part of our build and test automation, I recently wrote a short Perl script for our team to automatically build and test specified projects before checkin.

Lamentably, another team had already written a truly horrible Windows .BAT script to do just this. Since I find it intolerable to maintain code in a language lacking subroutines, local variables, and data structures, I naturally started by re-writing it in Perl.

Focusing on data rather than code, it seemed natural to start by defining a table of properties describing what I wanted the script to do. Here is a cut-down version of the data structure I came up with:

# Action functions (return zero on success).
sub find_in_file {
    my $fname  = shift;
    my $str    = shift;
    my $nfound = 0;
    open( my $fh, '<', $fname ) or die "error: open '$fname': $!";
    while ( my $line = <$fh> ) {
        if ( $line =~ /$str/ ) {
            print $line;
            ++$nfound;
        }
    }
    close $fh;
    return $nfound;
}

# ...

# ------------------------------------------------------------------------
# Globals (mostly set by command line arguments)
my $bldtype = 'rel';

# ------------------------------------------------------------------------
# The action table @action_tab below defines the commands/functions
# to be run by this program and the order of running them.
my @action_tab = (
    {
        id      => 'svninfo',
        desc    => 'svn working copy information',
        cmdline => 'svn info',
        workdir => '',
        logfile => 'minbld_svninfo.log',
        tee     => 1,
        prompt  => 0,
        run     => 1,
    },
    {
        id      => 'svnup',
        desc    => 'Run full svn update',
        cmdline => 'svn update',
        workdir => '',
        logfile => 'minbld_svnupdate.log',
        tee     => 1,
        prompt  => 0,
        run     => 1,
    },
    # ...
    {
        id      => "bld",
        desc    => "Build unit tests ${bldtype}",
        cmdline => qq{bldnt ${bldtype}dll UnitTests.sln},
        workdir => '',
        logfile => "minbld_${bldtype}bldunit.log",
        tee     => 0,
        prompt  => 0,
        run     => 1,
    },
    {
        id      => "findbld",
        desc    => 'Call find_strs_in_file',
        fn      => \&find_in_file,
        fnargs  => [ "minbld_${bldtype}bldunit.log", '[1-9][0-9]* errors +' ],
        workdir => '',
        logfile => '',
        tee     => 1,
        prompt  => 0,
        run     => 1,
    }
    # ...
);

Generally, I enjoy using property tables like this in Perl. I find them easy to understand, maintain and extend. Plus, à la Pike above, focusing on the data first usually makes the coding a snap.

Basically, the program runs a specified series of "actions" (either commands or functions) in the order given by the action table. By default, every action in the table is run; command-line arguments can restrict the run to parts of the table. For convenience, I added a -D (dry run) option that simply prints the action table with its indices, and a -i option that runs a specific range of action table indices. Further command-line options were added over time as we needed them.
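To make the control flow concrete, here is a minimal sketch of such a driver loop. It is not the actual script: the names parse_range and run_table, the "2-4" range syntax, and the choice to count failures rather than stop at the first one are all assumptions made for illustration. It also ignores the workdir, logfile, tee and prompt fields.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a range such as "2-4" or "3" into (first, last),
# clamping the upper bound to the last valid index of the table.
sub parse_range {
    my ( $range, $last_index ) = @_;
    return ( 0, $last_index ) unless defined $range && length $range;
    my ( $lo, $hi ) = $range =~ /^(\d+)(?:-(\d+))?$/
        or die "error: bad index range '$range'\n";
    $hi = $lo         unless defined $hi;
    $hi = $last_index if $hi > $last_index;
    return ( $lo, $hi );
}

# Run (or, for a dry run, just list) the selected actions in table order.
# Returns the number of actions that failed (assumed policy: keep going).
sub run_table {
    my ( $tab, %opt ) = @_;
    my ( $lo, $hi ) = parse_range( $opt{range}, $#{$tab} );
    my $nfail = 0;
    for my $i ( $lo .. $hi ) {
        my $act = $tab->[$i];
        next unless $act->{run};
        if ( $opt{dry_run} ) {
            print "$i: $act->{id}: $act->{desc}\n";
            next;
        }
        # An entry is either an action function or an external command;
        # both report success as zero.
        my $rc = $act->{fn}
            ? $act->{fn}->( @{ $act->{fnargs} || [] } )
            : system( $act->{cmdline} );
        ++$nfail if $rc != 0;
    }
    return $nfail;
}
```

With this shape, a -D option would map to run_table( \@action_tab, dry_run => 1 ) and a -i option to run_table( \@action_tab, range => '2-4' ); the loop itself never needs to know what any individual action does.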

Initially, I started with just commands (returning zero on success, non-zero on failure). Later "action functions" were added (again returning zero on success and non-zero on failure).

As the table grew over time, it became tedious and error-prone to copy and paste table entries. For example, if there are four different directories to be built, rather than having four entries in the action table that are identical except for the directory name, I wrote a function that took a list of directories and returned an action table. None of this was planned; the script just evolved naturally over time.
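A generator of that kind might look roughly like the following. This is a sketch under assumptions, not the function from the script: the name actions_for_dirs, the template argument, and the way ids and log file names are derived from the directory are all invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical generator: clone one template entry per directory, changing
# only the fields that depend on the directory.
sub actions_for_dirs {
    my ( $template, @dirs ) = @_;
    return map {
        my $dir = $_;
        ( my $tag = $dir ) =~ s{[^A-Za-z0-9]+}{_}g;    # usable in file names
        +{
            %$template,                                # shared fields
            id      => "$template->{id}_$tag",
            desc    => "$template->{desc} in $dir",
            workdir => $dir,
            logfile => "minbld_$template->{id}_$tag.log",
        };
    } @dirs;
}

# Usage: two entries that differ only in their directory.
my @build_actions = actions_for_dirs(
    {
        id      => 'bld',
        desc    => 'Build',
        cmdline => 'bldnt reldll UnitTests.sln',
        tee     => 0,
        prompt  => 0,
        run     => 1,
    },
    'src/core', 'src/ui',
);
```

The returned list can be spliced straight into @action_tab, so the shared fields are written once and a fix to the template reaches every generated entry.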

Now it is time to take stock, hence this meditation.

Coincidentally, around the same time as I wrote my little script, we inherited an elaborate testing framework that specified tests via XML files. To give you a feel for these, here is a short excerpt:

<Test>
  <Node>Muss</Node>
  <Query>Execute some-command</Query>
  <Valid>True</Valid>
  <MinimumRows>1</MinimumRows>
  <TestColumn>
    <ColumnName>CommandResponse</ColumnName>
    <MatchesRegex row="0">THRESHOLD STARTED.*Taffy</MatchesRegex>
  </TestColumn>
  <TestColumn>
    <ColumnName>CommandExitCode</ColumnName>
    <Compare function="Equal" row="0">0</Compare>
  </TestColumn>
</Test>

Now, while I personally detest using XML for these sorts of files, I felt the intent was good, namely to clearly separate the code from the data, thus allowing non-programmers to add new tests.

Seeing all that XML at first made me feel disgusted ... then uneasy because my action table was embedded in the script rather than more cleanly represented as data in a separate file.

To allow my script to be used by other teams, and by non-programmers, I need to make it easier to specify different action tables without touching the code. So I seek your advice on how to proceed:

  • Encode the action table as an XML file.
  • Encode the action table as a YAML file.
  • Encode the action table as a JSON (JavaScript Object Notation) file.
  • Encode the action table as a "Perl Object Notation" file (and read/parse via string eval).
  • Turn the script and action table/s into Perl module/s.
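To give a feel for the JSON option, here is a minimal sketch using the core JSON::PP module. One wrinkle is that a data file cannot hold a code reference such as \&find_in_file, so this sketch stores the function name as a string and resolves it through a dispatch table. The name load_action_table, the %fn_for table and the stub function are assumptions for illustration, and the ${bldtype} interpolation from the Perl table is simply spelled out here.

```perl
use strict;
use warnings;
use JSON::PP;    # in the Perl core since 5.14

# Decode a JSON action table and replace each "fn" name with the code
# reference registered under that name in the dispatch table.
sub load_action_table {
    my ( $json_text, $fn_for ) = @_;
    my $tab = JSON::PP->new->decode($json_text);
    ref $tab eq 'ARRAY' or die "error: action table must be a JSON array\n";
    for my $act (@$tab) {
        next unless defined $act->{fn};
        my $name = $act->{fn};
        $act->{fn} = $fn_for->{$name}
            or die "error: unknown action function '$name'\n";
    }
    return $tab;
}

# Two entries from the table above, rewritten as JSON.
my $json_text = <<'END_JSON';
[
  { "id": "svninfo", "desc": "svn working copy information",
    "cmdline": "svn info", "workdir": "",
    "logfile": "minbld_svninfo.log", "tee": 1, "prompt": 0, "run": 1 },
  { "id": "findbld", "desc": "Call find_strs_in_file",
    "fn": "find_in_file",
    "fnargs": [ "minbld_relbldunit.log", "[1-9][0-9]* errors +" ],
    "workdir": "", "logfile": "", "tee": 1, "prompt": 0, "run": 1 }
]
END_JSON

my %fn_for = ( find_in_file => sub { return 0 } );    # stub for this sketch
my $tab = load_action_table( $json_text, \%fn_for );
```

Restricting "fn" to names in the dispatch table also means a data file can only call functions the script chooses to expose, which is a point in its favour over string eval of a "Perl Object Notation" file.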

Another concern is that when you have thousands of actions, or thousands of tests, a lot of repetition creeps into the data files. Now dealing with repetition (DRY) in a programming language is trivial -- just use a function or a variable, say -- but what is the best way of dealing with unwanted repetition in XML, JSON and YAML data files? Suggestions welcome.
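One modest way to cut repetition regardless of file format is to let entries omit fields and fill them in from a shared set of defaults after loading. The sketch below shows the idea in Perl; apply_defaults and the particular default values are assumptions for illustration. YAML also offers anchors and aliases for reusing a node within a file, whereas JSON has no built-in equivalent, so there the merging has to happen in the loader.

```perl
use strict;
use warnings;

# Fill in any field an entry leaves out from a shared set of defaults.
# Values given explicitly in an entry always win over the default.
sub apply_defaults {
    my ( $defaults, @entries ) = @_;
    return map { +{ %$defaults, %$_ } } @entries;
}

my %defaults = (
    workdir => '',
    logfile => '',
    tee     => 1,
    prompt  => 0,
    run     => 1,
);

# Each entry now states only what makes it different.
my @action_tab = apply_defaults(
    \%defaults,
    {
        id      => 'svninfo',
        desc    => 'svn working copy information',
        cmdline => 'svn info',
        logfile => 'minbld_svninfo.log',
    },
    {
        id      => 'svnup',
        desc    => 'Run full svn update',
        cmdline => 'svn update',
        logfile => 'minbld_svnupdate.log',
    },
);
```

The same function works on entries decoded from a JSON or YAML file, with the defaults stored either in the script or in a dedicated section of the data file.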

Update: I ended up taking BrowserUk's advice and leaving the script alone.

See also: A good way to input data into a script w/o an SQL database (2023) and Data Structure References


In reply to Data-driven Programming: fun with Perl, JSON, YAML, XML... by eyepopslikeamosquito
