I have some Perl programs that are called like this (a simplification, but it captures the gist):

    cat data.txt | prog1.pl | prog2.pl option | prog3.pl > out.txt

I'd like to set up simple test scripts, one per program, to check whether certain input data yields the expected output:

    # input file
    good1
    good2
    bad1
    good3

    # expected output
    good1
    good2
    good3

To re-emphasize: there would be a separate test for each program (prog1.pl, prog2.pl, and prog3.pl), rather than one test for the entire pipeline.
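
A minimal sketch of one such per-program test, assuming each program reads STDIN and writes STDOUT. It uses the core modules IPC::Open2 and Test::More. The filter command is a one-liner stand-in (it drops lines starting with `bad`) so the sketch runs on its own, and the file name `t/prog1.t` is only illustrative.

```perl
#!/usr/bin/perl
# t/prog1.t (illustrative name): one self-contained test for one filter.
use strict;
use warnings;
use IPC::Open2 qw(open2);
use Test::More;

# STAND-IN: drops lines starting with "bad". For the real program use
# something like: my @prog = ($^X, 'prog1.pl');
my @prog = ( $^X, '-ne', 'print unless /^bad/' );

# Feed $input to the command's STDIN and return all of its STDOUT.
sub run_filter {
    my ( $input, @cmd ) = @_;
    my $pid = open2( my $from_child, my $to_child, @cmd );
    print {$to_child} $input;
    close $to_child;
    my $output = do { local $/; <$from_child> } // '';
    waitpid $pid, 0;
    die "@cmd exited with status $?\n" if $?;
    return $output;
}

my $input    = "good1\ngood2\nbad1\ngood3\n";
my $expected = "good1\ngood2\ngood3\n";

is( run_filter( $input, @prog ), $expected, 'prog1 keeps only the good lines' );

done_testing;
```

Run it with `prove t/prog1.t`. Writing all the input before reading any output is fine for small test data, but it could block if a program emits a lot of output before it has consumed its input.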

There are many ways to do this, and I'm wondering what nice idioms people have come up with. A good test harness would keep the input and expected output data (which can usually be fake) in a plain-text format that is easy to read and easy to edit. That rules out something like this, for example:

    my %testdata = (
        'good1' => 'good1',
        'good2' => 'good2',
        'bad1'  => '',
        'good3' => 'good3',
    );

It seems possible that a single test script could test all the programs, with different data for each one. On the other hand, I'm not sure that's the best solution, because it seems nice for each test to be self-contained.
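
If one script does test every program, a table of cases keeps the data apart from the logic. This sketch defines a small hypothetical `run_filter` helper (built on the core IPC::Open2 module, suitable for small data only) and uses one-liner stand-ins for prog1.pl and `prog2.pl option`; the second stand-in's behaviour is made up purely for illustration.

```perl
#!/usr/bin/perl
# Sketch: one generic test script driven by a table of per-program data.
use strict;
use warnings;
use IPC::Open2 qw(open2);
use Test::More;

# Hypothetical helper: feed $input to @cmd's STDIN, return its STDOUT.
sub run_filter {
    my ( $input, @cmd ) = @_;
    my $pid = open2( my $from_child, my $to_child, @cmd );
    print {$to_child} $input;
    close $to_child;
    my $output = do { local $/; <$from_child> } // '';
    waitpid $pid, 0;
    die "@cmd exited with status $?\n" if $?;
    return $output;
}

# STAND-INS: in practice these would be [$^X, 'prog1.pl'],
# [$^X, 'prog2.pl', 'option'], and so on.
my @cases = (
    {   name     => 'prog1 stand-in',
        cmd      => [ $^X, '-ne', 'print unless /^bad/' ],
        input    => "good1\ngood2\nbad1\ngood3\n",
        expected => "good1\ngood2\ngood3\n",
    },
    {   name     => 'prog2 stand-in',
        cmd      => [ $^X, '-pe', '$_ = uc' ],
        input    => "good1\n",
        expected => "GOOD1\n",
    },
);

# One test per table row.
is( run_filter( $_->{input}, @{ $_->{cmd} } ), $_->{expected}, $_->{name} )
    for @cases;

done_testing;
```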

Another option is a __DATA__ section. That's OK, but it sits at the end of the file, which makes the data a bit harder to read and edit. It would also require embedding and parsing another label to separate the input from the expected output... not that that's hard, but the simpler, the better.
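
A sketch of that label parsing, assuming a made-up convention where a line such as `--- input` starts each section. In a real `.t` file the labelled text would follow `__DATA__` and the parser would be called as `read_sections(\*DATA)`; here an in-memory filehandle stands in so the sketch runs anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split labelled test data into named chunks at lines like "--- input".
# The "--- name" label syntax is a hypothetical convention.
sub read_sections {
    my ($fh) = @_;
    my ( %section, $name );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^---\s+(\w+)\s*$/ ) {
            $name = $1;              # start a new section
            $section{$name} = '';
        }
        elsif ( defined $name ) {
            $section{$name} .= $line;    # body line of current section
        }
    }
    return \%section;
}

# Stand-in for the text that would follow __DATA__.
my $text = <<'END';
--- args
option
--- input
good1
good2
bad1
good3
--- expected
good1
good2
good3
END

open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my $data = read_sections($fh);
my @args = split ' ', $data->{args};
```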

Still another option is here-docs. I'm leaning toward that, but wonder if there are better suggestions:

    (my $args = <<'ARGS') =~ s/^\s+//gm;
        arg1
    ARGS
    (my $input = <<'INPUT') =~ s/^\s+//gm;
        good1
        good2
        bad1
        good3
    INPUT
    (my $expected_output = <<'OUTPUT') =~ s/^\s+//gm;
        good1
        good2
        good3
    OUTPUT
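
On Perl 5.26 or later, indented here-docs (`<<~`) do the whitespace stripping themselves: each line loses as much leading whitespace as precedes the closing delimiter, so a separate `s/^\s+//gm` pass isn't needed. A sketch, assuming a 5.26+ perl:

```perl
use v5.26;    # indented here-docs (<<~) need Perl 5.26 or later
use warnings;

# <<~ removes, from every line, as much leading whitespace as precedes
# the closing delimiter, so no s/^\s+//gm pass is needed afterwards.
my $args = <<~'ARGS';
    arg1
    ARGS

my $input = <<~'INPUT';
    good1
    good2
    bad1
    good3
    INPUT

my $expected_output = <<~'OUTPUT';
    good1
    good2
    good3
    OUTPUT
```

Each content line must be indented at least as far as the closing delimiter; perl rejects the here-doc otherwise.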

Separate from the setup question is how to compare the actual and expected results. At first I thought these were separate questions, but there may be some interaction between them. When the data is stored in a hash, each element is treated individually, with a placeholder for an empty result in some cases. With a here-doc, an empty result just means a shorter list. The latter feels like a better approach, and for my application there doesn't need to be a one-to-one correspondence between input and output items. My main concern is finding a nice way to set up the data. Thoughts or suggested idioms appreciated!
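
For the comparison, one option is to split both the actual and the expected output into lists of lines and compare them with Test::More's `is_deeply`, so a dropped input line just means a shorter list and a failure report points at the first differing line. A minimal sketch; `$actual` is hard-coded here, where a real test would capture it from the program under test.

```perl
use strict;
use warnings;
use Test::More;

# Turn a chunk of output into an array ref of lines (without newlines).
# split drops trailing empty fields, so "" gives [].
sub lines_of {
    my ($text) = @_;
    return [ split /\n/, $text ];
}

# ASSUMPTION: hard-coded; normally captured from the program under test.
my $actual   = "good1\ngood2\ngood3\n";
my $expected = "good1\ngood2\ngood3\n";

# Order matters: on failure, is_deeply reports the first differing line.
is_deeply( lines_of($actual), lines_of($expected), 'same lines, same order' );

# Only if order is irrelevant for a given program: compare sorted lines.
is_deeply( [ sort @{ lines_of($actual) } ],
           [ sort @{ lines_of($expected) } ], 'same lines, any order' );

done_testing;
```

The sorted comparison is only appropriate if line order really doesn't matter for a given program, which is an assumption about your programs.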


In reply to Setting up tests for cat-type filter programs by dmorgo
