
What to test in a new module

by Bod (Vicar)
on Jan 28, 2023 at 22:16 UTC

Bod has asked for the wisdom of the Perl Monks concerning the following question:

I've created a helper function for my own purposes and thought it would be useful to others. So CPAN seems a sensible place to put it so others can use it if they want to...

Its function is simple: to go to the homepage of a website and return an array of URIs within that site that use the http or https scheme, being careful not to stray outside it. It ignores things that aren't plain text or that it cannot parse, such as PDFs or CSS files, but includes JavaScript files, as links (things like document.location.href) might be lurking there. It deliberately doesn't try to follow the action attribute of a form, as that is probably meaningless without the form data.

As the Monastery has taught me that all published modules should have tests, I want to do it properly and provide those tests...

But, given that there is only one function and it makes HTTP requests, what should I test?

The obvious (to me) test is that it returns the right number of URIs from a website. But that number will likely change over time, so I cannot hardcode the 'right' answer into the tests. So beyond the necessary dependencies and their versions, I'd like some ideas of what should be in the tests, please.

In case you're interested, this came about from wanting to automate producing and installing sitemap files.

Replies are listed 'Best First'.
Re: What to test in a new module
by SankoR (Prior) on Jan 28, 2023 at 22:44 UTC
    Include sample pages with the dist that your code should handle correctly. Include URIs that aren't supposed to be gathered by your code, tricky URIs, etc. When you fix bugs later, add tests that make sure you have no regressions.

    Refactor your code so you can call and test the 'logic' without grabbing a remote page.
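
One way to read this advice is to keep link extraction in a pure function that takes HTML text, so that a test can feed it a sample page shipped with the distribution. In the sketch below, `extract_site_links`, the sample page and `example.com` are all made up for illustration, and the regex is only a stand-in for a real HTML parser.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: return the http/https links in $html whose host
# is exactly $host. A regex stands in for a real HTML parser here.
sub extract_site_links {
    my ($html, $host) = @_;
    my @links;
    while ($html =~ /\b(?:href|src)\s*=\s*"([^"]*)"/gi) {
        my $url = $1;
        next unless $url =~ m{\Ahttps?://([^/:?#]+)}i;
        push @links, $url if lc($1) eq lc($host);
    }
    return @links;
}

# A sample page of the kind that could live in the dist (e.g. under t/).
my $page = <<'HTML';
<a href="https://example.com/about">About</a>
<a href="http://example.com/contact">Contact</a>
<a href="https://other.example.org/">Elsewhere</a>
<a href="mailto:someone@example.com">Mail</a>
<a href="ftp://example.com/file">FTP</a>
<script src="https://example.com/app.js"></script>
HTML

is_deeply(
    [ extract_site_links($page, 'example.com') ],
    [
        'https://example.com/about',
        'http://example.com/contact',
        'https://example.com/app.js',
    ],
    'only same-site http/https links are returned',
);

done_testing();
```

Because the expected list comes from a page that never changes, the "right answer" can be hardcoded safely, and each bug fixed later can add another sample page or another line to an existing one.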
Re: What to test in a new module
by kcott (Archbishop) on Jan 29, 2023 at 00:50 UTC

    G'day Bod,

    Obviously, without seeing the module, I can only give generalised information. The following describes how I do testing. It's fairly standard but different authors have their own way of doing things. I also have my own naming conventions: not dissimilar from what many others use (but certainly not universal). I'd suggest looking around CPAN and seeing what others have done.

    Firstly, my (skeleton) module directory layout tends to follow this pattern:

        Some-Module/
            Changes
            Makefile.PL
            MANIFEST
            MANIFEST.SKIP
            lib/
                Some/
            README
            t/
                *.t test files here

    Test files are typically split into two groups: those generally run for any installation; and, Author Only tests which are normally skipped for a general make test.

    General Tests

    The first is always "00-load.t". It's short, simple, and just tests that "use Some::Module;" works. It uses Test::More::use_ok() and typically looks something like this:

        #!perl -T

        use strict;
        use warnings;

        use Test::More tests => 1;

        BEGIN { use_ok('Some::Module') }

        diag "Testing Some::Module $Some::Module::VERSION";

    If an object-oriented module, the next test is "01-instantiate.t". The complexity of this script will depend on whether there are any instantiation arguments and, if so, whether they are required or optional. Here's a simple example, paraphrased from a real test script:

        #!perl -T

        use strict;
        use warnings;

        use Test::More tests => 3;

        use Some::Module;

        my $sm;
        my $eval_ok = 0;

        eval {
            $sm = Some::Module::->new();
            1;
        } && do {
            $eval_ok = 1;
        };

        is($eval_ok, 1, 'Test eval OK');
        is(defined $sm, 1, 'Test Some::Module object defined');
        isa_ok($sm, 'Some::Module');

    Individual methods and functions are tested next. Wherever possible, I put tests for each method or function in their own separate scripts. These follow the same naming conventions; for example, "02-some_method.t", "03-some_function.t", and so on. Here you need to test all possible argument combinations and return values. Consider as many problematic use cases as possible and test that they are all handled correctly; add more tests as other problems are encountered (either from your own work or bug reports from others).

    I tend to put all tests in their own anonymous block:

        #!perl -T

        use strict;
        use warnings;

        use Test::More tests => N;

        use Some::Module;

        {
            # Isolate tests with one set of arguments
            my $sm = Some::Module::->new(...);
            my @args = (...);
            is($sm->meth(@args), ...
        }

        {
            # Isolate tests with a different set of arguments
            my $sm = Some::Module::->new(...);
            my @args = (...);
            is($sm->meth(@args), ...
        }

    There's a plethora of modules in the "Mock:: namespace". Although I haven't used it myself, Test::Mock::LWP looks like it might be useful for you. I didn't spend any time searching for you; this one just happened to stand out; do have a look around for others.

    Author Only Tests

    These are really only for you. They generally represent sanity checks of the module code, and ancillary files, in your distribution. They are typically triggered by an environment variable having a TRUE value; and are skipped otherwise.
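
A minimal sketch of that gating, assuming the environment variable is called AUTHOR_TESTING (a common convention; Ken does not name the one he uses). It uses a Test::More SKIP block so the file still reports cleanly when the variable is unset; `plan skip_all => ...` is the whole-file alternative. The check inside the block is only a placeholder.

```perl
use strict;
use warnings;
use Test::More tests => 1;

SKIP: {
    # AUTHOR_TESTING is an assumed variable name; any TRUE value enables the check.
    skip 'Author only test: set AUTHOR_TESTING to run', 1
        unless $ENV{AUTHOR_TESTING};

    # Placeholder sanity check; a real one might verify POD or the MANIFEST.
    cmp_ok($], '>=', 5.008, 'running on a reasonably modern perl');
}
```

The same pattern suits tests that need network access: they stay in the distribution but only run when explicitly requested.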

    In the spoiler below, I show three scripts that I've pulled verbatim from a random, personal distribution; these are standard for me (with, potentially, some variation in version numbers). I actually use these with all of my $work modules as well (although, they do have a few more as standard).

    — Ken

Re: What to test in a new module
by eyepopslikeamosquito (Bishop) on Jan 29, 2023 at 01:48 UTC

    I've created a helper function for my own purposes and thought it would be useful to others ... As the Monastery has taught me that all published modules should have tests, I want to do it properly and provide those tests ... what should I test?

    Bod, you are asking this question too late! The Monastery has also taught you to write the tests first because the act of writing your tests changes and improves your module's design:

    • Writing a test first forces you to focus on interface - from the point of view of the user.
    • Hard to test code is often hard to use.
    • Simpler interfaces are easier to test.
    • Functions that are encapsulated and easy to test are easy to reuse.
    • Components that are easy to mock are usually more flexible/extensible.
    • Testing components in isolation ensures they can be understood in isolation and promotes low coupling/high cohesion.
    • Implementing only what is required to pass your tests helps prevent over-engineering.

    -- from "Test Driven Development" section at Effective Automated Testing

      The Monastery has also taught you to write the tests first because the act of writing your tests changes and improves your module's design

      You are quite correct - as usual...

      However, I have extraordinary cognitive problems with doing this. Trying to work out what a module is going to do and how it will do it before writing a line of code is quite a leap of conceptualism for me. I do not doubt that I could learn this cognitive skill if coding and module design were my job but they are very much a sideline. At 55 my brain's plasticity is fading a little I notice which doesn't help.

      Over in this node it was suggested that I might like to create a module for Well Known Binary (WKB) from the work I had already done to read one file. I started writing the tests for that module but it has ground to a halt because of the issue above.

      Back to this "module"...
      It didn't start out as a module. It started as a bit of throw away code to build an array. It then turned into a sub in a small script for my own very limited use. Then, and only then, did I think it might be helpful to other people as it is a relatively general building block.

      I don't think tests are necessary for bits of throw away code. Nor for simple scripts that are only intended to be used by me.
      Do you think otherwise?

        Tests for "throw away" code, no. Tests for scripts only for me, a qualified no - if it is important the script is "correct" or subject to revision over time (hmm, isn't that anything that's not "throw away"?) then tests can be very useful to avoid regressions. Tests for public-facing code, solid yes.

        For code that evolves from throw away, to personal use, to "let's make this a module" it seems sensible that tests should evolve from none, to maybe some, to something that looks like TDD. Aside from anything else, casting the code in a TDD framework forces you to think about the scope of the code and how other people might use it. Thinking about usage and scope shapes the API. TDD then helps codify the API and test its utility and suitability.

        Agile programming advocates often suggest that the code is the documentation, but with TDD the tests are the documentation. In a sense TDD is about writing the documentation, or at least the problem description, before you write the code, and that seems like an altogether good thing to do. Thinking about what code should do before you write it can't be a bad thing, surely?

        Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

        I don't think tests are necessary for bits of throw away code. Nor for simple scripts that are only intended to be used by me. Do you think otherwise?

        No. Bod, I think you're doing a great job. I trust you appreciate from my numerous Coventry working-class asides, I just enjoy teasing you. :)

        Of course, working as a professional programmer for large companies is a completely different ball game. If you ship buggy code that upsets a big important customer, you might even be subjected to a probing Five Whys post mortem. For still more pressure, as indicated at On Interfaces and APIs, try shipping a brand new public API to thousands of customers, with only one chance to get it right.

        I might add that when I'm doing recreational programming (as I've been doing quite a bit lately) I tend to just hack out the code without using TDD. In the tortuously long Long List is Long series, for example, I haven't written a single test, just test the output of each new version manually via the Unix diff command. Update: finally wrote my first LLiL unit test on Mar 01 2023.

        Of course, I could never get away with that at work, where you are not permitted to check in new code without passing peer code review (where you will be grilled on how you tested your code) and where you will typically check in accompanying unit and system test changes in step with each code change.

        For my personal opinion on how to do software development in large companies see: Why Create Coding Standards and Perform Code Reviews?

        Hello Bod,

        > However, I have extraordinary cognitive problems with doing this..

        you can try to follow my step-by-step-tutorial-on-perl-module-creation-with-tests-and-git to see if you get some inspiration.


        There are no rules, there are no thumbs..
        Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      > write the tests first

      How are you supposed to write the tests before you write the code? What is being tested if there is no code? I searched for examples of this technique but could only find buzz word salad.

        The term TDD is unfortunate because API design is fundamentally an iterative process, testability being just one (crucial) aspect. For public APIs you simply don't have the luxury of changing the interface after release, so you need to get it right, you need to prove the module's testability by writing and running real tests before release.

        More detail on this difficult topic can be found in the "API Design Checklist" section at On Interfaces and APIs. One bullet point from that list clarifies the iterative nature of TDD:

        • "Play test" your API from different perspectives: newbie user, expert user, maintenance programmer, support analyst, tester. In the early stages, imagine the perfect interface without worrying about implementation constraints. Design iteratively.

        My Google search for "test driven design" got me Test-driven_development as a first hit. That is a short article that hits the high points and directly answers your objection - the tests fail until the code they test is written and is correct (at least in the eyes of the tests).
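
For readers wondering what "a test before the code" can look like in Perl, here is a small sketch. `normalise_url` is a hypothetical function that exists only as a stub; the second test states the behaviour wanted and is wrapped in a Test::More TODO block, so it is reported as an expected failure until the real code is written.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Stub only: the real behaviour has not been written yet.
sub normalise_url {
    my ($url) = @_;
    return $url;
}

# This already works with the stub, so it is an ordinary test.
is(normalise_url('http://example.com/'), 'http://example.com/',
    'an already normal URL is unchanged');

TODO: {
    local $TODO = 'lower-casing of scheme and host not implemented yet';

    # Fails for now, but a failing TODO test does not fail the suite.
    is(normalise_url('HTTP://Example.COM/'), 'http://example.com/',
        'scheme and host are lower-cased');
}
```

Once `normalise_url` does the job, the TODO test starts passing unexpectedly, which is the cue to remove the TODO wrapper and let it run as a normal test.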

        TDD is a technique I use occasionally, but in each case I've used it the result has been spectacular success. When I have used TDD I've also used code coverage to ensure a sensibly high proportion of the code is tested. In my experience the result was seemingly slow progress, but substantially bug-free and easy to maintain (i.e. high quality) code.

        Not all projects can use TDD. My day job is writing hardware specific embedded code for in house developed systems. Testing software embedded in hardware is challenging!

        Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond

        This seemed like a perfectly reasonable question. I gave it an upvote which resulted in: "Reputation: 0". So, someone had downvoted your post. Why? Because you had the temerity to question dogma? I wasn't impressed with this but there's little I can do about it.

        As with many things, there's a spectrum with many shades of grey between black and white at the extremities. It is rare for either "black" or "white" to be optimal; a compromise somewhere in the "grey" is usually the best option. This applies equally to software development: writing all of the code first, then bolting on tests afterwards, is a bad move; similarly, writing all tests first, which will obviously fail until the code is written afterwards, is also a bad move; what is needed is a compromise.

        What follows is how I achieve this compromise. I'm not suggesting this is in any way perfect; it is, however, something to consider in terms of the principles involved. Probably the main point is that the "black" and "white" extremes are avoided.

        I start most modules with module-starter and use Module::Starter::PBP as a plugin. I like the templating facilities provided by Module::Starter::PBP but not the templates themselves (so I've edited those quite substantially). I have many versions of the configuration which vary depending on: personal code, $work code, Perl version, type of module, and so on — the following refers to personal code for v5.36.

        This gives me a directory structure along the lines described above. The module code looks like this:

            package Some::Module;

            use v5.36;

            our $VERSION = '0.001';

            1;

            __END__

            =encoding utf8

            ... POD templates and boilerplate ...

        The t/ directory will contain equivalents of the three 99-*.t Author Only scripts shown above, and a template for 00-load.t which looks like:

            #!perl

            use v5.36;

            use Test::More tests => 1;

            BEGIN { use_ok('__MODULE__') }

            diag "Testing __MODULE__ $__MODULE__::VERSION";

        Applying a global s/__MODULE__/Some::Module/ to that file gives me a working distribution. I can now run the well-known incantation:

            perl Makefile.PL
            make
            make test

        I have created application and test code in unison: the compromise.

        From here, the specifics will vary with every module; however, the main principle is to add small amounts of functionality and concomitant tests incrementally. Continue doing this until all functionality is coded and has tests.

        In closing, I'll just note that the OP's title had "What to test"; I've added "[When to test]" to my title indicating this subthread is straying from the original. We actually don't know if Bod had already written all of his tests except the one he asked about, or if he was adding tests as an afterthought. Assuming the latter, and rebuking him for it, was a mistake in my opinion.

        — Ken

        I'm going to get eaten alive for this but TDD is something I think people adhere to in a dogmatic fashion without a lot of thought put into API ergonomics and organic development.

        I am 100% in agreement that your code needs to be tested to the point before you reach diminishing returns. I do not feel like cementing yourself in place by writing your tests first is the way to accomplish this.

        You write your tests first and now a) you are now going to try to fit your implementation into that mold and b) you now have 2 things to refactor until you reach stable parity with your design and implementation.

        Unless you're designing and writing code against a predefined spec/RFC, I really don't feel like strict adherence to TDD is beneficial. Code needs to develop organically and be allowed to form its own flow instead of being hammered into a predefined hole of a certain shape.

        Three thousand years of beautiful tradition, from Moses to Sandy Koufax, you're god damn right I'm living in the fucking past

Re: What to test in a new module
by stevieb (Canon) on Mar 12, 2023 at 21:04 UTC
    "what should I test?"

    The parsing mechanics. There's no need to test the underlying net access stuff; it tests itself. Also, please don't enable internet-bound tests by default in the unit test suite. These should be developer-only tests, with an env var for a user to enable them if they wish.

    Set up a data directory within your test suite with a bunch of various HTML files with various URLs, and test the parsing functions.

    If you need to test error codes and return values, you can mock out a request/response.
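
One way to mock out the request/response without any extra modules is to pass the fetching code in as a parameter. This is a sketch, not Bod's module: `page_links` and the fake fetchers are hypothetical, and the response hashes simply imitate the `success`, `status` and `content` keys found in an HTTP::Tiny response.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical function under test. $fetch is a code ref that takes a URL
# and returns a hashref with success, status and content keys.
sub page_links {
    my ($url, $fetch) = @_;
    my $res = $fetch->($url);
    return () unless $res->{success};
    my @links = $res->{content} =~ /href\s*=\s*"([^"]+)"/gi;
    return @links;
}

# Fake fetchers: no network access is involved.
my $fetch_ok = sub {
    return {
        success => 1,
        status  => 200,
        content => '<a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a>',
    };
};
my $fetch_404 = sub {
    return { success => 0, status => 404, content => 'Not Found' };
};

is_deeply(
    [ page_links('http://example.com/', $fetch_ok) ],
    [ 'http://example.com/a', 'http://example.com/b' ],
    'links are extracted from a successful response',
);
is_deeply(
    [ page_links('http://example.com/missing', $fetch_404) ],
    [],
    'a 404 response yields no links',
);

done_testing();
```

In real use the caller would supply a fetcher that does the network access, for instance `sub { HTTP::Tiny->new->get($_[0]) }`, while the tests never leave the machine.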

Re: What to test in a new module
by bliako (Monsignor) on Mar 12, 2023 at 16:42 UTC

    In order to avoid cementing an API before implementation but also stick with the "everything-starts-with-a-test" approach (whose benefits I appreciate), I like to break down my code into smaller functions, each with the simplest possible API.

    For example, fetch_url($urlstr), html2dom($htmlstr), extract_urls_from_dom($dom), is_url_pointing_to_pdf($urlstr). I leave the user-calling function until last. By the time I come to implement it, I am already testing these simple functions and the final user-calling function's API is crystallising in my head.
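
As an illustration of testing one such brick in isolation, here is a guess at what `is_url_pointing_to_pdf` might do, judging by file extension only. No implementation is shown in the post, so the function body and the test cases are assumptions.

```perl
use strict;
use warnings;
use Test::More;

# Assumed behaviour: true when the URL's path ends in ".pdf",
# ignoring case and any query string or fragment.
sub is_url_pointing_to_pdf {
    my ($urlstr) = @_;
    return 0 unless defined $urlstr;
    (my $path = $urlstr) =~ s/[?#].*//s;
    return $path =~ /\.pdf\z/i ? 1 : 0;
}

# Table of [ input, expected result, description ].
my @cases = (
    [ 'https://example.com/doc.pdf',      1, 'plain PDF link' ],
    [ 'https://example.com/DOC.PDF',      1, 'upper-case extension' ],
    [ 'https://example.com/doc.pdf?dl=1', 1, 'query string ignored' ],
    [ 'https://example.com/pdf-guide',    0, 'pdf in the name only' ],
    [ 'https://example.com/a.html#x.pdf', 0, 'fragment ignored' ],
    [ undef,                              0, 'undefined input' ],
);

for my $case (@cases) {
    my ($input, $expected, $name) = @$case;
    is(is_url_pointing_to_pdf($input), $expected, $name);
}

done_testing();
```

A table of cases like this is easy to extend: each tricky URL that turns up later becomes one more row.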

    So, I start with a test! But for the bricks so-to-speak of the app. And in doing so, I slowly slowly settle on where to place the loo.

    p.s. SankoR's "Refactor your code so you can call and test the 'logic' without grabbing a remote page." is good: use locally-fetched HTML to test your code rather than hitting the sites, with the risk of your tests failing each time they change. On this, I put network-access tests in author tests, or live tests, which are only executed by me and not by the potential user.

    bw, bliako

Node Type: perlquestion [id://11149996]
Approved by johngg
Front-paged by kcott