[RFC] Discipulus's step by step tutorial on module creation with tests and git
by Discipulus (Abbot) on Dec 19, 2018 at 11:14 UTC
Good morning nuns and monks,
Nothing to read during the next holidays? ;=)
I wrote this tutorial and I'd really appreciate your comments and corrections.
The following tutorial is in part the fruit of a learn by teaching process, so before pointing newbies to my work I need some confirmations.
The tutorial is a step by step journey into perl module development with tests, documentation and git integration. It seemed to me the very minimal approach in late 2018.
Being a long post (over the 64kb perlmonks limit), the second part is in a reply to this node; because of this, the table of contents links are broken for the second part (I'll fix them sooner or later).
I'll gladly accept comments here or as pull requests (see contributing), as you wish, about:
The module code presented is.. fairly simplistic and I do not plan to change it: the tutorial is about everything but coding. Tests, documentation, distribution and revision control are the points of this guide, and I tried to keep everything as small as possible. If you really cannot resist rewriting the code of the module, rewrite it all and I can add a TIMTOWTDI section, just for amusement.
On the other hand, day eight: other module techniques has room for improvements and additions: if you want to share your own techniques about testing, makefile hacking or automating distribution, I think this is the place. I chose module-starter to sketch out the module as it seemed simple and complete to me, but it has some quirks. Examples of other tools can be worth another day of tutorial, but keep it simple.
When you have commented on this tutorial I'll remove the [RFC] from the title and I'll point newcomers to this guide (or would it be better to repost it in another section?), if you judge it worth reading.
UPDATE 20 Dec. Added a readmore tag around the content below. The online repository is receiving some pull requests ;) so I added a version number to the doc. Tux is very busy but pointed me to Release::Checklist and I'll add it to the tutorial.
Discipulus's step by step tutorial on module creation with tests and git
day zero: introduction
This tutorial is not about coding: that's it! The code, idea and implementation presented below are, by choice, futile, piffling and trifling (the module resulting from this tutorial is available, archived in its own repository).
This tutorial, on the other hand, tries to show the beginner one possible path in module creation. As always in perl there are many ways to get the job done and mine is far from being the optimal one, but as I encountered many difficulties in choosing my own path, perhaps sharing my way can help someone else.
There are other similar but different sources of knowledge about module creation, notably José's Guide for creating Perl modules: read it for the points I do not explore (well, read it anyway: it's worth it).
the bag of tools
As for every job, check your equipment before starting. You probably already have perl, a shell (or something less fortunate if you are on windows, like me ;) and a favourite text editor or IDE.
But in this tutorial we'll also use git on the command line and github to store our work in a central place (a very handy feature). So get a github account and a git client.
This tutorial will focus on the importance (I'd say preeminence or even predominance) of testing while developing a perl module. I wrote lonely scripts for years, then I realized that even if my scripts seemed robust, I had no way to test them in a simple and reliable way.
We also use the core module Carp to report errors from the user's point of view.
We use Module::Starter to have the skeleton of our module done for us but, as always, there are valid alternatives. Install it.
We'll document our module using POD (Plain Old Documentation); see perlpod for reference.
Some of your programs or modules work on lists and arrays. Functions inside these programs accept ranges, but while you know what a valid range is ( 0,1,2 or 0..2 ) you discover that your programs crashed many times because other humans or other programs passed ranges like: 0,1..3,2 (where 2 is present twice) or 3,2,1 (and your application is silently expecting 1,2,3 ) or 9..1 or even 0,1,good,13..15 which is not a range at all, or simply 1-3 which is a range for the user but not for your perl code, which reads it as -2.
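To see why these inputs are troublesome, here is a minimal sketch (mine, not part of the module) of how perl itself evaluates some of them. The variable names are only illustrative.

```perl
use strict;
use warnings;

# A list with a repeated element: 2 appears twice.
my @dup = ( 0, 1 .. 3, 2 );
print "dup:     @dup\n";                                # 0 1 2 3 2

# A descending range: the .. operator does not count down.
my @reverse = ( 9 .. 1 );
print "reverse: ", scalar(@reverse), " elements\n";     # 0 elements

# What the user meant as a range is plain subtraction for perl.
my $minus = 1 - 3;
print "minus:   $minus\n";                              # -2
```

None of these lines dies or even warns, which is exactly why a dedicated validation step is attractive.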
Tired of the situation, you plan a new module to validate ranges. Range::Validator is the name you choose. Your initial plan is to expose just one sub: validate.
As in masonry, you need a well prepared plan before starting excavations. Then you need points and lines drawn on the terrain: everything that makes the job complex is part of the job itself.
Look around: you can bet someone else had your same idea before you. You can also bet he or she was smarter than you and already uploaded it to CPAN.
Sharing early is a good principle: if you already have an idea of your module (even before implementing it), it can be worth asking in a forum dedicated to Perl (like perlmonks.org) by posting an RFC (Request For Comments), or using the dedicated website prepan.org (not a crowded place nowadays.. ;).
Plan it well: it is difficult, but remember that repairing something badly planned is always a worse task. The basic reading is in the core documentation: perlnewmod is the place to start and perlmodstyle is what comes next. Don't miss the basic documentation.
If you want to read more see, in my bibliotheca, the scaffold dedicated to modules.
Choose all your names carefully: the name of the module and the names of the methods or functions it exports. Good code with badly named methods is often unusable by anyone but the author.
Programming is a matter of interfaces. sic. dot. Coding is easy, engineering is hard. sic. another dot. You can change the implementation a million times; you can never change how other people use your code. So plan well what you offer with your module. You can add new features in the future; you cannot remove even one of them because someone is already using it in production. Play nice: plan well.
You can profit from reading a wonderful post: On Interfaces and APIs
day one: prepare the ground
step 1) an online repository on github
Create an empty repository on the github server named Range-Validator (they do not accept :: in names); see here for instructions.
step 2) a new module with module-starter
Open a shell in your scripts location and run the program module-starter that comes with Module::Starter. It wants a mail address, the author name and, obviously, the module name:
A lot of work done for us! The module-starter program created all the above files in a new folder named Range-Validator. Let's see the content:
We now have a good starting point to work on. Spend a few minutes reviewing the content of the files to get an idea.
step 3) a local repository with git
Open another shell for the git client (I prefer to have two, feel free to use just one) in the folder created above and initialize a git repository (local for the moment) there:
Nothing impressive.. What happened? The above command created a .git directory, ~15Kb of information, to keep track of all the changes you'll make to your files inside the Range-Validator folder. In other words it created a git repository. Empty. Empty?!? And all my files?
It's time for a command you'll use many, many times: git status
Many terms in the above output deserve an explanation, but not by me. Just be sure to understand what branch, commit and tracked/untracked mean in the git world. Luckily the command is so sweet as to add a hint for us in the last line: (use "git add" to track)
Git is built for this reason: it can track all the modifications we make to the code base and it takes a picture (a snapshot in git terminology) of the whole code base every time we commit these changes. But git init initialized an empty repository: we must tell git which files to add to the tracked ones.
We simply want to track all the files module-starter created for us: git add . adds the current directory and all its content to the tracked content. Run it and check the status again:
We added all the content but we still have not committed anything! git commit -m "some text" will commit all changes using the message provided as a label for the commit (without -m git will open a text editor to enter the text). Run it and check the status again:
With the above we committed everything. The status is now working tree clean: what better news for a lumberjack used to examining tons of dirty logs every day? ;)
Now we link the local copy and the remote one on github: all the examples you find, and even what github proposes to you, say git remote add origin https://github.com/... where origin is not a keyword but just a label, a name. I found this misleading and I use my github name in this place, or something that tells me the meaning, like MyFriendRepo. So from now on we will use YourGithubLogin there.
Add the remote and verify it ( with -v ):
The verify operation gives us two hints: with the remote repository that we call YourGithubLogin we can fetch (import all the changes you do not have yet from the remote repository into your local copy) or push (export your local copy to the remote repository).
Since on github there is nothing and locally we have the whole code base, we definitely want to push, and we can do that if, and only if, we have permission on the remote repository. It's our own repository, so no problem (git will ask for the github password). The push wants to know which branch to push: we only have master so:
Go to the github website to see what happened: the whole code base is in the online repository too, updated to our last commit (aka our first, unique commit for the moment). From now on we can work on our code from any machine having a git client. To do so we must be diligent, committing and pushing our changes at the right moment to keep the online repository up to date. Clean yard, happy master mason.
A whole day has passed, well.. two days, and we did not write a single line of perl code: we are starting the right way! Time to go to sleep with a well prepared playground.
day two: some change and tests
step 1) POD documentation
Well, first of all some cleaning: open your local copy of the module /path/to/Range-Validator/lib/Range/Validator.pm in your text editor or IDE. Personally I like the POD documentation to be all together after the __DATA__ token rather than interleaved with the code. Inside the code I only like to have comments. POD documentation is for the user, comments are for you! After a week or a month you'll never remember what your code is doing: comment it, explaining what is going on.
So go to the end of the module, where the final line is 1; (remember all modules must have a true return value as last statement) and place the __DATA__ token on a new line. Move all POD after the token. Also delete the POD and the code relative to function2.
Then rename function1 to validate and change the name of its POD section accordingly.
Modify the POD part =head1 NAME with a more humble and meaningful description: Range::Validator - a simple module to verify array and list ranges
Change the =head1 SYNOPSIS part too, removing unneeded text and changing code lines ( see below ): we do not do an object oriented module, so no new method for us. You plan to accept both real ranges and strings representing ranges.
So, if you followed me, the module should look like this:
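As a rough sketch, and not necessarily identical to what module-starter generates on your system, the file could look like the following at this point. The SYNOPSIS lines, the section name SUBROUTINES and the placeholder text under validate are my assumptions about the result of the steps above.

```perl
package Range::Validator;

use 5.006;
use strict;
use warnings;

our $VERSION = '0.01';

# validate: will check a range; the body is still empty at this stage
sub validate {
}

1;    # a module must end with a true value

__DATA__

=head1 NAME

Range::Validator - a simple module to verify array and list ranges

=head1 VERSION

Version 0.01

=head1 SYNOPSIS

    use Range::Validator;

    my @range = Range::Validator::validate(0..3);       # a real range
    my @range = Range::Validator::validate('0..3,5');   # a string representing a range

=head1 SUBROUTINES

=head2 validate

(description still to be written)

=cut
```

Everything after the __DATA__ token is ignored by the perl parser but is still found by perldoc, so code and documentation stay apart in the same file.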
Ok? Let's check our new POD is correct: open the shell in the directory created yesterday /path/to/Range-Validator and run the following command: perldoc ./lib/Range/Validator.pm
Review the POD. It must be ok.
step 2) first test
Now we test if the module syntax is correct. The first simple method is a short one liner using the perl option -I to include ./lib in @INC and -MRange::Validator to use our module (see perlrun and perlvar):
No errors: good! the module can be used and has no syntax errors. But.. one moment: we want to try out all our features, and we plan to add many, using one liners? Are we mad?! No; we will use tests.
Tests are wonderful in perl, and planning good tests (a test suite) will save a lot of time in the future and make your code maintainable. The time you invest writing tests while coding pays back when you modify the code base. I'm not a theorist of software writing nor an orthodox of test driven development, but writing tests while you code is a very good practice. You can even write tests before coding, ie: you write something that tests a wanted behaviour, you run it expecting a failure, then you write the code that makes the test happy. This is up to you.
What is not a choice is having no test suite or writing all tests at the end of code development. No.
On day one we used module-starter to produce a skeleton of our module. module-starter was so kind as to write a bunch of tests for us in the standard directory t (ie tests). Tests are normally run during the installation of the module (sorted by their names) but, as we already said, they are the main source of serenity for us as developers. So let's see what module-starter wrote inside t/00-load.t
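A sketch of such a file, reconstructed from the description in this section and not copied from a real run of module-starter, follows. The first BEGIN block is a stand-in I added only so the sketch runs on its own: the real file has no such block, because the module is found in ./lib. The generated file carries -T on its shebang line; the sketch omits the switch, for the reasons discussed a few lines below.

```perl
#!perl
use strict;
use warnings;
use Test::More;

# Stand-in (assumption, not in the real file): pretend the module is
# already loaded so this sketch can run without a lib directory.
BEGIN {
    package Range::Validator;
    our $VERSION = '0.01';
    $INC{'Range/Validator.pm'} = __FILE__;
}

# declare how many tests this file intends to run
plan tests => 1;

BEGIN {
    # load the module; if this fails nothing else can work
    use_ok('Range::Validator') || print "Bail out!\n";
}

# emit a diagnostic line, handy while reviewing the test output
diag("Testing Range::Validator $Range::Validator::VERSION, Perl $], $^X");
```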
This perl program uses strict and warnings (you already know they are friends, don't you?) then loads the core module Test::More, which generally requires that you declare how many tests you intend to run ( plan tests => 1 ); then, inside the BEGIN block, it uses the use_ok function that loads our own module and in case of failure prints "Bail out!\n" aka "everything went wrong, leave the boat".
If the above succeeded, Test::More calls diag, which emits a note with the text specified, useful to have while reviewing test output. The module also has the note function, which I prefer. Go to the module documentation to get an idea of Test::More.
So, instead of the one liner we can safely call this test:
The test crashes because of the -T that turns taint mode on. Taint mode is a base of security in perl, but for the moment we do not need it enabled, so we remove it from the shebang line, which becomes #!perl (read about taint mode in the official perl documentation: perlsec).
(Note that removing the -T switch is not the best thing to do: perl -T -I ./lib ./t/00-load.t is by far a better solution).
After this change the test will run as expected:
Wow! We ran our first test! ..yes, but in the wrong way. Well, not exactly the wrong way, but not the way tests are run during installation. Tests are run through a TAP harness (TAP stands for Test Anything Protocol and has been present in perl since ever: perl was born the right way ;).
With your perl distribution you have the prove command (see its documentation) that runs tests through a TAP harness. So we can use it.
We can call prove the very same way we called perl: prove -I ./lib ./t/00-load.t but we are lazy and we spot prove -l which has the same effect as prove -I ./lib ie it includes ./lib in @INC
Run the very same test through prove instead of perl and you will see a slightly different output:
Basically the output includes some statistics: the count of test files processed and the overall number of tests. Also note that the message emitted by diag is in another place: diagnostics by Test::More go to STDERR (which is buffered differently from STDOUT, but this is another story..) while the TAP harness aggregates test results and prints them to STDOUT
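The difference between the two channels can be seen with a tiny sketch like this one (the test description and the messages are invented for the example):

```perl
use strict;
use warnings;
use Test::More tests => 1;

ok( 1, 'a trivial passing test' );

# diag writes to STDERR: visible even when prove is not verbose
diag('diag: this line goes to STDERR');

# note writes to STDOUT: visible only in the verbose TAP stream
note('note: this line goes to STDOUT');
```

Save it under any name and run it first with prove and then with prove -v: the diag line shows up both times, the note line only in verbose mode.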
Finally we have the developer gratification: Result: PASS indicating all went well.
The prove program promotes laziness: without arguments (like the test file in the previous example) it automatically runs every test file found under the t folder. This is the same behaviour you will have during an actual module installation:
step 3) commit changes with git
Ok, we have made some changes to the code base, small ones but changes. Which changes? I'm lazy and I do not remember all the files we modified. No problem: git will tell us. At least I remember which command I need to review the code base status: git status
Go to the git shell and run it:
Ah yes, we modified two files: not only the module but also t/00-load.t, removing the -T from the shebang line. Thanks git, and you are also so kind as to give me two hints about what to do next: use "git add" and/or "git commit -a"
Go for the shorter path: we commit adding all files with git commit -a ie: we commit all files that are already tracked and, if any file was deleted in the code base, we remove it from the tracked list. But we remember that committing needs a message as label of the commit: git commit -m "message" so, putting it all together and checking the status:
step 4) pushing to github repository
Ok we submitted, well committed, all changes made. What's next? We have to synchronize the online repository that we named YourGithubLogin so check it and push modified content to it:
Go to the browser and open the online repository to see what happened after the git push: on the main page, where files are listed, we spot our two modified files with a new timestamp and with the message we used when committing. Under the Insights tab and then under Network in the right menu, we can see two points connected by a line segment: this is the visual history of our repository and of each commit we have made. Here you will also find any branches, but this is another story.
Well, another day has passed without writing a single line of perl code! At least for the moment our code is 100% bug free ;) I vaguely recall a chinese motto: "when you start something, start from the opposite" or something like that. To write a robust perl module, start by writing no perl code, for two days!
day three: finally some code
step 1) first lines of code
It's time to put some code inside our validate subroutine. We plan to accept both a string like '1..3,5' and a pure range like 1..5,6 but let's start with the string form, assuming only one element will be passed to our sub via @_.
Remember what was said in the foreword: this tutorial is not about coding, so be merciful with the following examples.
The above is straightforward (if ugly): we get something in via @_ (a string or a list) and we return something via return @range. To accomplish this we initialize $range to hold our string.
A good principle in loops is "put exit conditions early" and following this principle we put our die conditions as soon as possible, ie after the if/else check.
But we don't want to die with an ugly message like Died at ../Range/Validator.pm line x ie from the module perspective: we want to tell the user where their code provoked our module to die.
The core module Carp provides this kind of behaviour and we use its function croak, which dies from the perspective of the caller.
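The difference between the two can be seen in a small sketch. My::Demo and its two subs are hypothetical names used only for this comparison, they have nothing to do with our module.

```perl
use strict;
use warnings;

package My::Demo;    # hypothetical package, only for this sketch
use Carp;

sub with_die   { die "bad range" }      # message points to this line
sub with_croak { croak "bad range" }    # message points to the caller

package main;

eval { My::Demo::with_die() };
print "die:   $@";

eval { My::Demo::with_croak() };
print "croak: $@";
```

Both messages end with "at FILE line N." but the line number printed for die is inside My::Demo, while the one printed for croak is the line in main where with_croak was called, which is the information the user of a module needs.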
So we add the line needed to load the module, a first croak call if the string passed in contains forbidden characters, and a few other lines too:
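What follows is only a sketch under my own assumptions, not the code of the published module: the error messages, the two checks on dots and the way the string is expanded (a split on commas instead of anything more clever) are mine. The list form is returned untouched, since for now we only deal with strings.

```perl
package Range::Validator;

use strict;
use warnings;
use Carp;    # core module providing croak

sub validate {
    my $range;
    my @range;

    # one argument only: assume it is the string form, like '1..3,5'
    if ( @_ == 1 ) {
        $range = $_[0];
    }
    # more arguments: a real list, returned untouched in this sketch
    else {
        return @_;
    }

    # spaces are harmless: remove them
    $range =~ s/\s+//g;

    # exit conditions as early as possible
    croak "invalid character passed in string [$range]!"
        if $range =~ /[^,.\d]/;
    croak "invalid range [$range] (single .)!"
        if $range =~ /(?<!\.)\.(?!\.)/;
    croak "invalid range [$range] (more than 2 .)!"
        if $range =~ /\.{3}/;

    # expand every comma separated chunk: '1..3' or a lone number
    for my $chunk ( split /,/, $range ) {
        if    ( $chunk =~ /^(\d+)\.\.(\d+)$/ ) { push @range, ( $1 + 0 ) .. ( $2 + 0 ) }
        elsif ( $chunk =~ /^\d+$/ )            { push @range, $chunk + 0 }
        else                                   { croak "invalid range [$range]!" }
    }
    return @range;
}

package main;

print join( ',', Range::Validator::validate('1..3,5') ), "\n";    # 1,2,3,5
```

A call like validate('1-3') dies with the first croak, and the message points to the line of the caller, not to a line inside the module.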
step 2) testing on our own
How to see if all works as expected? Obviously with a test. Not 00-load.t but a new one dedicated to the validate sub. So go into the t folder and create a new file 01-validate.t and open it to edit the content.
Let's populate it with some basic content plus some new stuff (01-validate.t):
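A possible content, sketched from the functions discussed in this step and not copied from the original file, is the following. It needs the CPAN module Test::Exception to be installed. The BEGIN block is again a stand-in of mine, with a reduced validate, so that the sketch runs alone: the real test simply loads lib/Range/Validator.pm. Test descriptions are invented.

```perl
#!perl
use strict;
use warnings;
use Test::More qw(no_plan);
use Test::Exception;    # CPAN module, provides dies_ok

# Stand-in (assumption): a reduced validate so the sketch runs alone
BEGIN {
    package Range::Validator;
    use Carp;

    sub validate {
        my $range = shift;
        $range =~ s/\s+//g;
        croak "invalid character passed in string [$range]!"
            if $range =~ /[^,.\d]/;
        my @out;
        for my $chunk ( split /,/, $range ) {
            push @out,
                $chunk =~ /^(\d+)\.\.(\d+)$/ ? ( ( $1 + 0 ) .. ( $2 + 0 ) ) : $chunk;
        }
        return @out;
    }
    $INC{'Range/Validator.pm'} = __FILE__;
}

use_ok('Range::Validator');

note('starting tests of the string form');

# a valid string must give back the expanded range
my @range = Range::Validator::validate('0..2');
ok( "@range" eq '0 1 2', 'valid string accepted' );

# an invalid character must make validate die
dies_ok { Range::Validator::validate('1-3') }
    'expected to die with invalid character';
```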
First of all we used a different notation for Test::More ie. use Test::More qw(no_plan)
We are telling the module that we do not have a plan (yet) about how many tests will be in this file. This is a handy feature.
The Test::More core module offers us the ok, use_ok and note functions: view the module doc for more info about them.
But in the above test we also used the dies_ok function: this one comes from the CPAN module Test::Exception and we need to add this module to our dependencies list.
Dependencies list? What is that? Where did we speak about this? Never, until now.
step 3) add dependencies in Makefile.PL
In fact the program module-starter used on day one created a file called Makefile.PL with the following default content:
This file is run on the target system when someone tries to install your module. It's a vast matter and you can find lots of useful information in the core documentation of ExtUtils::MakeMaker and in the ExtUtils::MakeMaker::Tutorial and, as always in perl, there are many ways to do it.
In our simple case we only need to know a few facts about the BUILD_REQUIRES and PREREQ_PM fields.
The first one lists in a hash all the modules, with their versions, needed to build our module. Building includes testing, so if you need some module during tests this is the place to insert the dependency. The module-starter program added the 'Test::More' => '0' entry for us. This is the right place to state that we intend to use the Test::Exception CPAN module during tests.
On the other hand PREREQ_PM lists the modules, with their minimal versions, needed to run your module. As you can see it's a different thing: to run Range::Validator you never need Test::Exception but, for example, you'll need Carp
Even if Carp is a core module, it is a good practice to include it in PREREQ_PM
Read a very good post about dependencies Re: How to specify tests dependencies with Makefile.PL?
After cleaning the example lines, and given all the above, we will modify Makefile.PL as follows:
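Only the two fields discussed above are sketched here, inside a hash of arguments like the one a Makefile.PL passes to WriteMakefile. The hash name %args is illustrative, and the version '0' (meaning any version) for Test::Exception and Carp is my assumption.

```perl
use strict;
use warnings;

# Sketch: in a real Makefile.PL these keys sit next to NAME,
# VERSION_FROM and the others, and the hash goes to WriteMakefile.
my %args = (
    NAME => 'Range::Validator',

    # modules needed to build and test the module
    BUILD_REQUIRES => {
        'Test::More'      => '0',
        'Test::Exception' => '0',
    },

    # modules needed to run the module
    PREREQ_PM => {
        'Carp' => '0',
    },
);
```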
So the moral is: when you add a dependency needed to run your module or to test it, remember to update the corresponding part of Makefile.PL.
step 4) run the new test
Ok, is the above test ok? Does it return all we expect? Try it using prove -l but also specifying -v to be verbose and the filename of our new test (now we don't want all tests to run, just the one we are working on):
step 5) commit, add new files and push with git
What more do we need from our first day of coding? To check our status and to synchronize our online repository (pay attention to the following commands because we have a new, untracked file!):
We committed before adding the new file! Shame on us! Add the new file and issue another commit:
What more? Ah! pushing to the online repository:
What a day! We added six lines of code and an entire test file! Are we programming too much? Probably not, but we are doing it in a robust way and we discovered it can be hard work. In perl hard work is justified only by (future) laziness, and we are doing all this work because we are lazy and we do not want to waste our time when, in a month or a year, we need to pick up this code base again to enhance it or to debug it. So now it's time for bed and for well deserved colourful dreams.