http://qs1969.pair.com?node_id=1229263

nysus has asked for the wisdom of the Perl Monks concerning the following question:

As a less clueless Perl programmer, I had the attitude that I should just use a bunch of CPAN modules to do basic tasks like file slurping. Why write five lines of code when you can write two? As I started contributing to CPAN, I learned that there can be unwanted overhead in relying upon CPAN modules.

My basic attitude now is if it's some code that only I will use, I'll take the kitchen sink approach and stuff in whatever CPAN modules I can to make jobs easier. But if it's a CPAN module I'm contributing, I'll try to be more circumspect and put in some extra effort and try to stick to core modules. For example, it seems like it's smarter to stick with File::Spec instead of Path::Tiny.
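
To make the File::Spec versus Path::Tiny trade-off concrete, here is a minimal sketch of file slurping that uses only core modules. The helper name slurp and the file name example.txt are made up for illustration; the temporary directory is there only so the snippet can run on its own.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Read a whole file into one scalar using only core Perl.
# (slurp is a hypothetical helper name, not part of any module.)
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open '$path': $!";
    local $/;    # undefine the input record separator: one read returns everything
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Build a portable path with File::Spec rather than joining with '/'.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'example.txt');

# Create a small file so there is something to read back.
open my $out, '>', $file or die "Cannot write '$file': $!";
print {$out} "line one\nline two\n";
close $out or die "Cannot close '$file': $!";

print slurp($file);
```

This is the "five lines instead of two" side of the bargain: a little more code, but nothing for downstream users to install.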

I'm wondering if other more seasoned Perl programmers take the same approach. If not, what criteria do you use to determine when to use/not use a CPAN module?

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar";
$nysus = $PM . ' ' . $MCF;

Replies are listed 'Best First'.
Re: Criteria for when to use a cpan module (Buy vs Build)
by eyepopslikeamosquito (Archbishop) on Feb 01, 2019 at 22:46 UTC

    Using a CPAN module (Buy) vs writing your own (Build) is a specific example of the broader Buy versus Build decision. Some rules of thumb:

    • Build if it's your core business; Buy if not (e.g. buy, don't build, your computer keyboard ... unless you're in the computer keyboard business :). Buying CPAN DBI and XML modules, for example, looks good because it allows you to leverage the work of experts in fields that you are probably not expert in. Moreover, widely used CPAN modules tend to be robust and have fewer bugs than code you write yourself because they are tested by more users and in many different environments.
    • Opportunity cost. Using a CPAN module usually takes less time than writing your own, giving you more opportunities to get your core business done.
    • Cost vs Risk. Using CPAN modules seems "free" but there are hidden costs and risks. What if a CPAN module has a security vulnerability? What if the author abandons it? What if the author changes the supported perl versions/platforms? What if the author releases a really buggy version? How hard/expensive is it to write your own? Writing your own XML parser, for example, is much harder than writing your own file slurper. How quickly can you isolate/troubleshoot a bug in 3rd party code? Can you fix it in an emergency? (e.g. in a large production system, you may not have time to wait for the author to fix it).
    • Dependencies vs Control. Writing your own saves you having to manage dependencies (e.g. Dependency hell) while giving you total control to tailor to your needs.
    • Quality and Trust. How much trust do you place in the third party CPAN module? Is it good quality? (e.g. CPAN ratings, Kwalitee score, bug counts, how quickly are bugs fixed?). Does it contain gratuitous/unnecessary dependencies? (the ::Tiny CPAN modules were a reaction against modules that seemed to haul in half of CPAN as dependencies). How widely used is it? Widely used modules tend to be more robust and have fewer bugs than ones you write yourself because they are tested by more users in many different environments.
    • Popularity. When you invest heavily in a 3rd party component, you want it to be popular and widely supported; you want to be able to ask for advice on using the module; you don't want it to die. If your CPAN module depends on a very popular CPAN module, there's a good chance that your module's users will already have this dependency installed.

    For a CPAN module author, every module you add as a dependency is a module that can restrict your module -- if one of your module's dependencies is Linux-only, for example, then your module is now Linux-only; if another requires Perl 5.20+ so do you; if one of your dependencies has a bug, you also have that bug; if a new release of one of your dependencies fails, the likelihood of your release being unable to install increases; take care with dependencies having a different license to yours. Don't introduce dependencies lightly.

    See also: w/Modules and w/o Modules

    Updated: added Opportunity cost bullet point, DBI/XML example, note that widely used modules tend to have fewer bugs, and warning re module dependencies. I've updated Writing Solid CPAN Modules with advice on this topic in a new "Dependencies" section.

      Another point to consider is the extra pain that you might have to deal with when you upgrade support to a new platform. I just finished upgrading our large and complex Perl-based internal production system to run under a newer version of Linux which comes with a newer Perl and other newer library versions. We use a large number of CPAN modules, and there were three CPAN suites that had to be 'fixed'. BerkeleyDB had to be recompiled to match the older libdb we use for our binaries (couldn't use the vanilla libdb that came with the new version of Linux). IO::All and IPC::Run started producing warnings due to being older versions. I ended up installing the latest version of IPC::Run and that worked fine -- luckily the interface hadn't changed in a way that broke our code that uses it. The issue with IO::All turned out to be a bug(ish) that didn't produce warnings under older versions of perl, but does in the version we upgraded to. Since that was actually a bug, even against the older version of perl (which we also still need to support), I elected to fix the bug in place, which means that I didn't have to upgrade IO::All and risk having to update our code to adapt to any changes in a newer IO::All.

      --DrWhy

      "If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."

        I just finished upgrading our large and complex Perl-based internal production system to run under a newer version of Linux which comes with a newer Perl and other newer library versions.
        Yes, we face a similar problem across many different Unix flavours. We don't use the system Perl on any platform though; we always build our own Perl from C sources. But yes, it's a big and hairy problem, which is why we're gonna do it early in the release cycle to allow plenty of time for flushing out obscure bugs. Unfortunately, we've got pretty poor test coverage on much of our code, so we'll need to do quite a bit of manual testing.

        BTW, I was flabbergasted to hear Titus Winters, in his talk "C++ as a 'Live at Head' Language", claim that Google has a single C++ code repository, shared across the whole company, containing mega millions of lines of code and that they always "Live at Head", meaning that everyone is always using the latest version of all code ... so they never do "upgrades"! As you might expect, to pull this off, you need strong discipline and excellent test coverage, combined with very sophisticated automated tools.

        Some points from Titus Winters' talk:

        • Programming ("Hey, I got my thing to work!") vs Engineering ("What happens when my code needs to live a long time?").
        • Engineering is Programming integrated over time.
        • SemVer proved inadequate at Google (it over-simplifies and over-constrains). SemVer summary: given a version number MAJOR.MINOR.PATCH, increment the: MAJOR version when you make incompatible API changes; MINOR version when you add functionality in a backwards compatible manner; PATCH version when you make backwards compatible bug fixes (additional labels for pre-release and build metadata are available as extensions).
        • Mentions the dreaded diamond dependency and that dependency graphs grow quadratically.
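
        The SemVer summary in the list above can be sketched in a few lines of core Perl. This is only an illustration: parse_semver and bump_kind are hypothetical helper names, and the sketch deliberately ignores the pre-release and build metadata labels.

        ```perl
        use strict;
        use warnings;

        # Split "MAJOR.MINOR.PATCH" into three integers.
        # Pre-release and build labels are not handled in this sketch.
        sub parse_semver {
            my ($version) = @_;
            my @parts = $version =~ /\A(\d+)\.(\d+)\.(\d+)\z/
                or die "Not a MAJOR.MINOR.PATCH version: '$version'";
            return @parts;
        }

        # Name the most significant component that differs between two versions.
        sub bump_kind {
            my ($old, $new) = @_;
            my @old   = parse_semver($old);
            my @new   = parse_semver($new);
            my @names = qw(major minor patch);
            for my $i (0 .. 2) {
                return $names[$i] if $old[$i] != $new[$i];
            }
            return 'none';
        }

        print bump_kind('1.4.2', '2.0.0'), "\n";    # incompatible API change
        print bump_kind('1.4.2', '1.5.0'), "\n";    # backwards compatible feature
        print bump_kind('1.4.2', '1.4.3'), "\n";    # backwards compatible bug fix
        ```

        The number only records what the author intended; it cannot prove that a "minor" release really is compatible, which is one way to read the "over-simplifies" complaint.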

        Titus Winters is the founder of Abseil and chair of the Library Evolution Working Group (design for the C++ standard library).

        Update 2023: See also: Re: Rosetta Test: Long List is Long - Abseil

Re: Criteria for when to use a cpan module
by Tux (Canon) on Feb 01, 2019 at 22:03 UTC

    All credits to you for just thinking about it. Being aware that you put a burden on downstream users is a very nice start!

    Points to think about: something like Moose already depends, directly and indirectly, on a whole bunch of other things, so if you depend on Moose, adding Path::Tiny is not much of a problem anymore. It is more than likely that one of Moose's requirements already depends on it anyway.

    The second thing to think about is how far back you want to support perl itself. 5.005? (probably not). 5.6.2? 5.8.4? 5.10.1? 5.12? Some modules you want to depend on (or like to use) might not support the perl *you* want to support.

    Another thing to be aware of is the number of modules that already depend on the module you want to depend on: the position in the river of that module. The higher up-river (like Test::More) the more likely any of the other modules already installed on the system already depend on it: the chance that it already is installed is higher.


    Enjoy, Have FUN! H.Merijn
Re: Criteria for when to use a cpan module
by Tux (Canon) on Feb 02, 2019 at 11:56 UTC

    This section in Release::Checklist might be of some help. Feedback welcome.

    If you want more (deeper) comments, also read this, which is feedback as an issue.


    Enjoy, Have FUN! H.Merijn
Re: Criteria for when to use a cpan module
by stevieb (Canon) on Feb 02, 2019 at 22:50 UTC

    Personally, I keep the mindset at all times that others may want to use the code I'm writing, so I *never* consider what I'm writing as something I'll use only for myself. That 'halves' the problem right there.

    The other feedback so far on this thread contains great advice, so I don't want to re-invent the wheel.

    If I'm to use one tiny function/method from a distribution that depends on a hundred other distributions, I'll re-write my own, or even copy/paste the part I need (so long as the license blends with mine).

    If it's a large dist I want (again requiring many others) where there's a few pieces I need/desire, I might completely rethink my design and work around that situation, if I can't find something smaller and more compact in another dist.

    It's already been said, but my criteria are: author, bug fix time, responsiveness of author and/or collaborators, revision history, participation within the Perl community as a whole, and one of the often overlooked things, how large is this distribution I'm publishing to the CPAN... is it small enough that if I avoid a single external distribution I can stick to core? Or is it so large it's going to take 20 minutes to compile and install anyway?

    I've always tried my best to stick with core where possible, even if I have to do some extra work to do so. Heck, I still try to make my code work with perl-5.8.9 wherever possible.
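
    One way to check whether "sticking with core" holds for a given minimum perl is to ask Module::CoreList (itself a core module) when each candidate first shipped with perl. The sketch below is illustrative only: the version floor, the module list, and the helper name core_by are assumptions picked for the example.

    ```perl
    use strict;
    use warnings;
    use Module::CoreList;

    # Hypothetical floor: the oldest perl this distribution promises to support.
    my $min_perl = 5.008009;

    # Hypothetical dependency candidates, for illustration only.
    my @wanted = qw(File::Spec File::Temp Path::Tiny);

    # True if $module first shipped with a perl at or below $perl.
    # first_release returns undef for modules that were never in core.
    sub core_by {
        my ($module, $perl) = @_;
        my $first = Module::CoreList->first_release($module);
        return (defined($first) && $first <= $perl) ? 1 : 0;
    }

    for my $module (@wanted) {
        printf "%-12s %s\n", $module,
            core_by($module, $min_perl) ? 'core' : 'needs a CPAN prerequisite';
    }
    ```

    Note that first_release only says when a module entered core; it does not say whether the module was later removed, or whether the version bundled with an old perl has the features you need.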

    ps. Great question.

    pps. Oh yeah, nearly forgot: test coverage and unit test suite quality are extremely high on my priority list when making such a decision.