in reply to Module Bloat and the Best Solution

I feel compelled to respond, considering Kurt's using one of my code snippets as an example.

I definitely agree that using modules for things like XML parsing, database access (DBI) and other commonly used, non-trivial functionality makes sense. These are usually complex tasks that would take some time to "reinvent", and the modules are (usually) well tested and maintained. Yes, don't reinvent the wheel in those cases.

However, the example Kurt gives is that of finding the unique values in an array. This seems like a coding task that can be handled by the fundamentals of the language; and that is attested to if you look at the 2 lines in the 'uniq' subroutine in the actual List::MoreUtils module. Finding the unique elements in an array, from my experience with Perl, would be something I would never have thought about using a module to handle. Part of why I use Perl is that there are simple, easy ways to do things like finding the unique elements in an array.

And, no matter which way I code it, I can make it handle as many edge cases or local constraints as my program may have. I'm sure there are at least 1000 ways to skin a cat, and likewise there are many ways to find unique elements in an array with Perl. If I need to do it more than once in the program I can just as easily put it into a subroutine of my own, without the need to download List::MoreUtils at all.
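
A minimal sketch of what such a home-grown subroutine might look like, assuming input order should be preserved and that every element is a plain, defined scalar. The name `my_uniq` is made up for illustration; this is not the List::MoreUtils implementation.

```perl
use strict;
use warnings;

# Hypothetical helper (not from List::MoreUtils): keep the first
# occurrence of each value and preserve the order of the input list.
sub my_uniq {
    my %seen;
    # The post-increment returns the old count, so the test is true
    # only the first time a given value is seen.
    return grep { !$seen{$_}++ } @_;
}

my @values = qw(b a b c a);
my @unique = my_uniq(@values);
print "@unique\n";    # prints "b a c"
```

Elements are compared by their string form here, so undef values would trigger warnings, and references or objects are told apart only by how they stringify.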

But, for a sufficiently complex task, and one that is common to the particular domain I'm writing for (database access, XML parsing, web sites, etc.), I will almost definitely look to see if there is a module that will make my life easier, streamline my code, shorten my development time, etc. But from my standpoint, finding the unique elements in an array is not one of those.

---
echo S 1 [ Y V U | perl -ane 'print reverse map { $_ = chr(ord($_)-1) } @F;'
Warning: Any code posted by tuxz0r is untested, unless otherwise stated, and is used at your own risk.

Re^2: Module Bloat and the Best Solution
by dragonchild (Archbishop) on Nov 12, 2007 at 15:25 UTC
    Why are you increasing your maintenance burden? Why are you choosing to disregard battle-tested code? The uniq() in List::MoreUtils was worked over by many people over several years and was written in a way so as to both do the right thing and do it quickly.

    Do you know why it was written the way it was instead of the naive sub uniq { my %x;@x{$_} = undef for @_; keys %x }? There are at least two major problems with that code and possibly as many as four or more. And, if you don't know why, you have no business writing your own version cause you're going to screw it up.

    Even though I know why it was written the way it was, I still use it because when another problem is found, I get the bugfix for free! I know how to write a hashtable, but I don't choose to because it's boring (to me) and I'll screw it up. Same thing with uniq() or any of the other 2 dozen functions that module provides.


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

      Are you saying ALL CPAN modules are battle-tested?

      Also, I'd say that getting the "bugfix for free" is a dangerous conceit. I know that when my users find a bug in production code, I can't wait for J. Random Module writer to provide a fix. I need to fix it myself. I have to own the bug immediately.

      The danger of statements like "And, if you don't know why, you have no business writing your own version cause you're going to screw it up." is that you are implying that if you can't write flawless code, you might as well not start to learn. I definitely can't back you on that sentiment. Perl is a great language for experimenting and learning on. I wouldn't want to be as discouraging as you are towards users.

      --
      I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.
        Of course not! That's a ridiculous restatement and you know it. However, battle-tested code, such as List::MoreUtils, is worth using, even if you could do it yourself. It's one less moving part for you to maintain.

        As for bugfixes - I'm referring to bugs that were found in your production code, not mine. Of course, I own all bugs in my code. But, once a fix is pushed upstream, I get it without even knowing the bug had existed.

        As for the sentiment, I'm not implying that at all. However, I am saying that if you are putting code into production and expecting users (or boss or client) to pay you for the service, then you have a professional responsibility to use the best version of code possible. That means that you should use the battle-tested, as flawless as possible, code from CPAN (should such code exist). Of course you learn from it - that's why OSS is so valuable: you can read the source. But, you don't have a Not-Invented-Here complex preventing you from delivering the best value for the money you get paid.



      "Do you know why it was written the way it was"

      I don't. What I don't understand is why it uses numerical comparison and map instead of the plain and simple

      sub unique { my %h; grep !$h{$_}++, @_; }
      which, at least on Perl 5.8.8, is faster. (Notably faster if there are many identical elements.)

      Any enlightenment would be appreciated.

      lodin

        Chromatic already mentioned overloaded stringification. The naive solution posited only returns the stringification, period. Furthermore, the naive solution posted doesn't retain order while the solution provided by List::MoreUtils does.

        An additional feature of L::MU's uniq is that it provides a prototype while most variations don't. I thought there was a fourth item, but I could be mistaken in my old age.



        What if some of the elements in the array are objects with overloaded stringification?
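
A small sketch of the two points raised in this sub-thread (stringification and ordering), assuming a made-up `Tag` class with overloaded stringification. `uniq_keys` and `uniq_grep` are illustrative names for the hash-keys version and the block form of the grep version quoted above; neither is the List::MoreUtils code.

```perl
use strict;
use warnings;

# Hypothetical class: objects stringify to their name.
package Tag;
use overload '""' => sub { $_[0]{name} };
sub new { my ($class, $name) = @_; return bless { name => $name }, $class }

package main;

# Hash-keys version: the result is the list of hash keys, which are
# plain strings, returned in no particular order.
sub uniq_keys { my %x; @x{$_} = undef for @_; keys %x }

# grep version: the hash is only used for bookkeeping, so the original
# elements come back, in their original order.
sub uniq_grep { my %h; grep { !$h{$_}++ } @_ }

my @tags = map { Tag->new($_) } qw(red green red blue);

my @from_keys = uniq_keys(@tags);
my @from_grep = uniq_grep(@tags);

# How many results are still objects?
my $objects_from_keys = grep { ref } @from_keys;
my $objects_from_grep = grep { ref } @from_grep;

print "keys version: $objects_from_keys of ", scalar(@from_keys), " are objects\n";
print "grep version: $objects_from_grep of ", scalar(@from_grep), " are objects\n";
print "grep order:   @from_grep\n";    # prints "grep order:   red green blue"
```

Both versions still treat two distinct objects that stringify identically as duplicates, since the comparison goes through the hash key either way.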