in reply to Re: Module Bloat and the Best Solution
in thread Module Bloat and the Best Solution

Why are you increasing your maintenance burden? Why are you choosing to disregard battle-tested code? The uniq() in List::MoreUtils was worked over by many people over several years and was written in a way so as to both do the right thing and do it quickly.

Do you know why it was written the way it was instead of the naive sub uniq { my %x; @x{$_} = undef for @_; keys %x }? There are at least two major problems with that code and possibly as many as four. And, if you don't know why, you have no business writing your own version cause you're going to screw it up.

Even though I know why it was written the way it was, I still use it because when another problem is found, I get the bugfix for free! I know how to write a hashtable, but I choose not to because it's boring (to me) and I'll screw it up. Same thing with uniq() or any of the other two dozen functions that module provides.


My criteria for good software:
  1. Does it work?
  2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

Re^3: Module Bloat and the Best Solution
by KurtSchwind (Chaplain) on Nov 12, 2007 at 16:34 UTC

    Are you saying ALL CPAN modules are battle-tested?

    Also, I'd say that getting the "bugfix for free" is a dangerous conceit. I know that when my users find a bug in production code, I can't wait for J. Random Module writer to provide a fix. I need to fix it myself. I have to own the bug immediately.

    The danger of statements like "And, if you don't know why, you have no business writing your own version cause you're going to screw it up." is that you are implying that if you can't write flawless code, you might as well not start to learn. I definitely can't back you on that sentiment. Perl is a great language for experimenting and learning on. I wouldn't want to be as discouraging as you are towards users.

    --
    I used to drive a Heisenbergmobile, but every time I looked at the speedometer, I got lost.
      Of course not! That's a ridiculous restatement and you know it. However, battle-tested code, such as List::MoreUtils, is worth using, even if you could do it yourself. It's one less moving part for you to maintain.

      As for bugfixes - I'm referring to bugs that were found in your production code, not mine. Of course, I own all bugs in my code. But, once a fix is pushed upstream, I get it without even knowing the bug had existed.

      As for the sentiment, I'm not implying that at all. However, I am saying that if you are putting code into production and expecting users (or boss or client) to pay you for the service, then you have a professional responsibility to use the best version of code possible. That means that you should use the battle-tested, as flawless as possible, code from CPAN (should such code exist). Of course you learn from it - that's why OSS is so valuable: you can read the source. But, you don't have a Not-Invented-Here complex preventing you from delivering the best value for the money you get paid.



        I don't think it's such a ridiculous re-statement at all actually. Unless there is a way for a normal perl developer to know in advance what code is battle tested and what code has just been put up, you aren't giving the perl community much to base their decisions on. Sure, you have your reasons for using List::MoreUtils (even though it isn't available on the production boxes in my environment) because you feel it's rock-solid code you are willing to stake your reputation on.

        So I propose the question again: I'm a developer. I need to do some task in perl. When should I be hitting CPAN as opposed to just writing the few lines myself? What criteria should I use to know when to use CPAN and when not to? Furthermore, just by virtue of being in CPAN doesn't mean I necessarily want to stake my reputation on it. So what criteria do you use to know when you can stake your reputation on a CPAN module?

Re^3: Module Bloat and the Best Solution
by lodin (Hermit) on Nov 12, 2007 at 18:01 UTC

    Do you know why it was written the way it was

    I don't. What I don't understand is why it uses numerical comparison and map instead of the plain and simple

    sub unique { my %h; grep !$h{$_}++, @_; }
    which, at least on Perl 5.8.8, is faster. (Notably faster if there are many identical elements.)
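A rough way to check the speed claim on your own perl. The map form below is only an assumed shape for the "numerical comparison and map" version being described; it is a reconstruction for illustration, not code quoted in this thread. The sub names unique_grep and unique_map are made up for the comparison.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The grep version, as posted above.
sub unique_grep { my %h; grep !$h{$_}++, @_; }

# Assumed shape of a "numerical comparison and map" version
# (illustrative reconstruction, not copied from List::MoreUtils).
sub unique_map { my %h; map { $h{$_}++ == 0 ? $_ : () } @_; }

# 1000 elements, only ten distinct values: "many identical elements".
my @data = map { $_ % 10 } 1 .. 1000;

my @g = unique_grep(@data);
my @m = unique_map(@data);
print "same result\n" if "@g" eq "@m";

# Timings depend on the perl version and the data, so run it rather
# than trusting anyone's numbers.
cmpthese(2000, {
    grep => sub { my @r = unique_grep(@data) },
    map  => sub { my @r = unique_map(@data) },
});
```

Both forms keep first-occurrence order and return the original elements, so on this reading any difference is one of speed, not behaviour.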

    Any enlightenment would be appreciated.

    lodin

Re^3: Module Bloat and the Best Solution
by BrowserUk (Patriarch) on Nov 12, 2007 at 17:12 UTC
      Chromatic already mentioned overloaded stringification. The naive solution posited only returns the stringification, period. Furthermore, the naive solution posted doesn't retain order while the solution provided by List::MoreUtils does.

      An additional feature of L::MU's uniq is that it provides a prototype while most variations don't. I thought there was a fourth item, but I could be mistaken in my old age.
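A minimal sketch of the first two points (stringification and ordering), assuming nothing beyond core Perl. The names naive_uniq and seen_uniq are made up here; seen_uniq is the common "seen hash" idiom, not the module's actual source. The naive sub is written with a scalar element ($x{$_}) instead of the one-element slice so that `use warnings` stays quiet; the behaviour is the same.

```perl
use strict;
use warnings;

# The naive version from the top of the thread (scalar element instead
# of a one-element hash slice; same behaviour).
sub naive_uniq { my %x; $x{$_} = undef for @_; keys %x }

# Order-preserving variant using the usual "seen hash" idiom.
sub seen_uniq { my %seen; grep { !$seen{$_}++ } @_ }

my @words = qw(pear apple pear fig apple);

my @naive = naive_uniq(@words);   # three items, in no guaranteed order
my @seen  = seen_uniq(@words);    # first-occurrence order is kept
print "naive: @naive\n";
print "seen:  @seen\n";           # seen:  pear apple fig

# Hash keys are always plain strings, so a reference does not survive
# a round trip through keys().
my $ref = [ 1, 2, 3 ];
my ($from_naive) = naive_uniq($ref);
my ($from_seen)  = seen_uniq($ref);
print "naive returned a ", (ref $from_naive ? "reference" : "string"), "\n";
print "seen returned a ",  (ref $from_seen  ? "reference" : "string"), "\n";
```

The same applies to objects with overloaded stringification: keys() can only hand back the string, never the object.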



        How is a prototype of just @ a feature?

        lodin

      What if some of the elements in the array are objects with overloaded stringification?

        You mean something like

        print Dumper \@list;
        $var = ( [ 1,2,3 ], [ 4,5,6 ], [ 1,2,3 ], [ 4,5,6 ], [ 1,2,3 ], );

        From which List::MoreUtils::uniq will render

        use List::MoreUtils qw[ uniq ];;
        print for uniq @list;;
        ARRAY(0x194ac18) ARRAY(0x194ac54) ARRAY(0x194ac90) ARRAY(0x194accc) ARRAY(0x194ad08)

        But the required result might be:

        ARRAY(0x194ac18) ARRAY(0x194ac54)

        With the "naive" solution, any attempt to use the stringified reference (or other stringified value) in place of that value is going to throw up an error immediately, thereby warning you that either the nature of your data has changed and your solution is no longer good enough, or there is an upstream error that is giving you bad data. With the module solution, you aren't going to get that error until some point later, when it will be much harder to trace back to the source.
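A small sketch of that early failure, assuming the calling code runs under `use strict` (without strict refs the string would be treated as a symbolic reference and no error would be raised). The variable names are illustrative only.

```perl
use strict;
use warnings;

my @list = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );

# What the naive approach hands back: hash keys, i.e. plain strings
# that merely look like "ARRAY(0x...)".
my %x;
$x{$_} = undef for @list;
my ($first) = keys %x;

# Dereferencing such a string dies on the spot under strict refs.
my $ok = eval { my @copy = @{$first}; 1 };
print "dereference failed: $@" unless $ok;
```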

        For simple scalar values, the "naive" solution works fine. And once you start adding non-simple values, like references, into the mix, it is naive to believe that treating different references to structures containing identical values as distinct will always be correct. Or even that it would be correct in a preponderance of cases.

        For every scenario where the module's result would be more correct, there is another where it would be more wrong.

        The point is: you have to know your data, and tailor your solution to that data and the required semantics.
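One way to make the required semantics explicit is to pass in the notion of "same" yourself. This is only a sketch: uniq_by_key is a hypothetical helper written for this example, and the content key assumes flat arrays of simple values that never contain a NUL character.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first element seen for each distinct
# key, where the caller decides what "the same" means via $key_of.
sub uniq_by_key {
    my ($key_of, @items) = @_;
    my %seen;
    return grep { !$seen{ $key_of->($_) }++ } @items;
}

my @list = ( [ 1,2,3 ], [ 4,5,6 ], [ 1,2,3 ], [ 4,5,6 ], [ 1,2,3 ] );

# Identity semantics: every reference is distinct, so all five survive.
my @by_identity = uniq_by_key( sub { "$_[0]" }, @list );

# Content semantics: two arrays are "the same" if their elements match.
my @by_content = uniq_by_key( sub { join "\0", @{ $_[0] } }, @list );

printf "%d by identity, %d by content\n",
    scalar @by_identity, scalar @by_content;   # 5 by identity, 2 by content
```

Either answer can be the right one; which key function to use depends on the data and on what the caller needs.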


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.