Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am having a first go at writing an Object-Orientated Module and was hoping for some general advice as to the proper use of methods.

The module is a simple text-parsing one which aims to help index some technical documentation. The idea is that if I fed in a phrase like "12 bananas" it could do something like:

    $testdata = new FruitID("12 bananas");
    print $testdata->quantity;   # "12"
    print $testdata->fruit;      # "banana"
    print $testdata->colour;     # "yellow"

The code is therefore quite simple--it mostly consists of if ($stringtotest =~ ...) clauses--but quite long since its value is in the technical vocabulary.

My question is how I should arrange the methods in the module?

Should I arrange the code so that all of the if clauses (i.e. the heavy lifting of the module) are in the "new" method? Or should I use the "new" method simply for creating and blessing the object, and add an intermediate method (e.g. $testdata = new FruitID("12 bananas"); $testdata->process; print $testdata->quantity; #"12")? Or should I try to break up the code so that the processing is only done when one calls a method that is expected to return a value (e.g. methods like $testdata->quantity)?

I presume that if all the values are likely to be requested at the same time and if the tests are very interlinked (e.g. I need to know fruit="banana" before I can return colour="yellow") then it is best to add the heavy code to the "new" method or a process method. However, I am not sure which of these I should choose and the factors, if any, I should consider.

Thanks!

Replies are listed 'Best First'.
Re: Writing Object-Orientated Module: use of “new” and other methods.
by chromatic (Archbishop) on May 06, 2011 at 01:13 UTC

    The simpler your API, the easier it is to use. I tend to prefer your third option—but I also like to know that once someone has constructed an object of my class, that object is in a well-understood and safe state. (I'll forget to call $obj->process and things will go wrong.)

    Moose has an interesting concept called a lazy builder method which gets called to create an attribute if that attribute isn't already set. What you can do is pass in your raw data to the constructor, then parse only if someone requests the value of an attribute which relies on that raw data.

    Performing this parsing in the constructor is fine as well, of course, but you don't have to write a constructor in most Moose code.
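
The lazy approach described above can be approximated without Moose. What follows is only a hand-rolled sketch, not Moose's builder mechanism: the package name FruitID::Lazy, the %COLOUR_OF table, the _parsed helper and the parsing regex are all made up for illustration. The constructor stores the raw text, and the parse runs once, the first time any accessor is called.

```perl
package FruitID::Lazy;    # hypothetical name, not from the thread
use strict;
use warnings;

# Hypothetical lookup table standing in for the real vocabulary.
my %COLOUR_OF = (banana => 'yellow', cherry => 'red');

# The constructor only stores the raw text; nothing is parsed yet.
sub new {
    my ($class, $raw) = @_;
    return bless { raw => $raw }, $class;
}

# Parse once, on first demand, and cache the result in the object.
sub _parsed {
    my $self = shift;
    $self->{parsed} //= do {
        my %p;
        # Assumed input shape: a number, whitespace, a (possibly plural) word.
        if ($self->{raw} =~ /^\s*(\d+)\s+([a-z]+?)s?\s*$/i) {
            @p{qw(quantity fruit)} = ($1, lc $2);
            $p{colour} = $COLOUR_OF{ $p{fruit} };
        }
        \%p;
    };
    return $self->{parsed};
}

# Every accessor goes through _parsed, so the caller never has to
# remember a separate "process" step.
sub quantity { $_[0]->_parsed->{quantity} }
sub fruit    { $_[0]->_parsed->{fruit} }
sub colour   { $_[0]->_parsed->{colour} }

package main;

my $testdata = FruitID::Lazy->new("12 bananas");
print $testdata->quantity, "\n";
print $testdata->fruit,    "\n";
print $testdata->colour,   "\n";
```

Because the object is usable straight after construction, there is no process call to forget; the price is that a malformed phrase is only noticed when an accessor is first used.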

Re: Writing Object-Orientated Module: use of “new” and other methods.
by CountZero (Bishop) on May 06, 2011 at 06:23 UTC
    As others have already said, "it depends".

    If your parsing needs to be done in any case before the object has any use, why not do it in the constructor?

    If you need to answer the various questions immediately after construction, lazy evaluation buys you little: better to do all the heavy lifting up front.

    If the return values of some methods depend on other info being available beforehand, then that is a good reason to control the processing yourself rather than expect the user to make the right choices.
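
As a rough illustration of doing the heavy lifting up front, here is a sketch in which the constructor itself runs the parse, so dependent values (colour needs fruit) are filled in a fixed order. The package name FruitID::Eager, the %COLOUR_OF table, the _parse helper and the regex are assumptions made for the example, not the original poster's code.

```perl
package FruitID::Eager;    # hypothetical name, not from the thread
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical lookup table standing in for the real vocabulary.
my %COLOUR_OF = (banana => 'yellow', cherry => 'red');

sub new {
    my ($class, $raw) = @_;
    my $self = bless { raw => $raw }, $class;
    $self->_parse;    # the class, not the caller, decides when parsing happens
    return $self;
}

sub _parse {
    my $self = shift;

    # Assumed input shape: a number, whitespace, a (possibly plural) word.
    my ($quantity, $fruit) = $self->{raw} =~ /^\s*(\d+)\s+([a-z]+?)s?\s*$/i
        or croak "Cannot parse '$self->{raw}'";

    $self->{quantity} = $quantity;
    $self->{fruit}    = lc $fruit;

    # colour depends on fruit, so it is derived only after fruit is known.
    $self->{colour} = $COLOUR_OF{ $self->{fruit} };
    return;
}

# Accessors are plain lookups; all the work is already done.
sub quantity { $_[0]{quantity} }
sub fruit    { $_[0]{fruit} }
sub colour   { $_[0]{colour} }

package main;

my $testdata = FruitID::Eager->new("12 bananas");
print $testdata->quantity, "\n";
print $testdata->fruit,    "\n";
print $testdata->colour,   "\n";
```

With this arrangement a phrase that cannot be parsed is rejected at construction time, so no half-initialized object ever reaches the caller.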

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Writing Object-Orientated Module: use of “new” and other methods.
by JavaFan (Canon) on May 06, 2011 at 00:36 UTC
    My question is how I should arrange the methods in the module?
    There are many ways. I prefer to have my "constructors" (although what is commonly called a constructor isn't really one - Perl has a constructor, and it's called bless) just create and return an object, and nothing else. Initializing the object in the same method is just asking for trouble in the long run, especially if you want to use MI.

    So, I'd do something like:

    package FruitID;

    sub new { bless {}, shift }

    sub init {
        my $self = shift;
        # ... initialize object ...
        return $self;    # return the object so that new->init can be chained
    }

    # ...

    package main;

    my $testdata = FruitID->new->init("12 bananas");

    Note also that I call new as a class method, and I'm not using indirect calls. Indirect calls will bite you sooner or later as well, as the call is ambiguous.
      Dative invocations are only ambiguous if you do them sloppily; they are not inherently so. This is a myth, an urban legend in Perl culture. It is not true.
      sub new { die "main::new sub called" }
      sub Foo { die "main::Foo sub called" }

      # can't use Foo->new here!!!!!!!!!!!
      $obj = new Foo:: ;
      print "Got $obj\n";

      package Foo;
      sub new { return bless [] => shift(); }

      That will correctly print out Got Foo=ARRAY(0x7ec91370). There is no ambiguity: you just have to do it right, that’s all. That said, it’s no fun to chain them.

      And by the way, the mechanism you used won’t work in the code above. Your style can be just as buggy as what you’re worried about people doing by using dative syntax. If you don’t want to use the package-quoted dative syntax of new Foo::, you still must use Foo::->new and not Foo->new, because otherwise you’ll still bomb out. Please try it with the commented-out line to see what I mean.

      Avoiding the dative does not avoid the problem. Package-quoting does.

      Here’s another real-world example:
      my $sorter = new Unicode::Collate::
                          upper_before_lower => 1,
                          preprocess         => \&reduce_for_sorting,
                          entry              => deQ<<'END_OF_OVERRIDE',
      |Q|
      |Q| 005B 006E 002E ; [.0200.0020.0002.0391] # [n.
      |Q| 005B ; [.0220.0020.0002.0392] # [
      |Q| 005D ; [.0225.0020.0002.0395] # ]
      |Q|
      END_OF_OVERRIDE
                          ;
      And here’s another:
      state $formatter = new Unicode::LineBreak::
                              Context      => "NONEASTASIAN",
                              ColumnsMax   => 80,
                              ColumnsMin   => 8,
                              Format       => "SIMPLE",
                              SizingMethod => \&tabbed_sizing,
                              TailorLB     => [
                                  ord("\t")      => LB_SP,
                                  LEFT_QUOTES()  => LB_OP,
                                  RIGHT_QUOTES() => LB_CL,
                              ],
                          ;
      See how nice and legible that is when written that way? And there really and truly is no ambiguity — unlike with your code, where there is.


      PS: I see my stalker is quick to the draw. Yawn!