mstone has asked for the wisdom of the Perl Monks concerning the following question:

Hey all -

I've been kicking around some OO design ideas, and have come up with a data storage class well suited to being the Model in a Model-View-Controller architecture. It uses an embedded data hash and a table of functions to reduce the whole access interface to the functions set() and get().

Thing is, I'm not completely happy with the lookup tables.

The sample class looks like so:

package Data_holder;

=head1 CONSTRUCTOR

Nothing terribly exciting here. All we do is bless a hashref, embed a data hash, and store two hashes that map attribute names to the names of the methods that will handle them.

=cut

sub new {
    my $O = bless {}, shift;

    $O->{'data'} = {};

    $O->{'get-funcs'} = {
        'key-1' => 'get_normal',
        'key-2' => 'get_synthetic',
    };

    $O->{'set-funcs'} = {
        'key-1' => 'set_normal',
        'key-2' => 'set_synthetic',
    };

    return ($O);
}

=head1 DISPATCH FUNCTIONS

=over 4

=item get (list) : hash

This is the inspection routine. It takes a list of attribute names as input, and returns a hash of named values as output. It does its job by calling functions identified in {'get-funcs'}.

=cut

sub get {
    my ($O, @keys) = @_;
    my $result = {};

    for my $k (@keys) {
        my $func = $O->{'get-funcs'}->{ $k } || 'get_error';
        $result->{ $k } = $O->$func ( $k );
    }

    return ($result);
}

=item set (hash) : hash

This is the storage function. It takes a hash of named values as input, and returns a hash of result codes as output. It does its job by calling functions identified in {'set-funcs'}.

=cut

sub set {
    my ($O, $args) = @_;
    my $result = {};

    for my $k (keys %$args) {
        my $func = $O->{'set-funcs'}->{ $k } || 'set_error';
        $result->{ $k } = $O->$func ( $k, $args->{ $k } );
    }

    return $result;
}

=back

=head1 INSPECTION METHODS

These are the functions that do the actual work of getting values. They're called by get() when it finds their names in {'get-funcs'}.

=over 4

=item get_normal (key) : value

This routine simply pulls the named value from the embedded hash.

=cut

sub get_normal {
    my ($O, $key) = @_;
    return ($O->{'data'}->{ $key });
}

=item get_synthetic (key) : value

This routine is a little more interesting. It calculates the return value from some other value or combination of values in the embedded data hash. In this case, we'll calculate the Fahrenheit value of a stored centigrade temperature. This is also just an example. In a real version of this class, you'd have a different function for every value you wanted to calculate.

=cut

sub get_synthetic {
    my ($O, $key) = @_;
    my $result = 32 + (1.8 * $O->{'data'}->{'temperature'});
    return ($result);
}

=back

=head1 STORAGE METHODS

These are the functions that do the actual work of setting values. They're called by set() when it finds their names in {'set-funcs'}.

=over 4

=item set_normal (key, value) : result

This is the basic storage routine. It drops the value into the embedded data hash under the assigned key. It also returns a success code since there's no way that operation can fail.

=cut

sub set_normal {
    my ($O, $key, $val) = @_;
    $O->{'data'}->{ $key } = $val;
    return ('SUCCESS');
}

=item set_synthetic (key, value) : result

Like get_synthetic(), this is a sample routine. Here we store the centigrade value of a temperature entered in Fahrenheit.

=cut

sub set_synthetic {
    my ($O, $key, $value) = @_;
    $O->{'data'}->{'temperature'} = ($value - 32) / 1.8;
    return ('SUCCESS');
}

=back

=head1 ERROR ROUTINES

These are the catch-all routines. They handle keys that don't have set() or get() functions defined in the lookup tables. In theory, you could use these routines to replace the get_normal() and set_normal() operations, but I really don't like that idea. It makes typos *way* too powerful for my taste. I prefer to define my interfaces explicitly.

=over 4

=back

=cut

sub get_error {
    my ($O, $key) = @_;
    return ( "FAILURE - "
           . "I don't know how to locate the value for '$key'." );
}

sub set_error {
    my ($O, $key, $val) = @_;
    return ( "FAILURE - "
           . "I don't know how to set '$key' to '$val'." );
}
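For concreteness, here is a trimmed, self-contained rendition of the same pattern (only the 'normal' handlers and the error fallbacks, with a made-up package name) plus a short usage demo:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Mini_holder;

# Trimmed sketch of the lookup-table pattern: one normal
# getter/setter pair plus the catch-all error routines.
sub new {
    my $O = bless { data => {} }, shift;
    $O->{'get-funcs'} = { 'key-1' => 'get_normal' };
    $O->{'set-funcs'} = { 'key-1' => 'set_normal' };
    return $O;
}

sub get {
    my ($O, @keys) = @_;
    my $result = {};
    for my $k (@keys) {
        my $func = $O->{'get-funcs'}{$k} || 'get_error';
        $result->{$k} = $O->$func($k);
    }
    return $result;
}

sub set {
    my ($O, $args) = @_;
    my $result = {};
    for my $k (keys %$args) {
        my $func = $O->{'set-funcs'}{$k} || 'set_error';
        $result->{$k} = $O->$func($k, $args->{$k});
    }
    return $result;
}

sub get_normal { my ($O, $k) = @_; return $O->{data}{$k} }
sub set_normal { my ($O, $k, $v) = @_; $O->{data}{$k} = $v; return 'SUCCESS' }
sub get_error  { my ($O, $k) = @_; return "FAILURE - no getter for '$k'." }
sub set_error  { my ($O, $k) = @_; return "FAILURE - no setter for '$k'." }

package main;

my $m = Mini_holder->new;
print $m->set({ 'key-1' => 42 })->{'key-1'}, "\n";   # SUCCESS
print $m->get('key-1')->{'key-1'}, "\n";             # 42
print $m->get('bogus')->{'bogus'}, "\n";             # FAILURE - no getter for 'bogus'.
```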

You can also throw in a table of data validation functions, and write set_normal() like so:

sub set_normal {
    my ($O, $key, $val) = @_;

    my $func   = $O->{'validate-funcs'}->{ $key };
    my $result = $O->$func ($val);

    if ($result eq 'SUCCESS') {
        $O->{'data'}->{ $key } = $val;
    }

    return ($result);
}

You can use a synonym table to make the same values accessible by multiple names:

sub new {
    {...}
    $O->{'synonyms'} = {
        'rm' => 'REQUEST_METHOD',
        'q'  => 'QUERY_STRING',
        {...}
    };
    {...}
}

sub get {
    my ($O, @keys) = @_;
    my $result = {};

    for my $k (@keys) {
        my $real_key = $O->{'synonyms'}->{ $k } || $k;
        my $func     = $O->{'get-funcs'}->{ $real_key } || 'get_error';
        $result->{ $k } = $O->$func ($real_key);
    }

    return ($result);
}

And you can trick out the dispatch functions with argument-list validation code to enforce rules about data access: setting things up so you can only set 'v1' if you also set 'v2', for instance.
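One hedged sketch of that kind of rule (the key names 'v1' and 'v2' are placeholders, and the surrounding class is a minimal stand-in): check the argument list before dispatching, and reject the whole request if 'v1' arrives without 'v2'.

```perl
use strict;
use warnings;

package Guarded;

sub new {
    my $O = bless { data => {} }, shift;
    $O->{'set-funcs'} = { v1 => 'set_normal', v2 => 'set_normal' };
    return $O;
}

sub set_normal { my ($O, $k, $v) = @_; $O->{data}{$k} = $v; return 'SUCCESS' }
sub set_error  { my ($O, $k) = @_; return "FAILURE - no setter for '$k'." }

# The dispatch function, extended with an argument-list rule:
# 'v1' may only be set in the same call as 'v2'.
sub set {
    my ($O, $args) = @_;

    if (exists $args->{v1} && !exists $args->{v2}) {
        return { v1 => "FAILURE - 'v1' requires 'v2' in the same call." };
    }

    my $result = {};
    for my $k (keys %$args) {
        my $func = $O->{'set-funcs'}{$k} || 'set_error';
        $result->{$k} = $O->$func($k, $args->{$k});
    }
    return $result;
}

package main;

my $g = Guarded->new;
print $g->set({ v1 => 1 })->{v1}, "\n";            # rejected: v2 missing
print $g->set({ v1 => 1, v2 => 2 })->{v1}, "\n";   # SUCCESS
```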

Overall, I'm happy with the design.

Thing is, I have this semi-irrational prejudice against using runtime evaluation to identify functions, i.e., doing this:

$O->$func ($key);

I know that it works. I know that I'm doing it in a reasonably disciplined way. I'm cool with the idea of treating functions as first-order items of data. And I really like the simplicity of the notation. I just have this deeply-ingrained attitude that says program structure should be defined at compile time.
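One middle ground worth noting (my sketch, not part of the original design): store code references in the lookup table instead of method-name strings. The dispatch is still table-driven, but no name-to-function resolution happens at call time, since Perl lets you invoke a coderef as a method with `$O->$coderef(@args)`:

```perl
use strict;
use warnings;

package Coderef_holder;

# Variant of the lookup-table design that stores code references
# instead of method-name strings, so dispatch needs no symbolic
# lookup at runtime.
sub new {
    my $O = bless { data => {} }, shift;
    $O->{'get-funcs'} = {
        'key-1' => \&get_normal,
    };
    return $O;
}

sub get {
    my ($O, @keys) = @_;
    my $result = {};
    for my $k (@keys) {
        my $func = $O->{'get-funcs'}{$k} || \&get_error;
        $result->{$k} = $O->$func($k);   # coderef invoked as a method
    }
    return $result;
}

sub get_normal { my ($O, $key) = @_; return $O->{data}{$key} }
sub get_error  { my ($O, $key) = @_; return "FAILURE - no getter for '$key'." }

package main;

my $dh = Coderef_holder->new;
$dh->{data}{'key-1'} = 'hello';
print $dh->get('key-1')->{'key-1'}, "\n";   # hello
print $dh->get('bogus')->{'bogus'}, "\n";   # FAILURE - no getter for 'bogus'.
```

A misspelled entry like `\&get_nromal` still compiles, but calling it dies loudly with "Undefined subroutine", rather than being silently treated as a method name.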

There is a way to get the same result without doing the runtime evaluation: build a family of lightweight Command classes and use those in the lookup tables rather than strings:

package Command;

sub new     { return (bless {}, shift); }
sub execute { }

package Get_normal;

@ISA = qw( Command );

sub execute {
    my ($O, $target, $key) = @_;
    return $target->get_normal ($key);
}

package Data_holder;

sub new {
    {...}
    my $g = new Get_normal;
    $O->{'get-funcs'} = {
        'key-1' => $g,
        {...}
    };
    {...}
}

sub get {
    {...}
    my $cmd = $O->{'get-funcs'}->{ $k }
           || $O->{'get-funcs'}->{'get_error'};
    $result->{ $k } = $cmd->execute ($O, $k);
    {...}
}

The effect is exactly the same, but this time the method invocations are wired directly into the source. The branching ends up getting built into the syntax tree, rather than being determined at runtime. This version could be ported to C++ where the previous version couldn't.

But for cripe's sake, look at all the extra code! And the coupling between the main class and all the command classes is just plain ugly.

I'm trying to convince myself that the first version is okay, roughly on the order of using macros in Lisp. I know that any language worth using offers certain high-voltage techniques that should be avoided when it's at all convenient to do so, but used with respect when the benefit outweighs the risk.

I think that covers the situation I'm dealing with here. The fact that I do know how to write the package with a C++ accent means it's probably safe to call the first version an idiom. All I have to do is remind myself that runtime evaluation is the short way of saying something I'd have to spell out more thoroughly in C++, and offer a quick prayer of thanks to Larry for saving me all that extra typing whenever I use the technique in the future.

But I'm still not completely convinced. We're talking about symbolic references here, and the short course on that subject is: don't.

Which brings us to the point of this query. Does anyone else have a tidier way of storing method invocation in a lookup table, or is this particular way of using symbolic references basically okay?

Re: Need help with a conceptual speed bump
by chromatic (Archbishop) on May 03, 2005 at 22:00 UTC
    We're talking about symbolic references here

    Yes and no.

    Yes, Perl looks up methods in the symbol table when you dispatch to them.

    No, it does this for methods anyway. If you write a small test program:

use CGI;

my $q = CGI->new();

my $new = 'new';
my $z   = CGI->$new();

    and dump the optree:

$ perl -MO=Concise method_dispatch.pl
l  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 514 method_dispatch.pl:8) v/2 ->3
8     <2> sassign vKS/2 ->9
6        <1> entersub[t2] sKS/TARG,2 ->7
3           <0> pushmark s ->4
4           <$> const(PV "CGI") sM/BARE ->5
5           <$> method_named(PVIV "new") s ->6
7        <0> padsv[$q:514,517] sRM*/LVINTRO ->8
9     <;> nextstate(main 515 method_dispatch.pl:9) v/2 ->a
c     <2> sassign vKS/2 ->d
a        <$> const(PV "new") s ->b
b        <0> padsv[$new:515,517] sRM*/LVINTRO ->c
d     <;> nextstate(main 516 method_dispatch.pl:10) v/2 ->e
k     <2> sassign vKS/2 ->l
i        <1> entersub[t5] sKS/TARG,2 ->j
e           <0> pushmark s ->f
f           <$> const(PV "CGI") sM/BARE ->g
h           <1> method sK/1 ->i
g              <0> padsv[$new:515,517] s ->h
j        <0> padsv[$z:516,517] sRM*/LVINTRO ->k
method_dispatch.pl syntax OK

    you'll see that the effective difference is in the method_named and method ops. Looking in pp_hot.c, the former calls method_common with the SV containing the name of the method. The latter checks that the child SV is a reference and returns it if so or calls method_common otherwise.

    method_common looks up the method in the appropriate stash and returns the reference.

    Is there a difference? I don't really see one. Besides all this, Perl doesn't (and often can't) resolve the target of method calls statically anyway. (There might be some post hoc ergo propter hoc somewhere in all of this.)
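chromatic's point is visible from within Perl itself: `can()` (from UNIVERSAL, core Perl) performs the same stash lookup and hands back a plain code reference, which you can invoke directly. A small sketch showing that all three spellings dispatch to the same place:

```perl
use strict;
use warnings;

package Greeter;

sub new   { bless {}, shift }
sub hello { return "hello from " . ref(shift) }

package main;

my $g = Greeter->new;

# Three routes to the same method, all resolved via the stash:
my $by_literal = $g->hello();        # name spelled in the source
my $name       = 'hello';
my $by_name    = $g->$name();        # name held in a variable
my $code       = $g->can('hello');   # explicit lookup -> coderef
my $by_ref     = $g->$code();        # coderef invoked as a method

die "mismatch" unless $by_literal eq $by_name
                   && $by_name    eq $by_ref;
print "all three dispatch paths agree: $by_literal\n";
```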

      Hmm.. that's true, but this:

$n    = 1;
$ref  = "n";
$$ref = 2;

      Gives us a dump that looks like so:

g  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 sym.pl:1) v ->3
5     <2> sassign vKS/2 ->6
3        <$> const[IV 1] s ->4
-        <1> ex-rv2sv sKRM*/1 ->5
4           <#> gvsv[*n] s ->5
6     <;> nextstate(main 1 sym.pl:2) v ->7
9     <2> sassign vKS/2 ->a
7        <$> const[PV "n"] s ->8
-        <1> ex-rv2sv sKRM*/1 ->9
8           <#> gvsv[*ref] s ->9
a     <;> nextstate(main 1 sym.pl:3) v ->b
f     <2> sassign vKS/2 ->g
b        <$> const[IV 2] s ->c
e        <1> rv2sv sKRM*/1 ->f
d           <1> rv2sv sKM/DREFSV,1 ->e
c              <#> gv[*ref] s ->d

      The difference doesn't appear to be large there, either. This time the switch is between ex-rv2sv / rv2sv and gvsv / gv. I don't have a copy of the perl source at hand, but I'd imagine the behavior is much the same as in method and method_named. The problems are similar, after all.

      The point is, that little bit of indirection occurs in a place where things can go blooey with frightening ease. Imagine code that took the method name as a function parameter instead of looking it up in a statically-defined list. Yucko. Granted, Perl won't auto-vivify functions or clobber a function definition because of a typo, so this is less dangerous than using symrefs for lvalues, but the general mechanism can still be used to do some horribly ugly things.
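For contrast, here is the lvalue case being worried about, in a deliberately minimal sketch: a symbolic scalar reference writes through whatever name it happens to hold. (This only works on package globals, and `use strict 'refs'` forbids it outright, which is exactly why the usual advice is to enable strict.)

```perl
#!/usr/bin/perl
# Deliberately disabling strict refs -- under 'use strict' the
# symbolic write below would be a fatal error.
no strict 'refs';

our $important = 'original';

my $name = 'important';   # imagine this string arriving as input
$$name   = 'clobbered';   # writes through the symbol table

print "$important\n";     # clobbered
```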

      Still, you've tipped the scale a bit further in the "less dangerous than flossing with a loaded shotgun" direction, so thanks for your time.

Good point about global variables. I do think the important difference, though, is that functions aren't writeable (at least as you're using them here).

Re: Need help with a conceptual speed bump
by tilly (Archbishop) on May 03, 2005 at 23:04 UTC
    Lose the ingrained attitude.

    The advice about symbolic refs applies to people who are using them to store an unknown amount of arbitrary data and who don't realize that they may accidentally create clashes between different parts of their code, and that they might wipe out something important.

    You're using them for meta-programming, which is exactly what they are good for.

Incidental question. Why are you returning the string 'SUCCESS' rather than just using true/false normally? (Using true/false would mean less code, should run faster, and would reduce the chance of typing something like 'SUCCSES' and creating a runtime logic error that might not be easy to spot.)

      Incidental question. Why are you returning the string 'SUCCESS' rather than just using true/false normally?

      Readability in the sample code. In real life, I'd probably return undef for success, an arbitrary error description string for failure, and use if ($result->{$key}) to scan for errors.

      I do prefer human-readable error messages over simple booleans, though. They give me better feedback, and more ways to track down errors while debugging.

      Your brick upside the head is appreciated, though. That's two people whose opinions I respect not telling me that I've gone completely wacko. Thanks.

        My inclination for something this unlikely to go wrong, and where going wrong is a real error, is to return the set value and throw an exception if there is a problem.

        Another common strategy is to return true/false to be interpreted normally and then make the error available elsewhere. For instance DBI does this, with the error in the errstr method.
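A hedged sketch of that convention (the class and method names here are invented, modelled loosely on DBI's err/errstr style, not DBI itself): operations return plain true/false, and the human-readable detail lives in a separate accessor.

```perl
use strict;
use warnings;

package TinyStore;

# Return true/false from the operation; keep the readable error
# text in a separate errstr() accessor, DBI-style.
sub new    { bless { data => {}, errstr => undef }, shift }
sub errstr { return $_[0]{errstr} }

sub set {
    my ($self, $key, $val) = @_;

    unless (defined $val) {
        $self->{errstr} = "refusing to store undef under '$key'";
        return 0;                    # false: caller consults errstr
    }

    $self->{data}{$key} = $val;
    $self->{errstr} = undef;
    return 1;                        # true: success
}

package main;

my $store = TinyStore->new;
print "stored\n"               if $store->set(colour => 'blue');
print $store->errstr, "\n" unless $store->set(colour => undef);
```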

        A cute idea that I just had is that you could return a true value for success, and an overloaded false string for failure.
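That idea is expressible with the core `overload` pragma: an object that tests false in boolean context but stringifies to the error message. A minimal sketch (the `Failure` class and `risky()` function are illustrative inventions):

```perl
use strict;
use warnings;

package Failure;

# An error object that is false in boolean tests but carries a
# readable message when interpolated as a string.
use overload
    'bool'   => sub { 0 },
    '""'     => sub { $_[0]->{msg} },
    fallback => 1;

sub new {
    my ($class, $msg) = @_;
    return bless { msg => $msg }, $class;
}

package main;

sub risky {
    my ($ok) = @_;
    return 1 if $ok;
    return Failure->new("FAILURE - the widget jammed");
}

my $result = risky(0);
if (!$result) {                    # boolean context: false
    print "error: $result\n";      # string context: the message
}
```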