mstone has asked for the wisdom of the Perl Monks concerning the following question:

Hey all -

I've been kicking around some OO design ideas, and have come up with a data storage class well suited to being the Model in a Model-View-Controller architecture. It uses an embedded data hash and a table of functions to reduce the whole access interface to the functions set() and get().

Thing is, I'm not completely happy with the lookup tables.

The sample class looks like so:

package Data_holder;

=head1 CONSTRUCTOR

Nothing terribly exciting here. All we do is bless a hashref, embed a data hash, and store two hashes that map attribute names to the names of the methods that will handle them.

=cut

sub new {
    my $O = bless {}, shift;

    $O->{'data'} = {};

    $O->{'get-funcs'} = {
        'key-1' => 'get_normal',
        'key-2' => 'get_synthetic',
    };

    $O->{'set-funcs'} = {
        'key-1' => 'set_normal',
        'key-2' => 'set_synthetic',
    };

    return ($O);
}

=head1 DISPATCH FUNCTIONS

=over 4

=item get (list) : hash

This is the inspection routine. It takes a list of attribute names as input, and returns a hash of named values as output. It does its job by calling functions identified in {'get-funcs'}.

=cut

sub get {
    my ($O, @keys) = @_;
    my $result = {};

    for my $k (@keys) {
        my $func = $O->{'get-funcs'}->{ $k } || 'get_error';
        $result->{ $k } = $O->$func ( $k );
    }

    return ($result);
}

=item set (hash) : hash

This is the storage function. It takes a hash of named values as input, and returns a hash of result codes as output. It does its job by calling functions identified in {'set-funcs'}.

=cut

sub set {
    my ($O, $args) = @_;
    my $result = {};

    for my $k (keys %$args) {
        my $func = $O->{'set-funcs'}->{ $k } || 'set_error';
        $result->{ $k } = $O->$func ( $k, $args->{ $k } );
    }

    return $result;
}

=back

=head1 INSPECTION METHODS

These are the functions that do the actual work of getting values. They're called by get() when it finds their names in {'get-funcs'}.

=over 4

=item get_normal (key) : value

This routine simply pulls the named value from the embedded hash.

=cut

sub get_normal {
    my ($O, $key) = @_;
    return ($O->{'data'}->{ $key });
}

=item get_synthetic (key) : value

This routine is a little more interesting. It calculates the return value from some other value or combination of values in the embedded data hash. In this case, we'll calculate the Fahrenheit value of a stored centigrade temperature. This is also just an example. In a real version of this class, you'd have a different function for every value you wanted to calculate.

=cut

sub get_synthetic {
    my ($O, $key) = @_;
    my $result = 32 + (1.8 * $O->{'data'}->{'temperature'});
    return ($result);
}

=back

=head1 STORAGE METHODS

These are the functions that do the actual work of setting values. They're called by set() when it finds their names in {'set-funcs'}.

=over 4

=item set_normal (key, value) : result

This is the basic storage routine. It drops the value into the embedded data hash under the assigned key. It also returns a success code since there's no way that operation can fail.

=cut

sub set_normal {
    my ($O, $key, $val) = @_;
    $O->{'data'}->{ $key } = $val;
    return ('SUCCESS');
}

=item set_synthetic (key, value) : result

Like get_synthetic(), this is a sample routine. Here we store the centigrade value of a temperature entered in Fahrenheit.

=cut

sub set_synthetic {
    my ($O, $key, $value) = @_;
    $O->{'data'}->{'temperature'} = ($value - 32) / 1.8;
    return ('SUCCESS');
}

=back

=head1 ERROR ROUTINES

These are the catch-all routines. They handle keys that don't have set() or get() functions defined in the lookup tables. In theory, you could use these routines to replace the get_normal() and set_normal() operations, but I really don't like that idea. It makes typos *way* too powerful for my taste. I prefer to define my interfaces explicitly.

=over 4

=back

=cut

sub get_error {
    my ($O, $key) = @_;
    return ( "FAILURE - "
           . "I don't know how to locate the value for '$key'." );
}

sub set_error {
    my ($O, $key, $val) = @_;
    return ( "FAILURE - "
           . "I don't know how to set '$key' to '$val'." );
}
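For concreteness, here is a trimmed, self-contained rendition of the same pattern (only the 'normal' handlers and the error fallbacks, with a made-up package name) plus a short usage demo:

```perl
#!/usr/bin/perl
use strict;
use warnings;

package Mini_holder;

# Trimmed sketch of the lookup-table pattern: one normal
# getter/setter pair plus the catch-all error routines.
sub new {
    my $O = bless { data => {} }, shift;
    $O->{'get-funcs'} = { 'key-1' => 'get_normal' };
    $O->{'set-funcs'} = { 'key-1' => 'set_normal' };
    return $O;
}

sub get {
    my ($O, @keys) = @_;
    my $result = {};
    for my $k (@keys) {
        my $func = $O->{'get-funcs'}{$k} || 'get_error';
        $result->{$k} = $O->$func($k);
    }
    return $result;
}

sub set {
    my ($O, $args) = @_;
    my $result = {};
    for my $k (keys %$args) {
        my $func = $O->{'set-funcs'}{$k} || 'set_error';
        $result->{$k} = $O->$func($k, $args->{$k});
    }
    return $result;
}

sub get_normal { my ($O, $k) = @_; return $O->{data}{$k} }
sub set_normal { my ($O, $k, $v) = @_; $O->{data}{$k} = $v; return 'SUCCESS' }
sub get_error  { my ($O, $k) = @_; return "FAILURE - no getter for '$k'." }
sub set_error  { my ($O, $k) = @_; return "FAILURE - no setter for '$k'." }

package main;

my $m = Mini_holder->new;
print $m->set({ 'key-1' => 42 })->{'key-1'}, "\n";   # SUCCESS
print $m->get('key-1')->{'key-1'}, "\n";             # 42
print $m->get('bogus')->{'bogus'}, "\n";             # FAILURE - no getter for 'bogus'.
```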

You can also throw in a table of data validation functions, and write set_normal() like so:

sub set_normal {
    my ($O, $key, $val) = @_;

    my $func   = $O->{'validate-funcs'}->{ $key };
    my $result = $O->$func ($val);

    if ($result eq 'SUCCESS') {
        $O->{'data'}->{ $key } = $val;
    }

    return ($result);
}

You can use a synonym table to make the same values accessible by multiple names:

sub new {
    {...}
    $O->{'synonyms'} = {
        'rm' => 'REQUEST_METHOD',
        'q'  => 'QUERY_STRING',
        {...}
    };
    {...}
}

sub get {
    my ($O, @keys) = @_;
    my $result = {};

    for my $k (@keys) {
        my $real_key = $O->{'synonyms'}->{ $k } || $k;
        my $func     = $O->{'get-funcs'}->{ $real_key } || 'get_error';
        $result->{ $k } = $O->$func ($real_key);
    }

    return ($result);
}

And you can trick out the dispatch functions with argument-list validation code to enforce rules about data access: setting things up so you can only set 'v1' if you also set 'v2', for instance.
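One hedged sketch of that kind of rule (the key names 'v1' and 'v2' are placeholders, and the surrounding class is a minimal stand-in): check the argument list before dispatching, and reject the whole request if 'v1' arrives without 'v2'.

```perl
use strict;
use warnings;

package Guarded;

sub new {
    my $O = bless { data => {} }, shift;
    $O->{'set-funcs'} = { v1 => 'set_normal', v2 => 'set_normal' };
    return $O;
}

sub set_normal { my ($O, $k, $v) = @_; $O->{data}{$k} = $v; return 'SUCCESS' }
sub set_error  { my ($O, $k) = @_; return "FAILURE - no setter for '$k'." }

# The dispatch function, extended with an argument-list rule:
# 'v1' may only be set in the same call as 'v2'.
sub set {
    my ($O, $args) = @_;

    if (exists $args->{v1} && !exists $args->{v2}) {
        return { v1 => "FAILURE - 'v1' requires 'v2' in the same call." };
    }

    my $result = {};
    for my $k (keys %$args) {
        my $func = $O->{'set-funcs'}{$k} || 'set_error';
        $result->{$k} = $O->$func($k, $args->{$k});
    }
    return $result;
}

package main;

my $g = Guarded->new;
print $g->set({ v1 => 1 })->{v1}, "\n";            # rejected: v2 missing
print $g->set({ v1 => 1, v2 => 2 })->{v1}, "\n";   # SUCCESS
```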

Overall, I'm happy with the design.

Thing is, I have this semi-irrational prejudice against using runtime evaluation to identify functions, i.e., doing this:

$O->$func ($key);

I know that it works. I know that I'm doing it in a reasonably disciplined way. I'm cool with the idea of treating functions as first-order items of data. And I really like the simplicity of the notation. I just have this deeply-ingrained attitude that says program structure should be defined at compile time.
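One middle ground worth noting (my sketch, not part of the original design): store code references in the lookup table instead of method-name strings. The dispatch is still table-driven, but no name-to-function resolution happens at call time, since Perl lets you invoke a coderef as a method with `$O->$coderef(@args)`:

```perl
use strict;
use warnings;

package Coderef_holder;

# Variant of the lookup-table design that stores code references
# instead of method-name strings, so dispatch needs no symbolic
# lookup at runtime.
sub new {
    my $O = bless { data => {} }, shift;
    $O->{'get-funcs'} = {
        'key-1' => \&get_normal,
    };
    return $O;
}

sub get {
    my ($O, @keys) = @_;
    my $result = {};
    for my $k (@keys) {
        my $func = $O->{'get-funcs'}{$k} || \&get_error;
        $result->{$k} = $O->$func($k);   # coderef invoked as a method
    }
    return $result;
}

sub get_normal { my ($O, $key) = @_; return $O->{data}{$key} }
sub get_error  { my ($O, $key) = @_; return "FAILURE - no getter for '$key'." }

package main;

my $dh = Coderef_holder->new;
$dh->{data}{'key-1'} = 'hello';
print $dh->get('key-1')->{'key-1'}, "\n";   # hello
print $dh->get('bogus')->{'bogus'}, "\n";   # FAILURE - no getter for 'bogus'.
```

A misspelled entry like `\&get_nromal` still compiles, but calling it dies loudly with "Undefined subroutine", rather than being silently treated as a method name.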

There is a way to get the same result without doing the runtime evaluation: build a family of lightweight Command classes and use those in the lookup tables rather than strings:

package Command;

sub new     { return (bless {}, shift); }
sub execute { }

package Get_normal;

@ISA = qw( Command );

sub execute {
    my ($O, $target, $key) = @_;
    return $target->get_normal ($key);
}

package Data_holder;

sub new {
    {...}
    my $g = new Get_normal;
    $O->{'get-funcs'} = {
        'key-1' => $g,
        {...}
    };
    {...}
}

sub get {
    {...}
    my $cmd = $O->{'get-funcs'}->{ $k }
           || $O->{'get-funcs'}->{'get_error'};
    $result->{ $k } = $cmd->execute ($O, $k);
    {...}
}

The effect is exactly the same, but this time the method invocations are wired directly into the source. The branching ends up getting built into the syntax tree, rather than being determined at runtime. This version could be ported to C++ where the previous version couldn't.

But for cripe's sake, look at all the extra code! And the coupling between the main class and all the command classes is just plain ugly.

I'm trying to convince myself that the first version is okay, roughly on the order of using macros in Lisp. I know that any language worth using offers certain high-voltage techniques that should be avoided when it's at all convenient to do so, but used with respect when the benefit outweighs the risk.

I think that covers the situation I'm dealing with here. The fact that I do know how to write the package with a C++ accent means it's probably safe to call the first version an idiom. All I have to do is remind myself that runtime evaluation is the short way of saying something I'd have to spell out more thoroughly in C++, and offer a quick prayer of thanks to Larry for saving me all that extra typing whenever I use the technique in the future.

But I'm still not completely convinced. We're talking about symbolic references here, and the short course on that subject is: don't.

Which brings us to the point of this query. Does anyone else have a tidier way of storing method invocation in a lookup table, or is this particular way of using symbolic references basically okay?

Re: Need help with a conceptual speed bump
by chromatic (Archbishop) on May 03, 2005 at 22:00 UTC
    We're talking about symbolic references here

    Yes and no.

    Yes, Perl looks up methods in the symbol table when you dispatch to them.

    No, it does this for methods anyway. If you write a small test program:

use CGI;

my $q = CGI->new();

my $new = 'new';
my $z   = CGI->$new();

    and dump the optree:

$ perl -MO=Concise method_dispatch.pl
l  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 514 method_dispatch.pl:8) v/2 ->3
8     <2> sassign vKS/2 ->9
6        <1> entersub[t2] sKS/TARG,2 ->7
3           <0> pushmark s ->4
4           <$> const(PV "CGI") sM/BARE ->5
5           <$> method_named(PVIV "new") s ->6
7        <0> padsv[$q:514,517] sRM*/LVINTRO ->8
9     <;> nextstate(main 515 method_dispatch.pl:9) v/2 ->a
c     <2> sassign vKS/2 ->d
a        <$> const(PV "new") s ->b
b        <0> padsv[$new:515,517] sRM*/LVINTRO ->c
d     <;> nextstate(main 516 method_dispatch.pl:10) v/2 ->e
k     <2> sassign vKS/2 ->l
i        <1> entersub[t5] sKS/TARG,2 ->j
e           <0> pushmark s ->f
f           <$> const(PV "CGI") sM/BARE ->g
h           <1> method sK/1 ->i
g              <0> padsv[$new:515,517] s ->h
j        <0> padsv[$z:516,517] sRM*/LVINTRO ->k
method_dispatch.pl syntax OK

    you'll see that the effective difference is in the method_named and method ops. Looking in pp_hot.c, the former calls method_common with the SV containing the name of the method. The latter checks that the child SV is a reference and returns it if so or calls method_common otherwise.

    method_common looks up the method in the appropriate stash and returns the reference.

    Is there a difference? I don't really see one. Besides all this, Perl doesn't (and often can't) resolve the target of method calls statically anyway. (There might be some post hoc ergo propter hoc somewhere in all of this.)
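chromatic's point is visible from within Perl itself: `can()` (from UNIVERSAL, core Perl) performs the same stash lookup and hands back a plain code reference, which you can invoke directly. A small sketch showing that all three spellings dispatch to the same place:

```perl
use strict;
use warnings;

package Greeter;

sub new   { bless {}, shift }
sub hello { return "hello from " . ref(shift) }

package main;

my $g = Greeter->new;

# Three routes to the same method, all resolved via the stash:
my $by_literal = $g->hello();        # name spelled in the source
my $name       = 'hello';
my $by_name    = $g->$name();        # name held in a variable
my $code       = $g->can('hello');   # explicit lookup -> coderef
my $by_ref     = $g->$code();        # coderef invoked as a method

die "mismatch" unless $by_literal eq $by_name
                   && $by_name    eq $by_ref;
print "all three dispatch paths agree: $by_literal\n";
```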

      Hmm.. that's true, but this:

$n    = 1;
$ref  = "n";
$$ref = 2;

      Gives us a dump that looks like so:

g  <@> leave[1 ref] vKP/REFC ->(end)
1     <0> enter ->2
2     <;> nextstate(main 1 sym.pl:1) v ->3
5     <2> sassign vKS/2 ->6
3        <$> const[IV 1] s ->4
-        <1> ex-rv2sv sKRM*/1 ->5
4           <#> gvsv[*n] s ->5
6     <;> nextstate(main 1 sym.pl:2) v ->7
9     <2> sassign vKS/2 ->a
7        <$> const[PV "n"] s ->8
-        <1> ex-rv2sv sKRM*/1 ->9
8           <#> gvsv[*ref] s ->9
a     <;> nextstate(main 1 sym.pl:3) v ->b
f     <2> sassign vKS/2 ->g
b        <$> const[IV 2] s ->c
e        <1> rv2sv sKRM*/1 ->f
d           <1> rv2sv sKM/DREFSV,1 ->e
c              <#> gv[*ref] s ->d

      The difference doesn't appear to be large there, either. This time the switch is between ex-rv2sv / rv2sv and gvsv / gv. I don't have a copy of the perl source at hand, but I'd imagine the behavior is much the same as in method and method_named. The problems are similar, after all.

      The point is, that little bit of indirection occurs in a place where things can go blooey with frightening ease. Imagine code that took the method name as a function parameter instead of looking it up in a statically-defined list. Yucko. Granted, Perl won't auto-vivify functions or clobber a function definition because of a typo, so this is less dangerous than using symrefs for lvalues, but the general mechanism can still be used to do some horribly ugly things.
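For contrast, here is the lvalue case being worried about, in a deliberately minimal sketch: a symbolic scalar reference writes through whatever name it happens to hold. (This only works on package globals, and `use strict 'refs'` forbids it outright, which is exactly why the usual advice is to enable strict.)

```perl
#!/usr/bin/perl
# Deliberately disabling strict refs -- under 'use strict' the
# symbolic write below would be a fatal error.
no strict 'refs';

our $important = 'original';

my $name = 'important';   # imagine this string arriving as input
$$name   = 'clobbered';   # writes through the symbol table

print "$important\n";     # clobbered
```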

      Still, you've tipped the scale a bit further in the "less dangerous than flossing with a loaded shotgun" direction, so thanks for your time.

Good point about global variables. I do think the important difference, though, is that functions aren't writeable (at least as you're using them here).

Re: Need help with a conceptual speed bump
by tilly (Archbishop) on May 03, 2005 at 23:04 UTC
    Lose the ingrained attitude.

    The advice about symbolic refs applies to people who are using them to store an unknown amount of arbitrary data and who don't realize that they may accidentally create clashes between different parts of their code, and that they might wipe out something important.

    You're using them for meta-programming, which is exactly what they are good for.

Incidental question. Why are you returning the string 'SUCCESS' rather than just using true/false normally? (Using true/false would mean less code, should run faster, and would reduce the chance of typing something like 'SUCCSES' and creating a runtime logic error that might not be easy to spot.)

      Incidental question. Why are you returning the string 'SUCCESS' rather than just using true/false normally?

      Readability in the sample code. In real life, I'd probably return undef for success, an arbitrary error description string for failure, and use if ($result->{$key}) to scan for errors.

      I do prefer human-readable error messages over simple booleans, though. They give me better feedback, and more ways to track down errors while debugging.

      Your brick upside the head is appreciated, though. That's two people whose opinions I respect not telling me that I've gone completely wacko. Thanks.

        My inclination for something this unlikely to go wrong, and where going wrong is a real error, is to return the set value and throw an exception if there is a problem.

        Another common strategy is to return true/false to be interpreted normally and then make the error available elsewhere. For instance DBI does this, with the error in the errstr method.
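A hedged sketch of that convention (the class and method names here are invented, modelled loosely on DBI's err/errstr style, not DBI itself): operations return plain true/false, and the human-readable detail lives in a separate accessor.

```perl
use strict;
use warnings;

package TinyStore;

# Return true/false from the operation; keep the readable error
# text in a separate errstr() accessor, DBI-style.
sub new    { bless { data => {}, errstr => undef }, shift }
sub errstr { return $_[0]{errstr} }

sub set {
    my ($self, $key, $val) = @_;

    unless (defined $val) {
        $self->{errstr} = "refusing to store undef under '$key'";
        return 0;                    # false: caller consults errstr
    }

    $self->{data}{$key} = $val;
    $self->{errstr} = undef;
    return 1;                        # true: success
}

package main;

my $store = TinyStore->new;
print "stored\n"               if $store->set(colour => 'blue');
print $store->errstr, "\n" unless $store->set(colour => undef);
```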

        A cute idea that I just had is that you could return a true value for success, and an overloaded false string for failure.
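That idea is expressible with the core `overload` pragma: an object that tests false in boolean context but stringifies to the error message. A minimal sketch (the `Failure` class and `risky()` function are illustrative inventions):

```perl
use strict;
use warnings;

package Failure;

# An error object that is false in boolean tests but carries a
# readable message when interpolated as a string.
use overload
    'bool'   => sub { 0 },
    '""'     => sub { $_[0]->{msg} },
    fallback => 1;

sub new {
    my ($class, $msg) = @_;
    return bless { msg => $msg }, $class;
}

package main;

sub risky {
    my ($ok) = @_;
    return 1 if $ok;
    return Failure->new("FAILURE - the widget jammed");
}

my $result = risky(0);
if (!$result) {                    # boolean context: false
    print "error: $result\n";      # string context: the message
}
```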