Wiggins has asked for the wisdom of the Perl Monks concerning the following question:

I know, "don't worry about efficiency, it is Perl". But here is the question anyway. I am processing a structure of STIX (XML-based) data, and have an element to handle (in a SAX callback). I started with:
my $tname = $properties->{'Name'};
my %attributes = %{$properties->{'Attributes'}};
if ($tname eq "stix:Indicator") {
}elsif($tname eq "cybox:Observable") {
}elsif($tname eq "cybox:Title") {
}elsif($tname eq "cybox:Description") {
}elsif($tname eq "cybox:Title") {        # duplicate test: this branch is never reached
}elsif($tname eq "cybox:Object") {
}elsif($tname eq "cybox:Properties") {
}elsif($tname eq "URIObj:Value") {
}   # ... and so on for each remaining tag

Then I wondered, what about a hash keyed on the 'tname' and containing corresponding CODEREF?
More memory, more complex to set up, but possibility more efficient as the number of tags to be processed grows. What would be the tipping point for string compares?

Thoughts?

It is always better to have seen your target for yourself, rather than depend upon someone else's description.

Replies are listed 'Best First'.
Re: '100 elsif's vs hash of coderef's
by BrowserUk (Patriarch) on Dec 10, 2015 at 19:07 UTC

    To answer the question you asked, the performance break-even point comes at around 10 cases:

    #! perl -slw
    use strict;
    use List::Util qw[ shuffle ];
    use Benchmark qw[ cmpthese ];

    our %dispatch = (
        'cybox:abcdef' => sub { 1; },
        'cybox:ghijkl' => sub { 1; },
        'cybox:mnopqr' => sub { 1; },
        'cybox:stuvwx' => sub { 1; },
        'cybox:123456' => sub { 1; },
        'cybox:234567' => sub { 1; },
        'cybox:345678' => sub { 1; },
        'cybox:456789' => sub { 1; },
        'cybox:567890' => sub { 1; },
        'cybox:000000' => sub { 1; },
    );

    our @k = map{ shuffle keys %dispatch } 1 .. 100;

    cmpthese -1, {
        a=>q[
            for my $tname ( @k ) {
                $dispatch{ $tname }->();
            }
        ],
        b=>q[
            for my $tname ( @k ) {
                if(    $tname eq 'cybox:abcdef' ) { 1; }
                elsif( $tname eq 'cybox:ghijkl' ) { 1; }
                elsif( $tname eq 'cybox:mnopqr' ) { 1; }
                elsif( $tname eq 'cybox:stuvwx' ) { 1; }
                elsif( $tname eq 'cybox:123456' ) { 1; }
                elsif( $tname eq 'cybox:234567' ) { 1; }
                elsif( $tname eq 'cybox:345678' ) { 1; }
                elsif( $tname eq 'cybox:456789' ) { 1; }
                elsif( $tname eq 'cybox:567890' ) { 1; }
                else                              { 1; }
            }
        ],
    };

    __END__
    C:\test>1149900
           Rate    b    a
    b    1810/s   --  -8%
    a    1967/s   9%   --

    C:\test>\Perl22\bin\perl.exe 1149900.pl
           Rate    b    a
    b    2370/s   -- -18%
    a    2899/s  22%   --

    Though personally, I tend to gravitate to a dispatch hash when I get 4 or 5 cases or more because it makes for cleaner code.
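
    As a rough illustration of what such a dispatch hash might look like for the element handler in the question, here is a minimal sketch. The sub name handle_element, the handler bodies, the return strings and the fallback handler are all assumptions made for the example; only the tag names and the shape of $properties come from the original post.

```perl
use strict;
use warnings;

# Hypothetical handlers: the bodies and return values are illustrative only.
my %dispatch = (
    'stix:Indicator'   => sub { my ($attrs) = @_; return 'indicator id=' . ($attrs->{id} // '?') },
    'cybox:Observable' => sub { return 'observable' },
    'cybox:Title'      => sub { return 'title' },
    'URIObj:Value'     => sub { return 'uri value' },
);

# Assumed fallback, so an unknown tag never tries to call undef as code.
my $default = sub { return 'ignored' };

# Stand-in for the SAX callback; $properties mirrors the structure in the question.
sub handle_element {
    my ($properties) = @_;
    my $tname      = $properties->{Name};
    my $attributes = $properties->{Attributes} || {};
    my $handler    = $dispatch{$tname} || $default;
    return $handler->($attributes);
}

print handle_element({ Name => 'cybox:Title', Attributes => {} }), "\n";
print handle_element({ Name => 'stix:Indicator', Attributes => { id => 'x1' } }), "\n";
print handle_element({ Name => 'foo:Bar' }), "\n";
```

    Adding a tag then means adding one hash entry; the lookup code itself does not change.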


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: '100 elsif's vs hash of coderef's
by dsheroh (Monsignor) on Dec 10, 2015 at 15:25 UTC
    I don't have a specific number for where I'd place the tipping point, but, judging by the code sample in your post, I'd say you're well past it.
Re: '100 elsif's vs hash of coderef's
by GrandFather (Saint) on Dec 10, 2015 at 22:22 UTC

    Or skip the dispatch hash altogether and instead dispatch by name:

    use strict;
    use warnings;

    for my $name (qw(Title Description Object Properties Value Wibble)) {
        my $code = main->can("do$name");

        if (!$code) {
            print "!!! Missing handler for '$name'\n";
            next;
        }

        $code->();
    }

    sub doTitle       { print "Title handler\n" }
    sub doDescription { print "Description handler\n" }
    sub doObject      { print "Object handler\n" }
    sub doProperties  { print "Properties handler\n" }
    sub doValue       { print "Value handler\n" }

    Prints:

    Title handler
    Description handler
    Object handler
    Properties handler
    Value handler
    !!! Missing handler for 'Wibble'

    Now if you need a new handler all you do is add the sub.

    Premature optimization is the root of all job security

      I think I would put the dispatched functions in a separate namespace:

      c:\@Work\Perl>perl -wMstrict -le
      "my @args = qw(hi there);
       ;;
       for my $do_this (qw(Title Description Object Fooble Properties)) {
         my $code = Dispatcher->can($do_this);
       ;;
         if (!$code) {
           print qq{Missing Handler for '$do_this'};
           next;
           }
       ;;
         $code->(@args);
         }
       ;;
       { package Dispatcher;
       ;;
         sub Title       { print qq{title handler: @_}; }
         sub Description { print qq{description handler: @_}; }
         sub Object      { print qq{object handler: @_}; }
         sub Properties  { print qq{properties handler: @_}; }
       }
      "
      title handler: hi there
      description handler: hi there
      object handler: hi there
      Missing Handler for 'Fooble'
      properties handler: hi there

      It's no longer necessary to munge the function names, and the functions are easily factored out to a separate module (which could, of course, also be done with the original approach).


      Give a man a fish:  <%-{-{-{-<

        For "real" code my handlers are generally member functions so the "factored out" bit has already happened, but the munging is required to avoid accidental or naughty access to other members of the object.

        An occasional benefit of munging is that several handlers can be provided for the same base name: an action method and a help-string method, for example, or a parser and an executor.
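
        A minimal sketch of that idea, assuming a hypothetical Handlers class with "do_" and "help_" prefixes (the class, the method names and the messages are invented for illustration and are not from the post):

```perl
use strict;
use warnings;

package Handlers;    # hypothetical class holding the handlers

sub new { return bless {}, shift }

# Action methods use a "do_" prefix, help strings a "help_" prefix.
sub do_Title   { my ($self, @args) = @_; return "Title handler: @args" }
sub help_Title { return 'Title: sets the title' }
sub do_Object  { my ($self, @args) = @_; return "Object handler: @args" }

# Has neither prefix, so run() and help() can never reach it by name.
sub secret { return 'internal' }

# Look up the action method for a base name and call it.
sub run {
    my ($self, $name, @args) = @_;
    my $code = $self->can("do_$name")
        or return "Missing handler for '$name'";
    return $self->$code(@args);
}

# Look up the help method for the same base name.
sub help {
    my ($self, $name) = @_;
    my $code = $self->can("help_$name")
        or return "No help for '$name'";
    return $self->$code();
}

package main;

my $h = Handlers->new;
print $h->run('Title', 'hi', 'there'), "\n";
print $h->help('Title'), "\n";
print $h->help('Object'), "\n";
print $h->run('secret'), "\n";
```

        Because the lookup always prepends a prefix, a tag name from the input can only ever resolve to a method that was deliberately named as a handler.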

Re: '100 elsif's vs hash of coderef's
by Mr. Muskrat (Canon) on Dec 10, 2015 at 17:16 UTC

    You could go with a hybrid approach.

    my $tname = $properties->{'Name'};
    my %attributes = %{$properties->{'Attributes'}};

    my %stix;
    my %cybox = (
        Title       => sub { ... },
        Description => sub { ... },
        Object      => sub { ... },
        Properties  => sub { ... },
        _DEFAULT_   => sub { ... },
    );
    my %URIObj;

    if ($tname =~ /^stix:(\w+)/) {
        my $component = $1;
        ...
    }
    elsif ($tname =~ /^cybox:(\w+)/) {
        my $component = $1;
        # Fall back to the default handler for unknown cybox components
        $component = '_DEFAULT_' unless exists $cybox{$component};
        $cybox{$component}->();
    }
    elsif ($tname =~ /^URIObj:(\w+)/) {
        my $component = $1;
        ...
    }
Re: '100 elsif's vs hash of coderef's
by ww (Archbishop) on Dec 10, 2015 at 16:57 UTC

    "possibility (sic) more efficient" you speculate?

    It's far more efficient for me (in terms of fitness, $ for gasoline, and vehicle wear and tear) to jog to tonight's PM meeting; unfortunately, it's also inefficient (because I won't get there till tomorrow morning).

    For what variant of "efficient"? Are you most concerned with runtime, memory limits, or something else?


    ++$anecdote ne $data