Re^2: When every microsecond counts: Parsing subroutine parameters

by jplindstrom (Monsignor)
on May 17, 2008 at 21:58 UTC


in reply to Re: When every microsecond counts: Parsing subroutine parameters
in thread When every microsecond counts: Parsing subroutine parameters

Well, I recall reading something about Plucene, the Perl port of Lucene, being very difficult to get performant. After optimization it was uniformly slow because of many method calls. This wasn't really a problem in Java but was a problem in Perl.

(This is what I recall; a quick Google session didn't turn up the mail or post I remember reading about this. Plucene developers would obviously know the real story here.)

But what we do about this issue is interesting. Named parameters are a very common idiom, and a very good one, in that it leads to maintainable code.
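
For readers who haven't met the idiom, here is a minimal sketch of the two calling styles under discussion. The sub names and arguments are made up for illustration; the point is only that the named form builds a hash from @_ on every call, which is where the overhead comes from.

```perl
use strict;
use warnings;

# Hypothetical example sub: named parameters arrive as a flat key/value list.
sub describe_named {
    my %args = @_;                  # a fresh hash is built on every call
    return "$args{name} is $args{age}";
}

# Positional equivalent: no hash, but the call site no longer says
# which argument is which.
sub describe_positional {
    my ($name, $age) = @_;
    return "$name is $age";
}

print describe_named(age => 30, name => 'Alice'), "\n";   # Alice is 30
print describe_positional('Alice', 30), "\n";             # Alice is 30
```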

So, what can we do to make it perform better? Some special optimization of this case in the perl implementation? Some new syntax to support this idiom? Could it be related to the new named arguments being proposed for perl 5.12?

/J

Replies are listed 'Best First'.
Re^3: When every microsecond counts: Parsing subroutine parameters
by Jenda (Abbot) on May 17, 2008 at 23:08 UTC

    I don't think any optimization would help much, so I doubt one will be implemented. I ran a few benchmarks to see how much of the additional overhead comes from repeatedly creating the hash and thus might be removed by reusing the hash:

    use Benchmark qw(cmpthese);

    sub with_hash {
        my ($one, $two) = @{$_[0]}{'one', 'two'};
    }
    sub wo_hash {
        my ($one, $two) = @{{@_}}{'one', 'two'};
    }

    my %h = (one => undef, two => undef);

    cmpthese(1000000, {
        wo_hash               => sub { wo_hash(one => 7, two => 9) },
        with_hash             => sub { with_hash({one => 7, two => 9}) },
        with_consthash        => sub { with_hash(\%h) },
        with_consthash_mod    => sub { @h{'one','two'} = (8,1); with_hash(\%h) },
        with_consthash_modd   => sub { @h{'one','two'} = (8,1); with_hash(\%h); @h{'one','two'}=() },
        with_consthash_moddL  => sub { local @h{'one','two'} = (8,1); with_hash(\%h); },
        with_consthash_moddRA => sub { @h{'one','two'} = (8,1); my @r=with_hash(\%h); @h{'one','two'}=(); @r },
        with_consthash_moddRS => sub { @h{'one','two'} = (8,1); my $r=with_hash(\%h); @h{'one','two'}=(); $r },
    });

    As you can see, I first tried passing a completely constant hash. That looked good, much better than foo({one => 1, two => 2}) (1002004/s vs 424628/s on my computer with Perl 5.8.8). The problem is that once I modified the values in the hash before the call, the gain got much smaller (634921/s vs 424628/s). A further problem is that the values stay in the hash between invocations, which doesn't matter for numbers but would matter for huge strings or for references. So I had to clear the values. undef()ing the whole hash destroyed any gain whatsoever; setting the values to undef got me just 489237/s vs 424628/s. And that was when the called subroutine did not need to return anything!

    I tried to use local() on the hash slice or assign the return value to a variable, but that just made things worse. When the function was supposed to return a list, it was even slower than the normal inline hash version.

    So even if perl created a hash for the subroutine just once, kept it and just modified and removed the values for each call, the speed gain would be fairly small. For a fairly high price both in memory footprint and code complexity.

    The only thing that might really help would be to convert the named parameter calls into positional ones at compile time. The catch is that Perl would need to know, at the time it compiles the call, all the named parameters the subroutine/method can take and the order in which they are expected once converted to positional. Which is completely out of the question for methods.

    I'm afraid we have to live with the overhead and in the rare case it actually matters, change the subroutine/method to use positional params.
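
    One way to do that without breaking existing callers, sketched here with hypothetical subs (area and area_fast are invented for the example, not taken from any module): keep the named interface as a thin wrapper and call the positional version directly from the hot loop.

    ```perl
    use strict;
    use warnings;

    # Hypothetical hot-path routine: positional arguments, no hash is built.
    sub area_fast {
        my ($width, $height) = @_;
        return $width * $height;
    }

    # Named-parameter wrapper kept for ordinary callers.
    sub area {
        my %args = @_;
        return area_fast(@args{qw(width height)});
    }

    # The tight loop uses the positional form; everything else can stay named.
    my $total = 0;
    $total += area_fast($_, 2) for 1 .. 1000;

    print area(width => 3, height => 4), "\n";   # 12
    print "$total\n";                            # 1001000
    ```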

      I'm afraid we have to live with the overhead...

      There is another alternative. Don't use named parameters.

      Why does anyone use named parameters?

      Let's see. How many of these languages do you think use named parameters at the call site?:

      ABC ACSL Ada Alef Algol Algol68 APL AppleScript AutoIt Autolisp Awk BASIC BCPL Befunge BETA BLISS BLooP C C# C* C++ Cecil CFML CHILL Cilk CLAIRE Clean CLU CMS-2 COBOL Common Lisp Concurrent Clean Concurrent Pascal CORAL 66 CorelScript csh CSP cT Curry Dylan Dynace Eiffel Elisp Erlang Escher Esterel Euphoria FLooP FORMAC Forms/3 Forth FORTRAN FP Goedel GPSS Haskell Hope HyperTalk ICI Icon INTERCAL Interlisp J Java JavaScript Jovial Leda LIFE Limbo Lingo Lisp Logo LotusScript Lua Lucid M Magma Mathematica Mawl Mercury Miranda ML Modula 3 Modula-2 MUMPS NESL NIAL Oberon Objective-C Obliq occam OPS5 Orca Oz Pascal PerfectScript Perl PHP Pict Pike Pilot PL/C PL/I Postscript Prolog Python QBasic Quake-C REBOL Reduce Rexx RPG Ruby S Sather Scheme Self SETL sh Simscript SIMULA

      50%? 10%, 5%, 1%, 2?

      Even the much maligned VB Basic programmers seem to be able to write and maintain their code without this crutch. Why do Perl programmers suddenly feel the need for it?

      I think that some time ago, someone found that they could do it. That a combination of Perl's syntax and hashes meant that it was possible. And kinda cute. And for complex constructors with lots of possible parameters, many optional, it makes a certain amount of sense. You mostly don't call heavy constructors in tight loops so there's no great harm in using it. For constructors.

      But for most general purpose subroutines and method calls, the need for named parameters--i.e. calls that take so many arguments that naming them is beneficial beyond an aide-mémoire for the casual tourist to the code--is strongly indicative of something seriously wrong in the design of the API.

      Mostly, it is just as hard to look up the naming and spelling and casing conventions of named parameters when writing the calls, and just as hard to interpret the meaning of those names when reading them.

      For most programmers in most languages, naming the positional arguments (formal parameters) within the sub or method is perfectly clear and effective. And Perl has that ability. And any edicts to force this upon Perl programmers are based on YAJ. (Yet another Justifiction.)


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        Let's see. How many of these languages do you think use named parameters at the call site?:
        I can't really make sense of that sentence, but anyway programming is not a democracy, and it's still evolving fast enough that just counting languages won't give you meaningful insights.

        I think that some time ago, someone found that they could do it. That a combination of Perl's syntax and hashes meant that it was possible. And kinda cute. And for complex constructors with lots of possible parameters, many optional, it makes a certain amount of sense. You mostly don't call heavy constructors in tight loops so there's no great harm in using it. For constructors.
        AFAICT the big advantage of named parameters is that you can leave out the parts that default. This is great when you've got loads of options. And yes, most function calls do not need a lot of options. But named options really do make a lot of sense whenever you've got two or more of them.
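
        A minimal sketch of what that looks like in practice; make_request and its three options are hypothetical, chosen only to show the defaults being merged with whatever the caller supplies.

        ```perl
        use strict;
        use warnings;

        # Hypothetical sub with three options, all of which have defaults.
        sub make_request {
            my %opt = (
                method  => 'GET',
                timeout => 30,
                retries => 3,
                @_,            # caller-supplied pairs override the defaults
            );
            return "$opt{method} timeout=$opt{timeout} retries=$opt{retries}";
        }

        print make_request(), "\n";               # GET timeout=30 retries=3
        print make_request(timeout => 5), "\n";   # GET timeout=5 retries=3
        ```

        With positional parameters the second call would have to be something like make_request(undef, 5), and the sub would need to treat undef as "use the default".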

        Mostly, it is just as hard to look up the naming and spelling and casing conventions of named parameters when writing the calls,
        So what? Counting commas is no fun either. Also, a good IDE will help a lot there.
        and just as hard to interpret the meaning of those names when reading them.
        That's just bullshit.

        Yeah, there's a much better and cheaper way - don't name them, name the indices into @_ via constant subs, if you need names instead of numbers for the sake of code clarity:

        sub FOO () { 0 }
        sub BAR () { 1 }

        sub routine {
            my $bar = $_[BAR];
            $bar += munge( $_[FOO] );
        }

        But it is crucial for that discussion to identify when it is beneficial to use named parameters, and why. I can think of:

        • frameworks - you write code that gets called, and there's a convention for what each call brings along. POE is a good example
        • looking up a subroutine or method - you want to make use of some subroutines you use seldom, and a quick glance should suffice to know what it needs
        • myriads of options - but mostly you need just a few of them. Tk is a good example for that

        All other reasons seem to be based on gusto. But then, in early perl OO, objects were mostly blessed hashrefs (tutorials and perl pods are full of them), and much unreflected use of named parameters stems from there, I guess.

        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
          But for most general purpose subroutines and method calls, the need for named parameters--i.e. calls that take so many arguments that naming them is beneficial beyond an aide-mémoire for the casual tourist to the code--is strongly indicative of something seriously wrong in the design of the API.

        I respectfully disagree.

        Named parameters mean I don't have to pass a string of undefs because one particular call doesn't use those parameters. APIs using positional parameters have a way of requiring a difficult upgrade path.

        It's also self-documenting -- instead of a list of variables, each variable is named, which can only help the future software forensic expert.

        Many years ago, I wrote a User Interface program in C, and one of the things that I used was lots of parameter passing, knowing enough that global variables were not the answer. Eventually, I had a couple of routines that required a dozen or so parameters, and as the code matured into a lovely congealed mass of spaghetti, I began to dread getting in there to fiddle with calls to that code, precisely because I had to add 'just one more' parameter at the end.

        The alternative could have been to pass in a pointer to a struct, which is more or less a hashref, but I wasn't secure enough in my abilities to do that. Too bad, because it would have been the right thing to do, just as using a hashref is the right thing to do.

        Alex / talexb / Toronto

        "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re^3: When every microsecond counts: Parsing subroutine parameters
by samtregar (Abbot) on May 18, 2008 at 04:17 UTC
    So, what can we do to make it perform better?

    If by "it" you mean "our programs" I think the answer is simple - make fewer subroutine calls. If your program is making so many tiny do-nothing calls that parameter parsing or even just subroutine overhead is a significant factor then you're just making too many calls.
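
    As a rough illustration (scale_one and scale_all are hypothetical names), the difference is between calling a tiny sub once per element and calling a slightly bigger sub once per batch:

    ```perl
    use strict;
    use warnings;

    # Hypothetical per-item helper: one subroutine call per element.
    sub scale_one {
        my ($value, $factor) = @_;
        return $value * $factor;
    }

    # Batched variant: one call handles the whole list, so the call
    # overhead is paid once instead of once per element.
    sub scale_all {
        my ($values, $factor) = @_;
        return [ map { $_ * $factor } @$values ];
    }

    my @input = (1 .. 5);
    my @slow  = map { scale_one($_, 10) } @input;   # five calls
    my $fast  = scale_all(\@input, 10);             # one call

    print "@slow\n";    # 10 20 30 40 50
    print "@$fast\n";   # 10 20 30 40 50
    ```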

    It's a fact of life in Perl that method calls cost something (not too much, but not nothing either). That just means you need to make them count!

    -sam

Re^3: When every microsecond counts: Parsing subroutine parameters
by creamygoodness (Curate) on May 18, 2008 at 01:15 UTC
    Here's a write-up of the Plucene method call issue.

    Everything in Lucene is a method, down to outstream.writeByte(). Hash-based method dispatch just isn't fast enough for a straight port.

    --
    Marvin Humphrey
    Rectangular Research ― http://www.rectangular.com
Re^3: When every microsecond counts: Parsing subroutine parameters
by lima1 (Curate) on May 17, 2008 at 22:51 UTC
    ... because of many method calls. This wasn't really a problem in Java but was a problem in Perl.

    Java makes inline expansion optimizations (as C does with the inline keyword or in gcc somehow automatically with the -O3 flag).

Re^3: When every microsecond counts: Parsing subroutine parameters
by Anonymous Monk on May 18, 2008 at 13:30 UTC
