Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

optimization - exists should return a reference

by John M. Dlugosz (Monsignor)
on Jan 15, 2003 at 05:08 UTC ( [id://227053]=perlmeditation: print w/replies, xml ) Need Help??

The exists function is defined to return truth if the specified element exists, false if it does not.

In a previous discussion, we determined that a statement that uses an element if that element exists is not optimized to do the lookup only once, and in Perl6 "the situation is even worse".

Well, testing to see if an element exists before using it is a common use of exists. So, if the "true" value it returned were a reference to the element, then one could refer to it without looking it up again.

Finding an occurance in the file I'm currently working on,

return $export_def->{'..client_default_version'}{$client} if exists $ +export_def->{'..client_default_version'}{$client};
would become something like
return $$e if my $e= exists $export_def->{'..client_default_version'}{ +$client};
—John

Replies are listed 'Best First'.
Re: optimization - exists should return a reference
by theorbtwo (Prior) on Jan 15, 2003 at 05:29 UTC

    There are two lines of though. One is that you're likely to need the value of the hash key after testing if it exists. The other is that, if you don't need it, but really just wanted to check if it exists, then you're loosing speed, not gaining it. Also, if all you really wanted was a bool, you just created and then destroyed a reference for nothing, which isn't exactly free.

    The real solution would be common subexpression optimization, which is significantly more difficult.

    You should benchmark exists (as it currently exists) vs. getting the value of a hash value, then consider usage patterns, and if it's worth it. Once you've got some numbers, talk to perl5-porters. Even better, once you've got some numbers and a patch...

    You might also want to post to perl6-langauge about this -- since bool and ref context will be differencatable in perl6, you can get the best of both worlds. In fact, it could even return the value, but give "undef-but-true", etc, to give true results when the kv exists, but the value is false.


    Warning: Unless otherwise stated, code is untested. Do not use without understanding. Code is posted in the hopes it is useful, but without warranty. All copyrights are relinquished into the public domain unless otherwise stated. I am not an angel. I am capable of error, and err on a fairly regular basis. If I made a mistake, please let me know (such as by replying to this node).

      The real solution would be common subexpression optimization, which is significantly more difficult.
      Even more than most people think. Not only is it a bit of a pain to do, and somwehat time consuming, but it potentially alters the semantics of the program. If a hash is tied, caching the return from exists, if exists returns a ref, changes a two-tie-hit to a one-tie-hit. While this may not necessarily be a bad thing, it is definitely a change in behaviour.
      My first thought was indeed common subexpression elimination. But Elian pointed out that this can't be done because you don't know at compile time that the hash dereferences don't have side effects (they may be tied or magic).

      So, to not evaluate the dereference twice, it's necessary to not write it twice. That's what led to this idea. I think it's consistant with other things in Perl such as how logical || returns the value rather than meer truth.

      —John

Re: optimization - exists should return a reference
by MarkM (Curate) on Jan 15, 2003 at 05:42 UTC

    Perl 5.x would take a speed hit in the general case, from your suggestion (instead of returning sv_yes, it would need to allocate a reference each time exists() was invoked).

    Perl 6.x, on the other hand, would not, in the general case, take a speed hit, if boolean context were defined to evaluate to the existing behaviour, and non-boolean context, or more specifically, scalar context, would be the new (reference return value) behaviour.

    Another consideration, is that the cost of a hash lookup may be equal, or less than, the cost of creating a reference, assigning the reference, and later dereferencing the reference.

    I'm not sure if your suggestion is a good one. For Perl 6.x, I do not see a problem with it. Another argument for might be that the behaviour of exists would then become consistent with the behaviour of UNIVERSAL::can().

      I'm really unfamiliar with the perl source, and I still get lost in all the macro's, but looking at the code for Perl_avhv_exists_ent() in av.c, it looks like it might be less work to return the reference than to generate the boolean?


      Examine what is said, not who speaks.

      The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

        With regard to the suggestion that exists() return a reference, and not a boolean, BrowserUk writes:

        I'm really unfamiliar with the perl source, and I still get lost in all the macro's, but looking at the code for Perl_avhv_exists_ent() in av.c, it looks like it might be less work to return the reference than to generate the boolean?

        The correct starting place to look for the implementation of Perl op-codes is the pp*.c files. In this case, look in pp.c for a line that reads:

        PP(pp_exists)

        The function that ends up being called to actually perform the hash lookup is hv.c:hv_exists_ent() (the avhv code still exists in Perl 5.8.0, although the use of avhv's has been deprecated).

        The hash itself contains a C "SV *" reference to the value. Perl references, however, are one level above this. A new RV would need to be created using newRV_inc(SV *), and the RV would be returned.

        The return value would then need to be dereferenced to fetch data from it, or store data to it. Unless the value is to be accessed frequently, it is questionable whether the allocation of an RV, and the following dereferencing of the RV, from Perl code, would be any cheaper than just looking up the value in the hash again.

Re: optimization - exists should return a reference
by sauoq (Abbot) on Jan 15, 2003 at 06:49 UTC

    It seemed like a good idea upon first reading it but theorbtwo and MarkM make good points. My second thought was that it might be good to provide another function for that but then I started questioning how worthwhile it would really be. I can't imagine it would increase efficiency by that much and I don't think those cases where it would are overwhelmingly common. My third thought, and the one I'm resting with, is that it might be nice to provide that in a module via XS. So, how soon will you have that done? :-)

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: optimization - exists should return a reference
by shotgunefx (Parson) on Jan 15, 2003 at 06:48 UTC
    You could probably test it yourself in 5.8 by writing your own optimizer using optimizer. Returning a ref when it uses the return value in a unboolean context. It would be a good place to start in determining performance penalties and implementation difficulties.

    -Lee

    "To be civilized is to deny one's nature."
Re: optimization - exists should return a reference
by japhy (Canon) on Jan 15, 2003 at 08:21 UTC
    It's not all that bad an idea. Chances are you want this optimization when the values in a hash ARE references. So the source pseudocode would be:
    if (SvTYPE(hash_value) == SvREF) return hash_value; else return new_SV_ref(hash_value);
    I like it. I could try to write it, if you'd like.

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Good point.

      Sure, try it. Even if limited and not ready for general population, you can show some concrete numbers on the performance differences.

Re: optimization - exists should return a reference
by Aristotle (Chancellor) on Jan 15, 2003 at 08:22 UTC
    I like the idea a lot. Performance, although not to be neglected when thinking about how a feature will affect the average case, is not so much motivation to me as the fact that this provides a clean way to do it Once And Only Once. Right now, there is no way to achieve that for exists tests. For multilevel lookups you can roll up the lookup for most of the path which gets close, but no cigar:
    exists $_->{$client} and return $_->{$client} for $export_def->{'..client_default_version'};
    But there's just no way around the duplicate mention of $client. And just as I was typing this code, I forgot the sigil in the second lookup and only by chance caught it a few seconds before submitting, which demonstrates why Once And Only Once is a good idea. Had I made this mistake in actual code, it would have broken silently and possibly left me to scratch my head for a long time.

    Makeshifts last the longest.

      Ah, but you don't avoid the 'once-and-only-once' with the modified exists either. You'd have to write something like:
      do {my $tmp = $_ -> {$client}; return $$tmp if $tmp;} for $export_def->{'..client_default_ver +sion'};

      But then you have $tmp three times. And you can still leave the sigil off of $client and have strangely behaving code.

      Abigail

        do {my $tmp = $_ -> {$client}; return $$tmp if $tmp;} for $export_def->{'..client_default_version +'};
        I don't see any use of this proposed new exists (or any existance test at all for that matter) there. What are you trying to show? As BrowserUk said what I'd write then would be
        return $$_ if $_ = exists $export_def->{'..client_default_version'}{$client};
        For the purpose of demonstration, you can even avoid assignments altogether:
        { return ${ exists $export_def->{'..client_default_version'}{$client} or last } }

        Yes, if I forget the sigil on $client I'll still get strange results, but having to get it right twice still offers more room for failure, not to mention the fact that if I get the sole instance of it wrong I will get consistent rather than intermittent failure which is far easier to notice and track.

        I stand by my point, this proposition is A Good Thing in terms of promotion of good programming practices.

        Makeshifts last the longest.

        Surely

        exists $_->{$client} and return $_->{$client} for $export_def->{'..client_default_version'};

        Would just become

        return $$_ if $_ = exists $export_def{'..client_default_version'}{$cli +ent};

        Which gives the desired once-and-once-only behaviour?


        Examine what is said, not who speaks.

        The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: optimization - exists should return a reference
by BrowserUk (Patriarch) on Jan 15, 2003 at 05:24 UTC

    Without trying to think that through to the nth-degree, it seems like a really good idea to me. I wonder if the 'right people' are watching here?


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      wonder if the 'right people' are watching here?
      Not with any regularity or reliability, no. Suggestions should go to perl6-language.
Re: optimization - exists should return a reference
by Zaxo (Archbishop) on Jan 15, 2003 at 13:13 UTC

    What if the element doesn't exist?

    In a naive implementation, returning undef, zero, or an empty string from exists will fail dereference, and it would happen at runtime. A reference value that evaluates false, yet returns undef when dereferenced, seems to be required.

    That seems to need a tricky sort of special-casing that would affect all dereferences.

    Update: BrowserUK points out that dereferencing the return of exists yields undef. That does not pass use strict 'refs';, however. exists currently returns the empty string as false, generating Can't use string ("") as a SCALAR ref while "strict refs" in use at -e line 1..

    Update 2: To clarify my point, this idea leads to:

    my %foo; my $bar = ${exists $foo{'bar'}};
    giving assignment to the value without autovivification. The need to provide a reference which is false in bool context (for campatibility) is the basis for my objection.

    After Compline,
    Zaxo

      I don't quite follow you?

      print $$ref if my $ref = exists %hash{something);

      Update Above replaced after Abigail-II's post below.

      print $$_ if $_ = exists $hash{something);

      Regardless of what false value is returned by exists if the thing doesn't exist, the if will be false and so $ref will never be dereferenced.

      Or am I missing something as usual?


      Examine what is said, not who speaks.

      The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

        That won't do what you think it does. Remove the my for it to work.

        Abigail

      So what your saying is that because someone might do

      print ${exists $foo{'bar'};

      instead of

      print $$_ if $_ = exists $foo{'bar'};

      and get a runtime error something like Use of uninitialized value in scalar dereference ...

      you would prevent anyone from benefiting from the advantages of using it properly?

      Given that in the example above, the user has at least acknowledge that $foo{'bar'} may not exist by the very act of using exists, then that's seems analogous to, and a much less easy mistake to make than doing

      print $foo{'bar'};

      instead of

      print $foo{'bar'} if exists $foo{'bar'}

      currently, which also results in a runtime warning.

      Isn't it kind of against the spirit of Perl's we ask you to read the warning signs rather than enforce them with a shotgun approach?


      Examine what is said, not who speaks.

      The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

        Yes, that's exactly what I'm saying.

        If exists is to return a reference, it should always return a reference. Code exists which says exists( $foo{'bar}) ? 'Exists' : 'Does not exist'. That code will break if exists returns a reference, because references always evaluate true.

        If what we get from exists is a reference, we should be able to treat is like any other reference. You seem to want to be able to assign from it and dereference the copy, but deny the ability to dereferencee it directly.

        Just how is that supposed to work?

        After Compline,
        Zaxo

      Sorry, I don't understand. Dereferencing in that way would be a "useless use of exists". To make it not break, you'd simply write: my $bar = ${exists $foo{'bar'} or \undef }; But that is exactly the same as my $bar = $foo{'bar'}; Neither of them vivifies anything. Autovivification takes places only for multilevel deref: my $bar = $foo{'bar'}{'baz'}; Now there will be a bar key in %foo indexing an empty hashref, even if none was there before. A baz key, again, will not be vivified into this new %{$foo{'bar'}} hash. Note that, again, the same will happen if you try this: my $bar = ${exist $foo{'bar'}{'baz'} or \undef}; But the new exists semantic could be used to simplify writing a truly non-vivifying multilevel deref:
      my $qux = \%foo; $qux = exists $qux->{$_} or (undef $qux, last) for qw(bar baz quux qux);

      Makeshifts last the longest.

Re: optimization - exists should return a tweaked SV (was: ... a reference)
by ihb (Deacon) on Jan 15, 2003 at 23:46 UTC
    I'm not to familiar with the internals, but would it be possible to attack the problem from another angle? How about adding an 'deleted' flag to SVs? Then you could do:
    my $val = $hash{foo}; return $val if exists $val;
    I fear the consequences of this though; newbies mixing up exists and defined, people requesting delete to work on scalars too, making newbies even more confused. And I bet there are other things I haven't thought about.

    As of perl5.6 you can use delete on arrays. But if you copy the array and check elements for existence all values will exist:
    my @foo = ('a'..'d'); delete $foo[1]; # delete 'b'. my @bar = @foo; print int exists $foo[1]; # 0 print int exists $bar[1]; # 1
    This is understandable, but not consistent with how the same thing would work for hashes. By not actually removing the element in the array, and instead replace it with the undefined value with the 'deleted' flag set, or upon FETCH:ing returning ditto, you could give the same behaviour to arrays, if that's desirable. I don't see any backwards-compability problems with this.

    But as said, I don't know much about the perl guts, and I haven't actually thought this through.

    Just an idea,
    ihb
Re: optimization - exists should return a reference
by Juerd (Abbot) on Jan 15, 2003 at 22:36 UTC

    return $$e if my $e= exists $export_def->{'..client_default_version'}{$client};

    The lexical $e will not be visible until the next statement, so even if exists returns a reference, it isn't very useful. You'd have to declare $e before the statement, or use a package global instead. Both are ugly, imho.

    perl -wle'print $foo if my $foo = 123'

    Juerd
    - http://juerd.nl/.
    - spamcollector_perlmonks@juerd.nl (do not use).
    

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://227053]
Approved by krujos
Front-paged by mojotoad
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (5)
As of 2024-04-25 16:19 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found