in reply to optimization - exists should return a reference

Perl 5.x would take a speed hit in the general case, from your suggestion (instead of returning sv_yes, it would need to allocate a reference each time exists() was invoked).

Perl 6.x, on the other hand, would not, in the general case, take a speed hit, if boolean context were defined to evaluate to the existing behaviour, and non-boolean context, or more specifically, scalar context, would be the new (reference return value) behaviour.

Another consideration, is that the cost of a hash lookup may be equal, or less than, the cost of creating a reference, assigning the reference, and later dereferencing the reference.

I'm not sure if your suggestion is a good one. For Perl 6.x, I do not see a problem with it. Another argument for might be that the behaviour of exists would then become consistent with the behaviour of UNIVERSAL::can().

  • Comment on Re: optimization - exists should return a reference

Replies are listed 'Best First'.
Re: Re: optimization - exists should return a reference
by BrowserUk (Patriarch) on Jan 15, 2003 at 07:18 UTC

    I'm really unfamiliar with the perl source, and I still get lost in all the macro's, but looking at the code for Perl_avhv_exists_ent() in av.c, it looks like it might be less work to return the reference than to generate the boolean?


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      With regard to the suggestion that exists() return a reference, and not a boolean, BrowserUk writes:

      I'm really unfamiliar with the perl source, and I still get lost in all the macro's, but looking at the code for Perl_avhv_exists_ent() in av.c, it looks like it might be less work to return the reference than to generate the boolean?

      The correct starting place to look for the implementation of Perl op-codes is the pp*.c files. In this case, look in pp.c for a line that reads:

      PP(pp_exists)

      The function that ends up being called to actually perform the hash lookup is hv.c:hv_exists_ent() (the avhv code still exists in Perl 5.8.0, although the use of avhv's has been deprecated).

      The hash itself contains a C "SV *" reference to the value. Perl references, however, are one level above this. A new RV would need to be created using newRV_inc(SV *), and the RV would be returned.

      The return value would then need to be dereferenced to fetch data from it, or store data to it. Unless the value is to be accessed frequently, it is questionable whether the allocation of an RV, and the following dereferencing of the RV, from Perl code, would be any cheaper than just looking up the value in the hash again.

        Thanks MarkM for pointing me in the right direction here. Now I'm gonna bug you a little further with some off-the-cuff source diving... I hope you don't mind.

        The relevant part of pp.c for a hash seems to be the following

        if (SvTYPE(hv) == SVt_PVHV) { if (hv_exists_ent(hv, tmpsv, 0)) RETPUSHYES; }

        Which leads us to hv.c:hv_exists_ent() as you pointed out. The relevant part (ignoring for the moment magical and %ENV variations) seems to be:

        entry = ((HE**)xhv->xhv_array)[hash & (I32) xhv->xhv_max]; for (; entry; entry = HeNEXT(entry)) { if (HeHASH(entry) != hash) /* strings can't be equal */ + continue; if (HeKLEN(entry) != klen) continue; * if (memNE(HeKEY(entry),key,klen)) /* is this it? */ * continue; * return TRUE; }

        With the salient bit being the three lines I've *'d. If I'm reading that correctly it saying

        If the memory for the KEY for this entry is NotEqual to the key we've been passed, continue looking.

        Otherwise we found it so return TRUE.

        Now, if we return TRUE the calling code (pp.c:PP(pp_exists)) does a RETPUSHYES.

        Unwinding that macro we find

        pp.h:#define RETPUSHYES RETURNX(PUSHs(&PL_sv_yes)) pp.h:#define RETURNX(x) return x, PUTBACK, NORMAL pp.h:#define PUSHs(s) (*++sp = (s)) perlapi.h:#define PL_sv_yes (*Perl_Isv_yes_ptr(aTHXo)) pp.h:#define NORMAL PL_op->op_next pp.h:#define PUTBACK PL_stack_sp = sp

        Which preprocesses to

        return ( *++sp = ( &( *Perl_Isv_yes_ptr( aTHXo ) ) ) ) , PL_stack_sp = sp , PL_op->op_next;

        I think that we can ignore the last two lines which are just setting up the stack and returning the next op-code.

        The relevant part here is the address of something located as an offset in the aTHXo struct is being pushed onto the stack.

        Now given that at the point in hv_exists_ent() where TRUE is returned, we have access to the entry for the hash entity being tested, and that given this we can use the HeVAL macro to obtain the address of the associated value.

        #define HeVAL(he)          (he)->hent_val

        If instead of returning TRUE from hv_exists_ent(), if we returned HeVAL(entry), and the code in pp.c:PP(pp_exists) became:

        if (SvTYPE(hv) == SVt_PVHV) { SV* sv; if (sv = hv_exists_ent(hv, tmpsv, 0)) RETURNX(sv); }

        Then the exists function would return a reference to the value as JMD wanted without any great overhead.

        I realised I've only described the exists( hash_element ) case here, but without going into it to far, it looks as though similar changes would take care of the array element case as well. I git lost trying to look at the other cases.

        I also realise that I am quite likely to have missed something trying to unwind this. If I have, I really like someone to point out what it is:^)

        On a somewhat related note: Can anyone tell me why the indentation on the source modules is so... erm... ecclectic?


        Examine what is said, not who speaks.

        The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.