in reply to Re: Re: Re: optimization - exists should return a reference
in thread optimization - exists should return a reference

Thanks MarkM for pointing me in the right direction here. Now I'm gonna bug you a little further with some off-the-cuff source diving... I hope you don't mind.

The relevant part of pp.c for a hash seems to be the following

if (SvTYPE(hv) == SVt_PVHV) { if (hv_exists_ent(hv, tmpsv, 0)) RETPUSHYES; }

Which leads us to hv.c:hv_exists_ent() as you pointed out. The relevant part (ignoring for the moment magical and %ENV variations) seems to be:

entry = ((HE**)xhv->xhv_array)[hash & (I32) xhv->xhv_max]; for (; entry; entry = HeNEXT(entry)) { if (HeHASH(entry) != hash) /* strings can't be equal */ + continue; if (HeKLEN(entry) != klen) continue; * if (memNE(HeKEY(entry),key,klen)) /* is this it? */ * continue; * return TRUE; }

With the salient bit being the three lines I've *'d. If I'm reading that correctly it saying

If the memory for the KEY for this entry is NotEqual to the key we've been passed, continue looking.

Otherwise we found it so return TRUE.

Now, if we return TRUE the calling code (pp.c:PP(pp_exists)) does a RETPUSHYES.

Unwinding that macro we find

pp.h:#define RETPUSHYES RETURNX(PUSHs(&PL_sv_yes)) pp.h:#define RETURNX(x) return x, PUTBACK, NORMAL pp.h:#define PUSHs(s) (*++sp = (s)) perlapi.h:#define PL_sv_yes (*Perl_Isv_yes_ptr(aTHXo)) pp.h:#define NORMAL PL_op->op_next pp.h:#define PUTBACK PL_stack_sp = sp

Which preprocesses to

return ( *++sp = ( &( *Perl_Isv_yes_ptr( aTHXo ) ) ) ) , PL_stack_sp = sp , PL_op->op_next;

I think that we can ignore the last two lines which are just setting up the stack and returning the next op-code.

The relevant part here is the address of something located as an offset in the aTHXo struct is being pushed onto the stack.

Now given that at the point in hv_exists_ent() where TRUE is returned, we have access to the entry for the hash entity being tested, and that given this we can use the HeVAL macro to obtain the address of the associated value.

#define HeVAL(he)          (he)->hent_val

If instead of returning TRUE from hv_exists_ent(), if we returned HeVAL(entry), and the code in pp.c:PP(pp_exists) became:

if (SvTYPE(hv) == SVt_PVHV) { SV* sv; if (sv = hv_exists_ent(hv, tmpsv, 0)) RETURNX(sv); }

Then the exists function would return a reference to the value as JMD wanted without any great overhead.

I realised I've only described the exists( hash_element ) case here, but without going into it to far, it looks as though similar changes would take care of the array element case as well. I git lost trying to look at the other cases.

I also realise that I am quite likely to have missed something trying to unwind this. If I have, I really like someone to point out what it is:^)

On a somewhat related note: Can anyone tell me why the indentation on the source modules is so... erm... ecclectic?


Examine what is said, not who speaks.

The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Replies are listed 'Best First'.
Re: Re: Re: Re: Re: optimization - exists should return a reference
by MarkM (Curate) on Jan 15, 2003 at 19:31 UTC

    HeVAL(entry) returns the "SV *" pointer that refers to the scalar value in the hash entry.

    In order to create a Perl-style "scalar reference" to the scalar value, hv_exists_ent() would need to return:

    newRV_inc(HeVAL(he))

    The result of this invocation would be a new RV (Perl-style "scalar reference"). The reference count for the hash entry value would be incremented.

    From the Perl side of things, a scalar needs to be allocated to hold the reference, the reference then needs to be assigned to a scalar to hold it, before checking it for "truthfulness". Then, later, in order to fetch the value, or store a new value into the hash value, the reference needs to be dereferenced. For example:

    my $reference_to_value; if ($reference_to_value = exists $hash{$key}) { print "value is $$reference_to_value\n"; }

    The question is, would all this additional overhead actually be any benefit over the common case:

    if (exists $hash{$key}) { print "value is $hash{$key}\n"; }

    A further consideration is the overhead in the case where the return value is only interpretted as a boolean value, and not later used to dereference to access the hash entry value: Since a reference was created, but not used except as a 'truth' value, it then needs to be destroyed.

    So, as you can see, returning more information, while useful under circumstances where the information is effectively used, can reduce the performance of the case where the information is not actually used. Intuitively, this should be expected, although sometimes it needs to be presented in this light to make the situation visible. :-)

    Hash lookups are very fast. I believe that the 'exists() returns a reference' solution is not optimal overall.

    Far better avenues to consider would include caching the string hash value, or enhancing the optimizer to detect the situation where the exists is followed by a lookup, and the key has not changed, and to instead generate new op-codes for the exist() that would store the value away in the pad somewhere, and the lookup that would fetch the value from the pad. The benefit of the latter model would be that code such as the following could be optimized:

    $hash{$key}{'a'} = 'A'; $hash{$key}{'b'} = 'B'; $hash{$key}{'c'} = 'C';

    The optimizer could detect that the value ($key) was not altered between use, and the same hash entry value could be reused each time.