nothingmuch has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

This question is really about perl internals, with a bit of exposition so that you guys can say "no, stupid, there's a much more obvious solution".

I'm implementing Tie::RefHash::Weak (don't click the link, it's not there yet). It's pretty simple: it just weakens the value of $s->[0]{$k}[0], which is the key-that-is-a-reference, and also makes $s->[0]{$k}[1] store a weak reference to the value instead of the value itself, so that we don't interfere with the destruction of stored values. This way, when the other copies of the reference used as a key are gone, the weak copies of the reference are set to undef. The reference to the value is also weak, because we pretend the whole key-value pair is weak. It's arguable whether that's the "correct" behavior.
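Roughly, the STORE side of that could look like this (a minimal sketch, not the module's actual code, assuming the Tie::RefHash-style layout where reference keys live in $s->[0] as [$key, $value] entries and plain string keys live in $s->[1]):

    use Scalar::Util qw(weaken);

    sub STORE {
        my ( $s, $k, $v ) = @_;
        if ( ref $k ) {
            # the stringified ref is the real hash key, as in Tie::RefHash
            my $entry = $s->[0]{$k} = [ $k, $v ];
            weaken( $entry->[0] );              # weak copy of the key reference
            weaken( $entry->[1] ) if ref $v;    # pretend the whole pair is weak
        }
        else {
            $s->[1]{$k} = $v;                   # non-reference keys stay ordinary
        }
        return $v;
    }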

This is simple to do: you just modify all the accessors so that they also check whether the stored copy of the reference has become undef, and pretend the entry isn't there when it has.
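For instance, a FETCH along these lines (again just a sketch, under the same assumed layout):

    sub FETCH {
        my ( $s, $k ) = @_;
        if ( ref $k ) {
            my $entry = $s->[0]{$k} or return;
            # if the key reference has been garbage collected, the weak
            # copy in $entry->[0] is now undef: pretend the slot is gone
            return defined $entry->[0] ? $entry->[1] : undef;
        }
        return $s->[1]{$k};
    }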

This has a problem though. If we fill the hash with references, and slowly objects start going away, then we get a hash filled with duds.

There are three ways to deal with this. The first is to ignore the problem: we clean up lazily inside NEXTKEY, and all is well. Sort of. No big-O complexity is added to the hash operations, only a constant (and negligible) overhead in NEXTKEY. However, between iterations of the hash with each, or calls to keys or values on it, the hash fills up with dead bloat.
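Something like this, roughly (a sketch that ignores the second hash of plain string keys which Tie::RefHash also keeps):

    sub NEXTKEY {
        my ($s) = @_;
        while ( defined( my $k = each %{ $s->[0] } ) ) {
            my $entry = $s->[0]{$k};
            return $entry->[0] if defined $entry->[0];   # key ref still alive
            delete $s->[0]{$k};    # dead slot: purge it lazily, keep scanning
        }
        return;    # iteration finished
    }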

The second way is to iterate the whole hash and purge it before every operation. This ensures that none of this leaking will occur, but it gets very expensive as the hash grows, because in effect every hash operation becomes O(N) plus whatever it was before. Even worse, something like values is in theory much slower, because it calls NEXTKEY and FETCH for every key, so it becomes O(N * 2N), i.e. O(N²), instead of O(N). NEXTKEY could be kept lazy like before, I guess. Either way, a hash with more than several hundred values will start becoming a computational bottleneck, all for the sake of conserving a few undefs and references to them. This is definitely not worth it.
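For comparison, the eager purge amounts to something like this, called at the top of every tie method (sketch only, helper name invented):

    sub _purge {
        my ($s) = @_;
        my $h = $s->[0];
        for my $k ( keys %$h ) {
            # drop any entry whose weakened key reference has gone away
            delete $h->{$k} unless defined $h->{$k}[0];
        }
    }

    # FETCH, STORE, EXISTS, FIRSTKEY etc. would each begin with $s->_purge;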

The third way is hypothetical, at least with the knowledge I have. Perl sets the weak references to undef; it does this with the magic_killbackrefs function, somehow. The problem is that I know nothing of perl internals, and now is not a good time to learn.

In short, can magic_killbackrefs be used somehow, so that I can call the tie's DELETE method when the undefs are written to the weak references?

There is a fourth, bonus solution: we bless whatever the reference points to, and use DESTROY. But this ruins the data the user stores in the RefHash. OTOH, since we have a reference to the value that will be destroyed, perhaps its destruction can be hooked instead of the backref killing, without making it into an object or changing its DESTROY method.

Anyway, what do you think? I'm happy with the leak, because the hashes should be regularly (but not often) iterated, and the data set is small.

If I want to release this to the CPAN though, I think it's half baked at the very best.

-nuffin
zz zZ Z Z #!perl

Replies are listed 'Best First'.
Re: Catching the death of a weak reference
by ysth (Canon) on Oct 28, 2004 at 14:49 UTC
    You could presumably put your own routine in place of magic_killbackrefs via XS; you'd need to do something like
    int handle_killbackrefs(pTHX_ SV *sv, MAGIC *mg)
    {
        /* loop stolen from magic_killbackrefs, with
         * whatever you need to do for each svp[i]. */
        /* afterward, */
        return magic_killbackrefs(sv, mg);
    }

    extern MGVTBL PL_vtbl_backref;
    PL_vtbl_backref.svt_free = MEMBER_TO_FPTR(handle_killbackrefs);
Re: Catching the death of a weak reference
by perrin (Chancellor) on Oct 28, 2004 at 15:48 UTC
    I used a hash containing weak refs in Class::DBI and dealt with it by having an access counter and cleaning up after every n accesses. That was Tim Bunce's idea. Things like DESTROY are probably a bad idea if you're dealing with code outside of your control.
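    Presumably something along these lines (a hypothetical sketch, not the actual Class::DBI code; the names here are invented):

        # purge the hash of weakened refs every $CLEANUP_INTERVAL accesses,
        # rather than on every access
        my $CLEANUP_INTERVAL = 1000;
        my $accesses         = 0;

        sub _maybe_cleanup {
            my ($self) = @_;
            return if ++$accesses % $CLEANUP_INTERVAL;
            my $cache = $self->{_live_objects};    # hash of weak references
            delete @{$cache}{ grep { !defined $cache->{$_} } keys %$cache };
        }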
      Yeah, that's a good compromise, I guess...

      But fundamentally, one could argue that O(N/100) is still really O(N)...

      My fear is in delivering to the CPAN a module that is, by design, algorithmically incomplete (and hence, you could say, buggy), even though it doesn't necessarily have to be, technical difficulties aside.

      For my data sets, which die rather quickly, I doubt 1000 accesses could be made before the hash is cleaned up anyway. For someone else, however, the hash may be seldom accessed, but large, and with many disappearing references as keys. Neither of us gains much from such a solution.

      I agree it is better than just leaving it that way, in that it mostly works most of the time. I think that for my purposes, though, ysth's solution seems like the most promising way to feel good about it. I reckon eventually I'll write several subclasses that are interchangeable and chosen at use time.

      Anyway, thanks... I had wondered how CDBI did it, but have been too busy (with other stuff) to look since the problem came up... ;-)

      -nuffin
      zz zZ Z Z #!perl
        You could make the number of accesses between cleanings proportional to the size of the hash. That maintains O(1) average overhead for keeping things clean, divided unfairly. Perl internally uses many variations on this strategy.

        There are scenarios in which this compromise is the wrong thing to do. My answer to that is that no solution ever strikes a perfect balance. Find a good enough one and move on. Make it pluggable (as you suggest) if that isn't good enough.
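        A hypothetical sketch of that proportional scheme (invented names; entries are the [$key, $value] pairs with a weakened key as above, and slot [2] of the tied object is used here purely as an access counter for the sketch):

            sub _note_access {
                my ($s) = @_;
                return if ++$s->[2] <= scalar keys %{ $s->[0] };
                # the cleanup costs O(N), but runs only after at least N
                # accesses, so the amortized cost per access stays O(1)
                my $h = $s->[0];
                delete @{$h}{ grep { !defined $h->{$_}[0] } keys %$h };
                $s->[2] = 0;
            }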

        In Class::DBI, we make the number of accesses configurable by the user, so that someone who doesn't care can skip the cleanup and someone who is really paranoid about it can do it frequently.
Re: Catching the death of a weak reference
by bibliophile (Prior) on Oct 28, 2004 at 14:00 UTC
    Heh - I'd vote for Lazy - my wife says it's a character flaw :-)

    In the interests of getting it on the CPAN, I'll paraphrase my old comp-sci prof:

    1. Get it working.
    2. Get it working right.
    3. Get it working fast.

    Just my $0.03CDN

      I currently have it working.

      As for working right - that's arguable. From the outside it appears to be working right - all the interfaces are implemented correctly. But it does have this "leakish" behavior.

      My problem is that the only way I know of getting it working right on the inside too is so contrary to point 3 that it might be contrary to point 1 as well.

      What I need to know is whether there is a way to get it working right and fast simultaneously, using knowledge I cannot afford to learn right now, because the overhead of learning it is too high for a probable "no".

      Provided I have a good direction, and a positive or at least optimistic answer, I will consider it a good excuse to finally learn to muck with perl internals.

      Lastly, is it being lazy, or voting for lazy, that's the character flaw?

      Ciao!

      -nuffin
      zz zZ Z Z #!perl
Re: Catching the death of a weak reference
by Anonymous Monk on Oct 29, 2004 at 19:24 UTC
    Did you ever bless something into "\0somepackage"? From the outside it doesn't look like a ref anymore. But I couldn't find out whether it would call "\0::DESTROY" or "\0somepackage::DESTROY", because I couldn't define both.
Re: Catching the death of a weak reference
by nothingmuch (Priest) on Dec 17, 2004 at 10:46 UTC
    If anyone is interested, Tie::RefHash::Weak has been uploaded.

    Enjoy!

    -nuffin
    zz zZ Z Z #!perl