in reply to CPAN indexes *.pm as "Documentation"?

Neat module; I've read some biochemistry out of interest.

I am working on big object trees (see my scratch pad for an overview if you're curious) and thought I'd comment on a problem you seem to have with memory leaks (according to your CPAN docs).

Have you considered inside-out objects? They seem cool, but ... new.

I don't store references in objects -- the objects store short Id strings. A central "objOwner" object hands out the objects from the Ids. It looks like this:

my($superObj) = $o->GetObject($o->GetSupId());

I did it like that because I thought it would be easier to test, less error-prone, and would make it easier to save/restore parts of the object tree (also easier to stub for RPC).
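A minimal sketch of that arrangement, in case it helps. Only GetObject and GetSupId come from the line above; the class names ObjOwner and Node, the NewObject and GetId methods, and the "n1"-style ID format are made up for illustration, and the weakened back-pointer to the owner is just one way to let an object call GetObject on itself.

```perl
use strict;
use warnings;

# Hypothetical owner class: the only place that holds real object references.
package ObjOwner;

sub new {
    my ($class) = @_;
    return bless { objs => {}, count => 0 }, $class;
}

# Create a node, register it under a fresh short ID string, return the ID.
sub NewObject {
    my ( $self, $sup_id ) = @_;
    my $id = 'n' . ++$self->{count};
    $self->{objs}{$id} = Node->new( $self, $id, $sup_id );
    return $id;
}

# Hand out the object for an ID (undef for an unknown or undefined ID).
sub GetObject {
    my ( $self, $id ) = @_;
    return defined $id ? $self->{objs}{$id} : undef;
}

# Hypothetical node class: stores the sup's ID string, never a reference to it.
package Node;
use Scalar::Util qw(weaken);

sub new {
    my ( $class, $owner, $id, $sup_id ) = @_;
    my $self = bless { owner => $owner, id => $id, sup => $sup_id }, $class;
    weaken( $self->{owner} );    # back-pointer must not keep the owner alive
    return $self;
}

sub GetId    { return $_[0]{id} }
sub GetSupId { return $_[0]{sup} }

# Delegate lookups to the owner so callers can write $o->GetObject(...).
sub GetObject { return $_[0]{owner}->GetObject( $_[1] ) }

package main;

my $owner   = ObjOwner->new;
my $root_id = $owner->NewObject(undef);
my $kid_id  = $owner->NewObject($root_id);

my $o = $owner->GetObject($kid_id);
my ($superObj) = $o->GetObject( $o->GetSupId() );
print $superObj->GetId(), "\n";
```

Because nodes refer to each other only by ID, the only strong references run from the owner to the nodes, so there is no parent/child reference cycle to untangle.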

That it also helps the GC I file under "easier to test".

Is this clear? Ask if not. (I hope to upload to CPAN in a few weeks. I'm writing examples and the UI now... :-( )

Re^2: CPAN indexes *.pm as "Documentation"? (Memory leaks)
by rvosa (Curate) on Oct 01, 2005 at 10:27 UTC
    Hey, thanks!

    I've been experimenting with something that might be what you're saying: the node class would have a number of arrays (e.g. @parent, @next_sister, @previous_sister, @first_daughter, @last_daughter). The objects are blessed references to scalars holding integers.

    Getters and setters would dereference and assign the respective arrays based on the object's integer value:
        sub set_parent {
            my ( $self, $parent ) = ( $_[0], $_[1] );
            $parent[$$self] = $parent;
            return $self;
        }

        sub get_parent {
            my $self = $_[0];
            return $parent[$$self];
        }
    When you construct the node, a new, unique integer is generated by the InsideOutFactory, and when the node's DESTROY is called, the integer value is returned back to the InsideOutFactory, to push it into a pool for reuse. The instance data consists of elements of the class arrays. Kinda like this: inside-out objects using arrays?
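A rough, self-contained sketch of that life cycle follows. It is an assumption-laden simplification: the InsideOutFactory is folded into the Node class as a free-ID pool (@free_ids, $next_id), only the @parent array is shown, and the id method is added purely for the demonstration.

```perl
use strict;
use warnings;

package Node;

my @parent;         # class-level storage, indexed by each object's integer
my @free_ids;       # integers handed back by DESTROY, ready for reuse
my $next_id = 0;    # next never-used integer

# The object is a blessed reference to a scalar holding its integer.
sub new {
    my ($class) = @_;
    my $id = @free_ids ? pop @free_ids : $next_id++;
    return bless \$id, $class;
}

sub id { return ${ $_[0] } }

sub set_parent {
    my ( $self, $parent ) = @_;
    $parent[$$self] = $parent;
    return $self;
}

sub get_parent {
    my ($self) = @_;
    return $parent[$$self];
}

# Clear this object's slot, then return its integer to the pool.
sub DESTROY {
    my ($self) = @_;
    $parent[$$self] = undef;
    push @free_ids, $$self;
    return;
}

package main;

my $root   = Node->new;
my $kid    = Node->new->set_parent($root);
my $kid_id = $kid->id;

undef $kid;                 # last reference gone, so DESTROY runs here
my $again = Node->new;      # takes the recycled integer from the pool

print $again->id == $kid_id ? "reused\n" : "fresh\n";
```

DESTROY clears the slot before recycling the integer so that a new node that picks up the same integer does not inherit the old node's parent.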

    (Mind you, the memory leak problem so far hasn't been noticeable in day-to-day usage, but it does show up when I check with Devel::Cycle.)
      Sounds like the biggest diff is that I use 'son' or 'sub' instead of 'daughter' -- shorter. :-)

      I have a method that gives a short unique ID-string. Easier to read (a number might be confused with other data). You probably get a little more speed from using arrays and ints.

      I only store sup and an array with subs in obj and use methods to find brothers. Fewer possible bugs to keep obj updated when deleting/adding/serializing/etc.
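Roughly what that looks like, as a sketch rather than the real code: the class name TreeNode, the Get and GetBrotherIds methods and the %registry hash (standing in for the central owner object) are all invented for this example.

```perl
use strict;
use warnings;

package TreeNode;

my %registry;    # id => object; stands in for the central owner

# Each object stores only its sup's ID and a list of its subs' IDs.
sub new {
    my ( $class, $id, $sup_id ) = @_;
    my $self = bless { id => $id, sup => $sup_id, subs => [] }, $class;
    $registry{$id} = $self;
    push @{ $registry{$sup_id}{subs} }, $id if defined $sup_id;
    return $self;
}

sub Get { return $registry{ $_[1] } }

# Brothers are derived on demand from the sup's sub list,
# so there is no separate brother data to keep in sync.
sub GetBrotherIds {
    my ($self) = @_;
    return () unless defined $self->{sup};
    return grep { $_ ne $self->{id} } @{ $registry{ $self->{sup} }{subs} };
}

package main;

TreeNode->new('root');
TreeNode->new( $_, 'root' ) for qw(a b c);

my @bros = TreeNode->Get('b')->GetBrotherIds;
print join( ',', @bros ), "\n";
```

The trade-off is one extra lookup per brother query in exchange for a single place to update when a sub is added or deleted.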

      I have Links between objects in different parts of the tree (you don't need that, I'd guess). Would have been better to make objects out of Links but I have a small subapi to e.g. add link, del link and find links to/from obj.
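To give an idea of the size of such a subapi, here is a sketch under my own assumptions: the LinkTable class and the AddLink, DelLink, LinksFrom and LinksTo names are hypothetical, and links are kept as plain ID pairs in a side table instead of inside the objects.

```perl
use strict;
use warnings;

package LinkTable;

sub new {
    my ($class) = @_;
    return bless { from => {} }, $class;
}

# Record a directed link between two object IDs.
sub AddLink {
    my ( $self, $from, $to ) = @_;
    $self->{from}{$from}{$to} = 1;
    return $self;
}

# Remove a link; removing a link that does not exist is harmless.
sub DelLink {
    my ( $self, $from, $to ) = @_;
    delete $self->{from}{$from}{$to};
    return $self;
}

# IDs that the given object links to, sorted.
sub LinksFrom {
    my ( $self, $id ) = @_;
    my @ids = keys %{ $self->{from}{$id} || {} };
    return sort @ids;
}

# IDs of objects that link to the given object, sorted.
sub LinksTo {
    my ( $self, $id ) = @_;
    my @ids = grep { exists $self->{from}{$_}{$id} } keys %{ $self->{from} };
    return sort @ids;
}

package main;

my $links = LinkTable->new;
$links->AddLink( 'n1', 'n7' )->AddLink( 'n3', 'n7' )->AddLink( 'n1', 'n4' );
$links->DelLink( 'n1', 'n4' );

print join( ',', $links->LinksTo('n7') ),   "\n";
print join( ',', $links->LinksFrom('n1') ), "\n";
```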

      In total, this was little code and never gave me any problems. If I were to redo it, I'd probably make the IDs invisible in the API, just to be cleaner.

        I have a method that gives a short unique ID-string. Easier to read (a number might be confused with other data). You probably get a little more speed from using arrays and ints.
        I believe this is the "classic" InsideOut approach, right?

        Update: I read back in the thread and on your scratchpad. I see what you're doing now :)

        I agree that that is probably slightly less bug prone, though you wouldn't really notice except for the few methods that actually dig into the underlying data directly (as an aside: me sticking to my own API rather than breaking encapsulation has been a godsend while refactoring).

        The main reason I am thinking about the array-index-as-object approach is that the people I work with might also adopt it as a serialization format to transmit trees (between Python, Java, C++, through CORBA, to databases, to XML.... Aaaarg...), so it seems natural to extend that to the underlying data structure. I wonder how sparse and random the arrays become after a few cycles of reclaiming by the InsideOutFactory?
        I only store sup and an array with subs in obj and use methods to find brothers. Fewer possible bugs to keep obj updated when deleting/adding/serializing/etc.
        I notice that I'm doing a lot of calls for sisters/children/parents, and not that many tree modifications - so calculating the relationships for each call seems inefficient.
        I have Links between objects in different parts of the tree (you don't need that, I'd guess). Would have been better to make objects out of Links but I have a small subapi to e.g. add link, del link and find links to/from obj.
        I'm a bit unclear what you mean by links. Currently, I have a $self->{'GENERIC'} = {} field in each node object (with getters and setters), so I can attach additional generic key/value pairs to the objects.

        Anyway, interesting stuff to ruminate on. Do you have your code on CPAN or something?