nysus has asked for the wisdom of the Perl Monks concerning the following question:

Here are a couple of rudimentary classes for creating a File object which contains a reference to a Dir object which in turn contains the File object:

package Practice::File;
use v5.38;
use Cwd;
use Scalar::Util qw(weaken);

sub new {
    my $class = shift;
    my $self = {};
    $self->{name} = shift;
    bless $self, $class;
    return $self;
}

sub dir {
    my $self = shift;
    return $self->{dir}->{dir};
}

sub add_dir {
    my $self = shift;
    my $dir = shift;
    $self->{dir} = $dir;
    weaken($self->{dir});    # <=============== Is not using weaken here a really bad idea?
    $self->{dir}->add_file($self);
}

sub path {
    my $self = shift;
    my $path = $self->{name};
    if (defined $self->{dir}) {
        $path = $self->{dir}->path . '/' . $path;
    }
    return $path;
}

1;

package Practice::Dir;
use v5.38;
use Cwd;

sub new {
    my $class = shift;
    my $self = {};
    $self->{dir} = cwd();
    $self->{files} = [];
    bless $self, $class;
    return $self;
}

sub add_file {
    my $self = shift;
    my $file = shift;
    push @{$self->{files}}, $file;
}

# Needed by Practice::File::path, which calls $self->{dir}->path
sub path {
    my $self = shift;
    return $self->{dir};
}

1;
I threw in a weaken on the File's dir attribute, but I'm not really sure it's needed. My understanding is that weaken does not increment the reference count. So if I delete the Dir object, the Files will still reference the Dir, and it will therefore remain in existence for as long as any File object references it. But with weaken, if I delete the Dir, the Files can no longer reference it. But is this really a big deal? I don't really see the harm in leaving it as a strong reference. But maybe I'm wrong and this is considered a memory leak. Can someone shed some light on this for me?

$PM = "Perl Monk's";
$MC = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar Parson";
$nysus = $PM . ' ' . $MC;

Re: Should I use weaken on an object attribute containing a reference to an object which contains reference back to original object?
by haj (Vicar) on Jan 21, 2024 at 10:48 UTC

    Those are some unfortunate ways to put it.

    • My understanding is that weaken does not increment the reference count: The particular variable you weaken no longer counts as a reference to whatever it refers to.
    • ...if I delete the Dir object...: How do you do this?
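
    Here is a minimal sketch of the first point, assuming only the core modules B and Scalar::Util. The variable names are hypothetical. It reads the referent's reference count through B::svref_2object before and after weaken. The exact numbers depend on Perl internals, but the count should drop by one once $alias is weakened. $alias should also turn into undef as soon as the last strong reference goes away.

```perl
use v5.36;
use B ();
use Scalar::Util qw(weaken isweak);

# Reference count of whatever $_[0] points to. $_[0] aliases the caller's
# variable, so calling this helper does not add a count of its own.
sub refcount { return B::svref_2object($_[0])->REFCNT }

my $dir    = { name => 'bar' };   # one strong reference
my $alias  = $dir;                # a second strong reference
my $before = refcount($dir);

weaken($alias);                   # $alias no longer counts
my $after = refcount($dir);

say "before weaken: $before, after weaken: $after";
say 'alias is weak: ', (isweak($alias) ? 'yes' : 'no');

undef $dir;                       # drop the last strong reference
say 'alias after undef: ', (defined $alias ? 'still set' : 'undef');
```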

    The purpose of weaken is to assist Perl's garbage collection mechanism, which returns memory to the pool when there are no more references to it. It is not possible to demonstrate this with just the classes, so let me add some main code. Warning: This code runs forever!

    package main;

    while (1) {
        my $dir  = Practice::Dir->new;
        my $file = Practice::File->new;
        $dir->add_file($file);
        $file->add_dir($dir);
    }

    If you comment out your call to weaken and run the code, then you'll see the memory consumption increase slowly, but steadily.

    Explanation: At the end of each iteration, the variables $dir and $file go out of scope. The object behind $dir will not be freed, because it is referenced by the $file object, and vice versa. If you weaken the dir attribute of the file, then the object behind $dir can be freed, and since it is then gone, the object behind $file can also be freed.
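
    The same effect can be observed without an endless loop. The sketch below uses a hypothetical Counted class whose DESTROY method just counts destroyed objects. A hypothetical make_pair helper builds a dir/file-like pair, either with or without weakening the back-reference. With the strong cycle, nothing should be destroyed when the helper returns; those objects only go away during global destruction at program exit. With weaken, both objects should be freed immediately.

```perl
use v5.36;
use Scalar::Util qw(weaken);

# Hypothetical stand-in for Practice::Dir / Practice::File that counts
# how many objects have been destroyed so far.
package Counted {
    our $destroyed = 0;
    sub new ($class)    { return bless {}, $class }
    sub DESTROY ($self) { $Counted::destroyed++ }
}

sub make_pair ($use_weaken) {
    my $dir  = Counted->new;
    my $file = Counted->new;
    $dir->{files} = [$file];              # strong: dir -> file
    $file->{dir}  = $dir;                 # back-reference: file -> dir
    weaken($file->{dir}) if $use_weaken;
    return;                               # $dir and $file go out of scope
}

$Counted::destroyed = 0;
make_pair(0);
say "strong cycle: destroyed $Counted::destroyed of 2";        # strong cycle: destroyed 0 of 2

$Counted::destroyed = 0;
make_pair(1);
say "weakened back-ref: destroyed $Counted::destroyed of 2";   # weakened back-ref: destroyed 2 of 2
```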

Re: Should I use weaken on an object attribute containing a reference to an object which contains reference back to original object?
by jo37 (Curate) on Jan 21, 2024 at 15:24 UTC
    if I delete the Dir, the Files can no longer reference the Dir

    It's not only a question of deletion. The reference to Dir within File will disappear as soon as the Dir object goes out of scope, which is a Bad Thing™

    use strict;
    use warnings;
    use Data::Dump;

    my $file = Practice::File->new('foo');
    {
        my $dir = Practice::Dir->new('bar');
        $file->add_dir($dir);
    }
    dd $file;

    __DATA__
    bless({ dir => undef, name => "foo" }, "Practice::File")

    Greetings,
    -jo

    $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
      That's one of the reasons why I wrote that it depends on design decisions.

      For instance, a common concept in filesystems is that deleting a directory is only possible if it's empty. => rmdir

      Another is that deleting a directory also deletes the contained files. => rm -r

      In both cases all "File" objects having weak references to the "Dir" should have been removed.

      Whether that's done automatically in a File::DESTROY method, or only allowed to happen after a $dir->rm call, is a further design decision.

      The problem here is that the OP's question is too vague to be answered.

      If this is all hypothetical and intended as Practice:: rather than a real simulation of a file system, I'd say the topic is too ambiguous to discuss weaken.

      Especially as I can't claim to know all the subtle differences between file systems. Hence there is no "natural" model to copy.

      Update

      At least, I don't know what a "directory getting deleted after going out of scope" would translate to in "normal practice" ...(?)

      Personally I wouldn't allow it to happen.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery

Re: Should I use weaken on an object attribute containing a reference to an object which contains reference back to original object?
by NERDVANA (Priest) on Jan 23, 2024 at 19:29 UTC
    I like to think of it differently than most of the replies so far.

    In perl, the designer of an API must consider which are the "root objects" that they want a user to hold onto, and all objects should have a tree of ownership from those root objects that determines when they get freed. If one of your internal "leaf" objects needs to refer back to a root object, use a weakened reference.
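
    A minimal sketch of that ownership rule follows. It uses a hypothetical Node class rather than the OP's Practice:: classes. Each parent holds strong references to its children, and each child points back to its parent through a weakened reference. As long as the user holds the root, a leaf can walk all the way up. Once the root is dropped, the remaining subtree simply becomes its own tree.

```perl
use v5.36;

# Hypothetical Node class: a parent owns its children through strong
# references; each child points back at its parent through a weak one.
package Node {
    use Scalar::Util qw(weaken);

    sub new ($class, $name) {
        return bless { name => $name, parent => undef, children => [] }, $class;
    }

    sub add_child ($self, $child) {
        push $self->{children}->@*, $child;    # strong: parent -> child
        $child->{parent} = $self;
        weaken($child->{parent});               # weak: child -> parent
        return $child;
    }

    # Walk up through the (weak) parent links to build a path.
    sub path ($self) {
        my $parent = $self->{parent};
        return defined $parent ? $parent->path . '/' . $self->{name} : $self->{name};
    }
}

my $root = Node->new('root');                          # the user holds the root...
my $docs = $root->add_child(Node->new('docs'));
my $note = $docs->add_child(Node->new('notes.txt'));
say $note->path;    # root/docs/notes.txt

undef $root;        # ...and now lets go of it
say $note->path;    # docs/notes.txt ('docs' is now the root of what is left)
```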

    Consider the actual filesystem: In Unix, each file and directory have a "link count". A directory has one reference to itself ("."), one reference to the parent (".."), and one reference to each file or subdirectory. Stated the other way around, the references to a directory are one from the parent, one from itself, and one from each subdirectory. A file (inode) has one link from each directory it is listed in, but no references outward. You cannot determine the directory of a file from just the file inode. In Unix, the directory tree is global, and a directory may not be removed from the tree unless it is empty. This is because Unix doesn't have weak-references and the only way it can know that all the subdirectories got cleaned up is if you first empty each one of them and remove it from the parent.

    Now consider how things could work if Unix did have weak-references. Each directory could use a weak-reference for ".." and ".", and so the subdirectories would not be adding to the link count for a directory. The directory would have a link count of exactly 1 for as long as it was referenced by the parent. You would be able to "rm" ("unlink") a directory and then let the filesystem driver delete it in the background asynchronously. Each time the directory got deleted, the filesystem driver would reduce the link count of all the files and directories under it, and then continue to free those if their count dropped to 0. If the system rebooted, the filesystem driver would be able to pick up where it left off by checking the link counts of remaining directories.

    The first case works because the kernel can enforce the policy of making sure a directory is empty before allowing it to be removed from the tree, and because the root of the tree is global (the reference won't get lost by going out of scope of some function). In a perl program, you could mimic this with an API that enforces that directories be empty before removing them from the tree, and have the root directory referenced from a package global. This would work and not leak memory and you would get exactly the same semantics as the real filesystem. You could also opt to let files refer to their directory, but no longer list them in more than one directory.
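
    Here is a rough sketch of that first, Unix-like design. It assumes plain hashes instead of classes and uses hypothetical helper names (FakeFS::make_dir, FakeFS::remove_dir). The root hangs off a package global, and directories only point down to their entries. Removing a non-empty directory is refused. Since nothing points back up, there are no cycles to leak.

```perl
use v5.36;

# Hypothetical Unix-like model: the root lives in a package global, so it
# can never go out of scope, and a directory can only be removed once it
# is empty. Entries point down only; there are no back-references.
package FakeFS {
    our $ROOT = { name => '/', entries => {} };

    sub make_dir ($parent, $name) {
        die "$name already exists\n" if exists $parent->{entries}{$name};
        return $parent->{entries}{$name} = { name => $name, entries => {} };
    }

    sub remove_dir ($parent, $name) {
        my $dir = $parent->{entries}{$name} // die "no such directory: $name\n";
        die "directory not empty: $name\n" if %{ $dir->{entries} };
        delete $parent->{entries}{$name};
        return;
    }
}

my $home = FakeFS::make_dir($FakeFS::ROOT, 'home');
FakeFS::make_dir($home, 'nysus');

# Refused: 'home' still contains 'nysus'.
eval { FakeFS::remove_dir($FakeFS::ROOT, 'home'); 1 }
    or print "refused: $@";    # refused: directory not empty: home

# Empty the directory first, then the removal succeeds.
FakeFS::remove_dir($home, 'nysus');
FakeFS::remove_dir($FakeFS::ROOT, 'home');
say 'root now contains: ', join(', ', sort keys $FakeFS::ROOT->{entries}->%*) || '(nothing)';
```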

    If you don't want your perl-object-filesystem to be global, you need something more like the second design. You need a way to know when *all* application references to the tree have been dropped so that you can clean up the entire tree. There are actually several ways to do it. The simplest is to make each "forward" reference from directory to its content "strong", and each "backward" reference from content to containing directory "weak". If you have a reference to a directory and drop the reference to the tree leading up to it, the directory suddenly becomes the root of the remaining tree. A second way is to have the whole tree be strongly-linked in both directions, but then don't let the user hold onto the internal objects. Instead, give the user "proxy" objects that hold a strong reference to parts of the tree, and manage your own reference count stored somewhere in the tree. When the last proxy object is garbage collected, your DESTROY method decreases the manual reference count on the tree, sees that it was the last proxy object, and then walks the entire tree breaking all the references. This would be sort of like lazy-unmounting a filesystem when the last open handle was closed.
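
    The proxy-based design is the more involved one. The sketch below is one possible reading of it, with hypothetical names (Handle, new_tree, _break). Internal nodes are linked strongly in both directions, users only ever hold Handle objects, and a manual handle count lives in a shared hash. When the last Handle's DESTROY runs, it walks the tree and breaks every link so Perl can reclaim it. The weakened $probe exists only to check that the tree really was freed.

```perl
use v5.36;
use Scalar::Util qw(weaken);

# Hypothetical proxy design: nodes are plain hashes linked strongly in both
# directions (a deliberate cycle). Users only hold Handle objects, and each
# Handle bumps a manual counter kept in the shared tree state.
package Handle {
    sub new ($class, $shared, $node) {
        $shared->{handles}++;
        return bless { shared => $shared, node => $node }, $class;
    }

    sub name ($self) { return $self->{node}{name} }

    sub parent ($self) {
        my $p = $self->{node}{parent} or return;
        return Handle->new($self->{shared}, $p);
    }

    sub add_child ($self, $name) {
        my $child = { name => $name, parent => $self->{node}, children => [] };
        push $self->{node}{children}->@*, $child;    # strong down, strong up
        return Handle->new($self->{shared}, $child);
    }

    sub DESTROY ($self) {
        return if --$self->{shared}{handles} > 0;    # other handles still alive
        _break($self->{shared}{root});               # last one: tear the tree down
    }

    # Recursively remove every parent and child link.
    sub _break ($node) {
        _break($_) for $node->{children}->@*;
        delete $node->{parent};
        $node->{children} = [];
    }
}

sub new_tree {
    my $shared = { handles => 0 };
    $shared->{root} = { name => '/', parent => undef, children => [] };
    return Handle->new($shared, $shared->{root});
}

my $probe;    # weak peek at the root node, for this demo only
{
    my $root = new_tree();
    my $etc  = $root->add_child('etc');
    my $conf = $etc->add_child('conf');
    say $conf->parent->name;    # etc
    $probe = $root->{shared}{root};
    weaken($probe);
}
say defined $probe ? 'tree leaked' : 'tree freed';    # tree freed
```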

    Back to your original question, if you just ignore the whole topic and create circular strong references without a plan to clean it up, then every time you make temporary usage of your API you will reserve some more memory and never get it back until the program exits. That is bad for long-lived processes like web servers, and even sometimes for batch processing if you need a bunch of temporary calculations. Other batch processing will be fine; it creates objects for the input data, and then the whole process exits freeing everything anyway.

Re: Should I use weaken on an object attribute containing a reference to an object which contains reference back to original object? (design decisions)
by LanX (Saint) on Jan 21, 2024 at 14:10 UTC
    This depends on your design decisions!

    Looks like you have a bidirectional membership relation, implemented with references (which isn't the only way to do it, see below)

    If I were you I'd first think hard about the membership semantics you want and consequently how deletion is supposed to work.

    What are your desired N-to-M relations?

    • can files "belong" to more or less than one directory?
    • can directories belong to other directories ? Even more than one?
    • are directories allowed to be empty?

    And after that, the deletion semantics:

    • do you want explicit and/or implicit (ref count) deletion of objects?
    • what should happen to the "members" after a directory is "deleted" or destroyed?
    • what should happen to the "containers" after a directory or file is "deleted" or destroyed?

    Most of us have more or less similar answers based on experience with filesystem operations, but yours might be different. And filesystems differ: some have inodes, symlinks, hardlinks, etc.

    Furthermore, as indicated, you don't necessarily need (circular) references to model this relation, for example by using symbols for references.

    So you could also have class variables %instance collecting the objects of each class (i.e. all files, all directories), with identifiers as keys and object references as values. (A possible natural unique ID is just the stringification of the reference plus a timestamp.)

    Then just store the related IDs in the objects.

    Of course you'll have to design the necessary delete or DESTROY methods now on your own, to clean up broken relations after an object is deleted.
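
    A minimal sketch of this ID-based approach follows. It assumes hypothetical My::Dir and My::File classes, named after the My::File->get_obj and %My::File::instance mentioned in this thread. For simplicity, the IDs come from a per-class counter rather than a stringified reference plus timestamp. Objects only store the IDs of related objects, so there are no reference cycles. Looking up a removed object dies with a "doesn't exist anymore" error instead of silently following a stale reference.

```perl
use v5.36;

# Hypothetical ID-based design: each class keeps a registry %instance of
# id => object, and objects store only the *ids* of related objects.
package My::Dir {
    our %instance;
    my $next_id = 0;    # simple counter instead of ref + timestamp

    sub new ($class, $name) {
        my $self = bless { id => 'dir' . ++$next_id, name => $name, file_ids => [] }, $class;
        return $instance{ $self->{id} } = $self;
    }

    sub get_obj ($class, $id) {
        return $instance{$id} // die "directory $id doesn't exist anymore\n";
    }

    sub remove ($self) { delete $instance{ $self->{id} }; return }
}

package My::File {
    our %instance;
    my $next_id = 0;

    sub new ($class, $name, $dir) {
        my $self = bless { id => 'file' . ++$next_id, name => $name, dir_id => $dir->{id} }, $class;
        push $dir->{file_ids}->@*, $self->{id};
        return $instance{ $self->{id} } = $self;
    }

    sub get_obj ($class, $id) {
        return $instance{$id} // die "file $id doesn't exist anymore\n";
    }

    # Resolve the directory through the registry on every call.
    sub path ($self) {
        my $dir = My::Dir->get_obj($self->{dir_id});
        return "$dir->{name}/$self->{name}";
    }
}

my $dir  = My::Dir->new('bar');
my $file = My::File->new('foo', $dir);
say $file->path;    # bar/foo

$dir->remove;       # explicit delete: drop it from the registry
undef $dir;
eval { $file->path; 1 }
    or print "error: $@";    # error: directory dir1 doesn't exist anymore
```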

    Cheers Rolf

      Thanks. But what do you mean by "symbols for references?" Are you referring to something like this: Symbolic references?

        A unique textual identifier for each object. A name following rules and closely coupled to an object.

        To get the actual reference, you'll call a class method like My::File->get_obj(symbol) to look it up in %My::File::instance.

        That's the "weakest" way...

        Update

        This allows you to design systems where warnings like "object (File/Directory) doesn't exist anymore" are possible. Think inodes!

        Also keep in mind that Perl can reuse refs for new variables after destruction, though I doubt this happens when there is still a weakened ref around. (?)

        Cheers Rolf

Re: Should I use weaken on an object attribute containing a reference to an object which contains reference back to original object?
by ikegami (Patriarch) on Jan 22, 2024 at 02:31 UTC

    The problem with not using weaken has been explained.

    But there's also a problem with using weaken.

    sub d {
        my $dir  = Practice::Dir->new;
        my $file = Practice::File->new;
        $file->add_dir($dir);    # add_dir() also does $dir->add_file($file)
        return $dir;
    }

    sub f {
        my $dir  = Practice::Dir->new;
        my $file = Practice::File->new;
        $file->add_dir($dir);    # add_dir() also does $dir->add_file($file)
        return $file;
    }

    # This works fine.
    # `$dir` is a dir with one file.
    my $dir = d();

    # XXX This fails.
    # `$file`'s dir is `undef` even though it wasn't before we returned.
    my $file = f();

    The correct solution could be neither.

      > The correct solution could be neither.

      As I already said, this is a design problem of this "file system simulation", and not weaken's.

      I have trouble imagining a file system where dirs and files are deleted implicitly after "falling out of scope".

      Personally I would make deletion explicit with ->remove methods, and keep strong refs to the objects only in an %instance hash of the class; all other internal refs must be weak. This way the refcount cannot drop to 0 unless explicitly wanted.

      Now if there are still other external refs when ->remove is called, this must be resolved.

      Either they are all weak, because the constructor returned them weak, and they'll automatically become undef after destruction.

      Or they are strong, and ->remove checks the refcount and throws an error "can't remove, still in use".
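
      Here is a minimal sketch of the first variant, where strong refs are kept only in %instance and all internal refs are weak. It uses hypothetical Reg::Dir and Reg::File classes keyed by the stringified reference. It revisits the f() case from ikegami's post. Because the registry keeps the dir alive, returning only the file no longer loses the dir, and the dir disappears only when ->remove is called. The "can't remove, still in use" refcount check of the second variant is not implemented here.

```perl
use v5.36;

# Hypothetical explicit-deletion design: the class-level %instance hash
# holds the only strong reference to each object; every reference between
# objects is weakened.
package Reg::Dir {
    use Scalar::Util qw(weaken);
    our %instance;

    sub new ($class, $name) {
        my $self = bless { name => $name, files => [] }, $class;
        return $instance{"$self"} = $self;    # key: the stringified ref
    }

    sub add_file ($self, $file) {
        push $self->{files}->@*, $file;
        weaken($self->{files}[-1]);           # dir -> file: weak
        $file->{dir} = $self;
        weaken($file->{dir});                 # file -> dir: weak
        return $file;
    }

    sub remove ($self) { delete $instance{"$self"}; return }
}

package Reg::File {
    our %instance;

    sub new ($class, $name) {
        my $self = bless { name => $name, dir => undef }, $class;
        return $instance{"$self"} = $self;
    }

    sub remove ($self) { delete $instance{"$self"}; return }
}

# Same shape as ikegami's f(): build a dir and a file, return only the file.
sub f {
    my $dir  = Reg::Dir->new('bar');
    my $file = Reg::File->new('foo');
    $dir->add_file($file);
    return $file;
}

my $file = f();
say defined $file->{dir} ? "dir survived: $file->{dir}{name}" : 'dir gone';    # dir survived: bar

$file->{dir}->remove;    # explicit deletion frees the dir
say defined $file->{dir} ? 'dir survived' : 'dir gone after remove';          # dir gone after remove
```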

      Bottom line: This scenario is too complicated to discuss weaken.

      Cheers Rolf

        I have trouble imagining a file system where dirs and files are deleted implicitly after "falling out of scope".

        I've re-read the OP. It doesn't seem to mention anything about actually operating on a filesystem. Rather it's just an example of collections of related objects or references.


        🦛

        What the hell are you talking about? I didn't say anything about automatic file deletion. The question being asked is if the OP should be using weaken or not in a data structure that mirrors a file system.

        haj pointed out that you get memory leaks if you comment out weaken. And I pointed out that you can end up with premature deallocation if you keep it. The point I was making is that just adding weaken is not a solution, or at least not one without downsides.

        Perl's reference-counting GC makes bidirectional data structures very hard to implement.

Re: Should I use weaken on an object attribute containing a reference to an object which contains reference back to original object?
by jo37 (Curate) on Jan 21, 2024 at 08:54 UTC

    Another approach would be the usage of a DESTROY method to get rid of the circular reference.

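    Update: Nonsense. The destructor would never be called (at least not before global destruction at program exit), because the circular reference keeps the reference count from dropping to zero.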

    Greetings,
    -jo
