BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

I'm using a bit vector to detect circular references. Currently this is coded as:

## Scalar will always be quad aligned and have
## a minimum size of 12 bytes.
return 'SELFREF' if vec( $circ, $addr / 12, 1 );
vec( $circ, $addr / 12, 1 ) = 1;
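
For illustration, here is a minimal, self-contained sketch of how that check might sit inside a recursive walk (the sub and variable names are mine, for illustration only, not actual module code):

    use strict;
    use warnings;

    my $circ = '';    # the bit vector; vec() grows it on demand

    sub selfref {
        my( $item ) = @_;
        return 0 unless ref $item;
        my $addr = 0 + $item;                     # numified ref == SV head address
        # NB: assumes a 32-bit address space; on 64-bit builds the
        # raw addresses are large and the vector can get very big
        return 1 if vec( $circ, $addr / 12, 1 );  # seen this SV already
        vec( $circ, $addr / 12, 1 ) = 1;          # mark it visited
        if( ref $item eq 'ARRAY' ) {
            selfref( $_ ) and return 1 for @$item;
        }
        elsif( ref $item eq 'HASH' ) {
            selfref( $_ ) and return 1 for values %$item;
        }
        return 0;
    }

    my $aref = [ 1, 2 ];
    push @$aref, $aref;                            # create a cycle
    print selfref( $aref ) ? "SELFREF\n" : "ok\n"; # prints SELFREF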

Three questions arise:

  1. Can anyone see a time when this would fail?
  2. Can I safely increase the constant 12?

    According to PerlGuts Illustrated SvNULL is the simplest (smallest) SV that can be allocated--which is where the / 12 comes from above.

    Most SV structures are greater than 12 bytes, and I could reduce the overhead commensurately if (for example) there was only ever one SvNULL allocated in the system and all other SVs were multiples of 16 or 20 bytes?

    Also, many alloc() functions actually allocate an extra word or two at the front or end of a memory allocation for integrity checking. If I could include those words in the count I could further reduce the length of the bit vector?

  3. Is there any way to reliably determine the lowest and (current) highest address that can be allocated from the heap (without Perl requesting more memory from the OS)?

    As it currently stands, the maximum overhead for the bit vector is 43 MB for a 32-bit processor that can address 4 GB of RAM. This is okay, and in most instances will be much less, but if I could also figure out (at runtime) the fixed amount of memory allocated by the Perl process before the user program is interpreted, I could subtract that amount from the addresses and factor out the start of the bit vector.

    Likewise, as the code will be called as a function, if I could determine the current end-of-heap value, I could pre-allocate the bit-vector and prevent it from causing allocations on the fly as it discovers new, higher addresses.
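
    For concreteness, the worst-case arithmetic behind that figure works out as follows:

        # One bit per 12-byte slot across a full 32-bit address space
        my $bits  = 2**32 / 12;                # ~358 million bits
        my $bytes = $bits / 8;                 # ~44.7 million bytes
        printf "%.1f MB\n", $bytes / 2**20;    # prints 42.7 MB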


Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Replies are listed 'Best First'.
Re: Circular reference testing.
by Anonymous Monk on Mar 08, 2005 at 16:54 UTC
    Most SV structures are greater than 12 bytes, and I could reduce the overhead commensurately if (for example) there was only ever one SvNULL allocated in the system and all other SV's were multiples of 16 or 20 bytes?

    But more than one SvNULL will be created, as the following shows:

    $ perl -MDevel::Peek -we '$a = undef; $b = undef; Dump($a); Dump($b)'
    SV = NULL(0x0) at 0x8191dc4
      REFCNT = 1
      FLAGS = ()
    SV = NULL(0x0) at 0x8191ddc
      REFCNT = 1
      FLAGS = ()
    And note that their addresses are 24 bytes apart, which doesn't suggest any alignment on 16 or 20 byte intervals. This:
    $ perl -MDevel::Peek -we '$a = 1; $b = 2; Dump ($a); Dump ($b)'
    SV = IV(0x8192edc) at 0x8191dbc
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 1
    SV = IV(0x8192ee4) at 0x8191de0
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 2
    creates two SVs 36 bytes apart, which suggests you can't use anything larger than 12.

      I think you may have missed the "...if (for example)..." part of my post?

      But more than one SvNULL will be created

      That's a shame. Think of the space that would be saved if all SVs that were undef were pointers to a single, shared copy of undef with a COW flag set?

      creates two SVs 36 bytes apart, which suggests you can't use anything larger than 12.

      Yeah! Since posting, I looked at the addresses of consecutive elements of an array and 12 does appear to be the minimum I can rely upon.
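
      That observation can be reproduced with a quick probe along these lines (my sketch; the exact gaps will vary by build):

          use strict;
          use warnings;

          # Probe the gaps between SV head addresses of consecutive array elements
          my @elems = ( 1 .. 1000 );
          my %gap;
          for my $i ( 1 .. $#elems ) {
              my $delta = ( 0 + \$elems[$i] ) - ( 0 + \$elems[$i-1] );
              $gap{ abs $delta }++;
          }
          print "$_ => $gap{$_}\n" for sort { $a <=> $b } keys %gap;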

      Any thoughts on discovering the lowest and highest addresses?


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
Re: Circular reference testing.
by dave_the_m (Monsignor) on Mar 09, 2005 at 00:21 UTC
    SVs are in two parts: the head, which is fixed size and is 12 bytes on a typical 32-bit processor, and which may contain a pointer to the body, which is variable sized depending on the type (integer, string, reference, hash, etc). The address of the head is what is displayed by ref() etc. SV heads are allocated from arenas: malloc'd chunks of memory approx 1K in size. Within arenas, the addresses of SVs will be offset from neighbours by 12 bytes (again, only for a typical 32-bit system), but you can't make any assumptions about SVs allocated from different arenas.

    Of course this might change completely in a future release of perl.

    Dave.

      Thanks for supplying the full SP on SVs :)

      ...but you can't make any assumptions about SVs allocated from different arenas.

      Would it be accurate to say that reference values stringified within Perl

    • Are always quad-aligned?
    • Will never be closer than 12 bytes apart?
    • Even if two SV headers are within different arenas, and the arenas are adjacent, the last SV in the lower-addressed arena will still be at least 12 bytes from its end, and therefore at least 12 bytes from the first SV in the higher-addressed arena?
    • If we're on a 64-bit machine, the SV headers are always more than 12 bytes, so using 12 as my divisor is safe?

      (Do they become 16 or 24 bytes on 64-bit systems?)

    • A future release of Perl 5 is unlikely to reduce the size of an SV header?

      Any thoughts on getting a rough value for the lowest and highest address that can be a reference at any given moment in time?

      For the lowest, I've been looking through the source trying to discover when the various system globals come into being. The thought being that maybe, say, $^X or $^O would be allocated pretty early, and addresses below that are unlikely to come up in user datastructures, leastwise not as self references? Suggestions for the best choice?
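
      For what it's worth, those candidates can be eyeballed with a quick probe (my sketch):

          # Print the SV head addresses of some early-allocated globals
          # as lower-bound candidates (illustrative only)
          printf "%-4s 0x%x\n", $_->[0], 0 + $_->[1]
              for [ '$^X' => \$^X ], [ '$^O' => \$^O ], [ '$0' => \$0 ];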

      For the top end, I've been thinking along the lines that any datastructure I am going to traverse already exists, so if I could force the allocation of a new SV, its address would form an upper bound for my circular reference testing.

      The fly in that ointment is that if I simply declare a new variable, I am quite likely to re-use an old one that has gone out of scope.

      Do you know of any way that I could force the allocation of a completely new SV (header)?

      I thought about allocating a largish array, big enough that it would force a new allocation from the OS, and then use the address of the highest element as my upper bound. I also thought that if the largish array was immediately undef'd and I then pre-allocated my bit-vector, it might get to re-use the same space.

      From what you are saying, and what little I understand about the way Perl allocates memory, the space allocated for a long string is unlikely to re-use space allocated for a large array because they would be allocated from different pools?
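
      In case it helps, here is the large-array probe idea sketched out (entirely speculative; the 100_000 element count is a guess):

          use strict;
          use warnings;

          # Hypothetical probe: force fresh SV allocations, then treat the
          # highest element address seen as an upper bound for existing refs
          my @probe = ( 0 ) x 100_000;
          my $hi = 0;
          for my $i ( 0 .. $#probe ) {
              my $addr = 0 + \$probe[$i];
              $hi = $addr if $addr > $hi;
          }
          undef @probe;    # release, hoping the space gets re-used
          printf "upper-bound guess: 0x%x\n", $hi;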


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
        If we're on a 64-bit machine, the SV headers are always more than 12 bytes, so using 12 as my divisor is safe?

        (Do they becomes 16 or 24 bytes on 64-bit systems?)

        SVs (and their friends AVs, HVs, etc) are defined in sv.h. They look like:

        struct STRUCT_SV {        /* struct sv { */
            void*  sv_any;        /* pointer to something */
            U32    sv_refcnt;     /* how many references to us */
            U32    sv_flags;      /* what we are */
        };
        (AVs, HVs, etc, look the same, except they have a differently typed pointer as the first member of the struct). So their size is 8 bytes for the two U32s (assuming 8-bit bytes), plus the size of a pointer. On a system with 64 bits of addressable memory, pointers will be 8 bytes, giving a struct size of 16 bytes.
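
        So on any given build the head size can be sanity-checked from the configuration (a quick check, assuming the layout above):

            use Config;

            # SV head = one pointer (sv_any) + two U32s (sv_refcnt, sv_flags)
            my $head_size = $Config{ptrsize} + 2 * 4;
            print "SV head: $head_size bytes\n";    # 12 on 32-bit, 16 on 64-bit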

        As for your other memory related questions, I think you have to dive into malloc.c in the Perl source to get your answer. I looked in the file, and it looks like it's written by someone not familiar with the rest of the perl source code: it actually has large comment sections.

Re: Circular reference testing.
by DrHyde (Prior) on Mar 09, 2005 at 10:37 UTC
    Interesting technique. In Data::Compare I use something very different - I just keep a hash whose keys are stringified representations of everywhere that I have been (that approach is reduced to a sketch after this list). Your method wins if you have huge amounts of data, and I might adopt it myself, but I'd be worried that it would break:
    • if the code was moved from a 32-bit to a 64-bit machine
    • on different versions of perl, especially 5.0 vs 5.8
    • using the system malloc() vs perl's own malloc()
    It looks awfully fragile.
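
    For reference, the hash technique reduces to something like this (an illustrative reduction, not the actual Data::Compare code):

        my %seen;
        sub visited {
            my( $ref ) = @_;
            # "$ref" stringifies to e.g. ARRAY(0x8191dc4); each hash key
            # costs tens of bytes, versus one bit per SV for the bit vector
            return $seen{ "$ref" }++ ? 1 : 0;
        }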

      I'm specifically trying to address the memory consumption of existing solutions when applied to huge data structures. Data::Dumper also uses a hash, and regularly uses upwards of 50% of the memory consumed by the datastructure being traversed, for a combination of reference checking and output accumulation. That's okay for small and even medium-sized structures, but it sucks for big stuff.

      This technique reduces the memory required for the checking (without all the extra information for all the other things that Data::Dumper does) to roughly 1%: one byte of bit vector covers 96 bytes of SVs.

      if the code was moved from a 32-bit to a 64-bit machine ... on different versions of perl, especially 5.0 vs 5.8 ... using the system malloc() vs perl's own malloc() ... It looks awfully fragile.

      These are the very things that I am attempting to find out through this thread. My current belief is that it should be solid across all those variations if I stick to the current formulation of 1-bit : 12-bytes and make no attempt to further trim by applying the bounds.

      My (current) understanding is that no two references visible to Perl will ever coexist within a given 12-byte space regardless of the build/system/platform/malloc() employed. My understanding is obviously open to correction, but that is certainly the way it looks to me at this point.

      Thorough testing will obviously be necessary, but I think that the speed, simplicity and small footprint are worth pursuing.

      It has crossed my mind that if it works as I think it will, then it could be used internally by Perl to alleviate the "circular references preventing release of memory" problem. At least where these are self referential rather than co-referential.


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.
        You are of course not interested in all the memory locations used by perl, just the SVs (AVs, HVs, etc), because that's where the reference counting is, and if they get cleaned up, they take away what's behind them as well.

        Is it possible to write a piece of XS code that gives you the memory pools used for the SVs? You could then use a bitvector per SV memory pool.

Re: Circular reference testing.
by hv (Prior) on Mar 09, 2005 at 13:23 UTC

    Can anyone see a time when this would fail?

    It would be entirely possible for a future version of perl to numify a reference to something other than an actual memory address, or to have SVs that are less than 12 bytes apart. (I don't know offhand whether Ponie is already planning to break either of those assumptions.)

    For existing v5.x perl I'm not aware of any situation in which your assumptions would be invalid.

    Hugo

      ... to numify a reference to something other than an actual memory address ...

      That would tend to invalidate the mechanisms (mostly using hashes) in use by existing modules also?

      ... whether Ponie is already planning to break either ...

      In the non-reference-counting, brave new world of mark&sweep GC, it probably will break it, but it raises the question of whether there will be some mechanism for asking the VM about circular references.

      How do other systems that use mark&sweep allow traversals of potentially circular structures to detect that?


      Examine what is said, not who speaks.
      Silence betokens consent.
      Love the truth but pardon error.

        [numifying a reference to something other than an actual memory address] would tend to invalidate the mechanisms (mostly using hashes), in use by existing modules also?

        It needn't do so: all that is required is that the value be a) consistent and b) unique. If some person or other tries to cheat by making assumptions about the value, then yes their approach could be invalidated by such a change.

        it raises the question of whether there will be some mechanism for asking the [Ponie] VM about circular references

        I anticipate that Ponie would throw away the reference count (thus reducing the minimum size of an SV below 12), and maybe introduce another level of indirection (which could reduce the size of all SV headers to 4 or 8), but as long as references continue to numify to a consistent unique value (and probably still the memory address) the existing techniques will continue to work for discovering circular references.

        Hugo