Re: Circular reference testing.
by Anonymous Monk on Mar 08, 2005 at 16:54 UTC
|
Most SV structures are greater than 12 bytes, and I could reduce the overhead commensurately if (for example) there was only ever one SvNULL allocated in the system and all other SV's were multiples of 16 or 20 bytes?
But more than one SvNULL will be created, as the following shows:
$ perl -MDevel::Peek -we '$a = undef; $b = undef; Dump($a); Dump($b)'
SV = NULL(0x0) at 0x8191dc4
REFCNT = 1
FLAGS = ()
SV = NULL(0x0) at 0x8191ddc
REFCNT = 1
FLAGS = ()
And note that their addresses are 24 bytes apart, which doesn't suggest any alignment on 16 or 20 byte intervals. This:
$ perl -MDevel::Peek -we '$a = 1; $b = 2; Dump ($a); Dump ($b)'
SV = IV(0x8192edc) at 0x8191dbc
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 1
SV = IV(0x8192ee4) at 0x8191de0
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 2
creates two SVs 36 bytes apart, which suggest you can't use anything larger than 12. | [reply] [d/l] [select] |
|
|
I think you may have missed the "...if (for example)..." part of my post?
But more than one SvNULL will be created
That's a shame. Think of the space that would be saved if all SV's that were undef, were pointers to a single, shared copy of undef with a COW flag set?
creates two SVs 36 bytes apart, which suggest you can't use anything larger than 12.
Yeah! Since posting, I looked at the addresses of consecutive elements of an array and 12 does appear to be the minimum I can rely upon.
Any thoughts on discovering the lowest and highest addresses?
Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.
| [reply] |
Re: Circular reference testing.
by dave_the_m (Monsignor) on Mar 09, 2005 at 00:21 UTC
|
| [reply] [d/l] |
|
|
| [reply] |
|
|
struct STRUCT_SV { /* struct sv { */
void* sv_any; /* pointer to something */
U32 sv_refcnt; /* how many references to us */
U32 sv_flags; /* what we are */
};
(AVs, HVs, etc, look the same, except they have a differently typed pointer as the first member of the struct). So their size is 8 bytes (assuming 8 bit bytes), plus the size of a pointer. On a system with 64 bits of addressable memory, pointers will be 8 bytes, giving a struct size of 16 bytes.
As for your other memory related questions, I think you have to dive into malloc.c in the Perl source to get your answer. I looked in the file, and it looks like it's written by someone not familiar with the rest of the perl source code: it actually has large comment sections. | [reply] [d/l] |
|
|
|
|
|
Re: Circular reference testing.
by DrHyde (Prior) on Mar 09, 2005 at 10:37 UTC
|
Interesting technique. In Data::Compare I use something very different - I just keep a hash whose keys are stringified representations of everywhere that I have been. Your method wins if you have huge amounts of data, and I might adopt it myself, but I'd be worried that it would break:
- if the code was moved from a 32-bit to a 64-bit machine
- on different versions of perl, especially 5.0 vs 5.8
- using the system malloc() vs perl's own malloc()
It looks awfully fragile. | [reply] |
|
|
I'm specifically trying to address the memory consumption of existing solutions when applied to huge data structures. Data::Dumper also uses a hash, and regularly uses upwards of 50% of the memory consumed by the datastructure being traversed, for a combination of reference checking and output accumulation. That's okay for small and even medium-sized structures, but it sucks for big stuff.
This technique reduces the memory required for the checking (without all the extra information for all the other things that Data::Dumper does) to 1% (1-byte represents 96 bytes).
- if the code was moved from a 32-bit to a 64-bit machine
- on different versions of perl, especially 5.0 vs 5.8
- using the system malloc() vs perl's own malloc()
- It looks awfully fragile.
These are the very things that I am attempting to find out through this thread. My current belief is that it should be solid across all those variations if I stick to the current formulation of 1-bit : 12-bytes and make no attempt to further trim by applying the bounds.
My (current) understanding is that no two references visible to Perl will ever coexist within a given 12-byte space regardless of the build/system/platform/malloc() employed. My understanding is obviously open to correction, but that is certainly the way it looks to me at this point.
Thorough testing will obviously be necessary, but I think that the speed, simplicity and small footprint are worth pursuing.
It has crossed my mind that if it works as I think it will, then it could be used internally by Perl to alleviate the "circular references preventing release of memory" problem. At least where these are self referential rather than co-referential.
Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.
| [reply] |
|
|
| [reply] |
|
|
Re: Circular reference testing.
by hv (Prior) on Mar 09, 2005 at 13:23 UTC
|
Can anyone see a time when this would fail?
It would be entirely possible for a future version of perl to numify a reference to something other than an actual memory address, or to have SVs that are less than 12 bytes apart. (I don't know offhand whether Ponie is already planning to break either of those assumptions.)
For existing v5.x perl I'm not aware of any situation in which your assumptions would be invalid.
Hugo
| [reply] |
|
|
... to numify a reference to something other than an actual memory address ...
That would tend to invalidate the mechanisms (mostly using hashes), in use by existing modules also?
... whether Ponie is already planning to break either ...
In the non-reference counting, brave new world of mark&sweep GC, it probably will break it, but it begs requires the question of whether there will be some mechanism for asking the VM about circular references.
How do other systems that use mark&sweep allow traversals of potntially circular structures to detect that?
Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.
| [reply] |
|
|
[numifying a reference to something other than an actual memory address] would tend to invalidate the mechanisms (mostly using hashes), in use by existing modules also?
It needn't do so: all that is required is that the value be a) consistent and b) unique. If some person or other tries to cheat by making assumptions about the value, then yes their approach could be invalidated by such a change.
it begs requires the question of whether there will be some mechanism for asking the [Ponie] VM about circular references
I anticipate that Ponie would throw away the reference count (thus reducing the minimum size of an SV below 12), and maybe introduce another level of indirection (which could reduce the size of all SV headers to 4 or 8), but as long as references continue to numify to a consistent unique value (and probably still the memory address) the existing techniques will continue to work for discovering circular references.
Hugo
| [reply] |
|
|