in reply to Memory overhead of blessed hashes

Perhaps Devel::Size isn't telling the whole story, but...

#!/usr/bin/env perl

use strict;
use warnings;
use Devel::Size qw(total_size);

my(@o, @h);

print "Total starting size for \@o: ", total_size(\@o), "\n";
print "Total starting size for \@h: ", total_size(\@h), "\n";

for (1..100000) {
    push @o, bless {}, 'Foo';
}

for (1..100000) {
    push @h, {};
}

print "Total ending size for \@o: ", total_size(\@o), "\n";
print "Total ending size for \@h: ", total_size(\@h), "\n";

This produces:

Total starting size for @o: 64
Total starting size for @h: 64
Total ending size for @o: 15246272
Total ending size for @h: 15246272

The fact that a hashref is blessed doesn't seem to have any bearing on the total memory consumption. If I use Devel::Peek to see inside a blessed hashref versus a hash, I see one field in the SV that changes: STASH = 0x25fc9d8 "Foo". (Of course, the address will be different on every run). The FLAGS change too, but I don't think that changes memory consumption:

#!/usr/bin/env perl

use strict;
use warnings;
use Devel::Peek;

my $o  = bless {}, 'Foo';
my $o2 = bless {}, 'Foo';
my $h  = {};

Dump($o);
Dump($o2);
Dump($h);

This produces:

SV = IV(0x1e03ee8) at 0x1e03ef8
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0x1de0358
  SV = PVHV(0x1de5b70) at 0x1de0358
    REFCNT = 1
    FLAGS = (OBJECT,SHAREKEYS)
    STASH = 0x1dfa948 "Foo"
    ARRAY = 0x0
    KEYS = 0
    FILL = 0
    MAX = 7
SV = IV(0x1e03e40) at 0x1e03e50
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0x1de0508
  SV = PVHV(0x1de60d0) at 0x1de0508
    REFCNT = 1
    FLAGS = (OBJECT,SHAREKEYS)
    STASH = 0x1dfa948 "Foo"
    ARRAY = 0x0
    KEYS = 0
    FILL = 0
    MAX = 7
SV = IV(0x1e03e58) at 0x1e03e68
  REFCNT = 1
  FLAGS = (ROK)
  RV = 0x1dfa990
  SV = PVHV(0x1de6130) at 0x1dfa990
    REFCNT = 1
    FLAGS = (SHAREKEYS)
    ARRAY = 0x0
    KEYS = 0
    FILL = 0
    MAX = 7

Possibly worth noting, the STASH value is the same for $o and $o2, so both instances of the object point to the same "Foo" stash, which is encouraging.

I know it's not your code, but if the code is getting memory pinched, and it's not related to a memory leak, another strategy may be to allow the objects to be serialized and deserialized quickly, and to maintain an index of where each serialization can be found. Is it really necessary to hold 100k+ objects in memory at once? If so, and if the primary attribute of each object were just a path where the serialization of the rest of the object's guts can be found, you might save space. Anyway, that's just a thought. There could be constraints preventing that approach.
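
In case a sketch helps, here's one way that could look with Storable. The object class, the storage directory, and the park/fetch helpers are all invented for illustration, so treat this as a rough outline rather than a recommendation:

#!/usr/bin/env perl

use strict;
use warnings;
use Storable qw(store retrieve);
use File::Path qw(make_path);

# Hypothetical helpers: park an object's guts on disk, keeping only a
# small path stub in an in-memory index.
my $dir = '/tmp/objstore';    # invented location
make_path($dir);

my %index;    # object id => path of its serialization

sub park {
    my ($id, $obj) = @_;
    my $path = "$dir/$id.sto";
    store($obj, $path);    # Storable handles blessed hashrefs, too
    $index{$id} = $path;
}

sub fetch {
    return retrieve($index{ $_[0] });    # thawed back, still blessed
}

park(42, bless({ big => 'payload' }, 'Foo'));
my $obj = fetch(42);    # a Foo again, loaded only when needed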


Dave

Re^2: Memory overhead of blessed hashes
by LanX (Saint) on Feb 10, 2021 at 17:58 UTC
    > I know it's not your code,

    I don't know what they did, and I want to avoid another "told you so" situation.²

    Just fighting off FUD theories that bless has a memory impact and trying to educate myself.

    > Is it really necessary to hold 100k+ in memory at once?

    From my understanding: they are building complicated trees (well, multi-trees°) within a short time window.

    > If so, if the primary attribute of each object were just a path where the serialization of the remainder of the object's guts can be found, you might save space.

    That's a good idea.

    Though in my experience, Perl and the OS are pretty efficient at swapping out unused hashes, as long as they are small enough.

    Of course, performance depends on how frequently you need to access them, but the same applies to your serialization idea.

    Hmm ...

    Actually, this is a good counterargument to inside-out objects, because class variables holding data for all objects can't be swapped out.

    So it's sometimes better to keep rarely used "guts" data inside small hashes at lower nesting levels.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    °) elements can have multiple parents (aggregation semantic)

    ²) see also "Chuck Norris"-ing code

      Just fighting off FUD theories that bless has a memory impact and trying to educate myself.

      To be fair, bless does have a small memory impact: packages used as object classes carry slightly greater (per-package) overhead, and blessed scalars must be upgraded to carry magic (which also adds the STASH pointer). The per-object overhead for blessed aggregates, however, is zero, because the AV and HV structures are large enough that they always have a slot for the STASH pointer.
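
      As a quick check of the scalar case (my own snippet, not from the posts above), Devel::Peek shows the upgrade directly:

      #!/usr/bin/env perl

      use strict;
      use warnings;
      use Devel::Peek;

      my $plain   = 42;
      my $blessed = 42;
      bless \$blessed, 'Foo';    # upgrades the referent in place

      Dump($plain);      # SV = IV(...): a plain body with no STASH slot
      Dump($blessed);    # SV = PVMG(...) with STASH = "Foo": upgraded to hold it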

      Actually, this is a good counterargument to inside-out objects, because class variables holding data for all objects can't be swapped out.

      Virtual memory does not know about that; swapping occurs at page granularity regardless of larger structures. If the hash table is large enough, and accesses do not scan the entire table, portions of the table can be swapped out by the OS even while other parts are held in memory due to frequent access. Conversely, if one SV on a page is frequently accessed, everything else on that page is also kept in memory.

      So it's sometimes better to keep rarely used "guts" data inside small hashes at lower nesting levels.

      Your problem here seems to be the fixed per-hash HV overhead, which is a consequence of having many small hashes in your program, whether blessed or plain.

      If you have relatively small tree-node and search/index-key data with a relatively large and generally opaque "data payload" segment per object, you could use inside-out objects to reduce the hash overhead for the search/index keys, and DBI/SQLite to store the payloads, possibly in an in-memory database. But once you have eliminated the per-object HV overhead, simply serializing the payloads and storing them in one more hash will probably be comparable to an in-memory SQLite database, at much lower overhead. Unless, of course, you can actually move your entire data tree into SQLite and use SQL to access it, or the payloads really are a large part of the problem and SQLite allows you to move them out to disk while keeping the tree structure in the inside-out objects.
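
      Roughly along these lines, as a sketch only: the TreeNode class and its fields are invented for illustration, and Storable's freeze/thaw do the payload packing.

      package TreeNode;

      use strict;
      use warnings;
      use Scalar::Util qw(refaddr);
      use Storable qw(freeze thaw);

      # Inside-out storage: one class-level hash per small search/index
      # field, plus a single hash of frozen payload strings. No per-object
      # HV exists; each object is just a blessed scalar ref.
      my (%parent, %key, %frozen);

      sub new {
          my ($class, %args) = @_;
          my $self = bless \my $scalar, $class;
          my $id   = refaddr $self;
          $parent{$id} = $args{parent};
          $key{$id}    = $args{key};
          $frozen{$id} = freeze($args{payload});    # payload must be a ref
          return $self;
      }

      sub key    { $key{ refaddr $_[0] } }
      sub parent { $parent{ refaddr $_[0] } }

      sub payload {
          # Thawed on demand: costs CPU per access, saves a per-object HV.
          return thaw($frozen{ refaddr $_[0] });
      }

      sub DESTROY {
          my $id = refaddr $_[0];
          delete $_->{$id} for \(%parent, %key, %frozen);
      }

      1;

      Each node then costs one blessed scalar plus its slots in the class-level hashes, instead of a full HV of its own.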

        > If the hash table is large enough, and accesses do not scan the entire table, portions of the table can be swapped out by the OS even while other parts are held in memory due to frequent access.

        Well, in theory, but when it comes to hashes that's pretty unlikely.

        If one part of a hash is accessed much more frequently than another, then the hashing function can't be very good.

        Or it's always only the same key. :)

        Anyway, I once had stunning results after transforming a giant hash into a two-tier HoH and letting the algorithm concentrate on a very small group of second-tier hashes at a time.
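
        For illustration, roughly like this (the key format and the prefix-based bucketing are invented, not from my original code):

        #!/usr/bin/env perl

        use strict;
        use warnings;

        # One giant flat hash ...
        my %flat = map { sprintf('key%06d', $_) => $_ } 1 .. 100_000;

        # ... partitioned into a two-tier HoH by key prefix, so the
        # algorithm works on one small second-tier hash at a time.
        my %hoh;
        while (my ($k, $v) = each %flat) {
            my $bucket = substr $k, 0, 6;    # 'key000', 'key001', ...
            $hoh{$bucket}{$k} = $v;
        }

        # A hot loop that touches only a couple of buckets leaves the
        # cold buckets free to be paged out.
        my $sum = 0;
        for my $bucket (qw(key000 key001)) {
            my $tier2 = $hoh{$bucket} or next;
            $sum += $_ for values %$tier2;
        }
        print "$sum\n";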

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery