in reply to Re^2: Size of Judy::HS array: where is MemUsed()?
in thread Size of Judy::HS array: where is MemUsed()?

G'day kcott,

During the "Rosetta Code: Long List is Long" experiment, I called the JudySL/HS free functions in C to obtain the amount of memory used. The Judy::HS Perl module also returns bytes. Judy::HS wowed me regarding memory utilization versus native hash. I did not call MemUsed at the time.

my $bytes = Free( $Judy );
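
For reference, a minimal sketch of how that measurement could look; Free() returning the byte count is as described above, while the Set() call and its signature are assumptions based on the Judy distribution's documentation rather than verified against any particular version:

    use Judy::HS qw( Set Free );

    my $judy;                          # opaque Judy::HS handle
    my $count = 0;
    while ( my $word = <STDIN> ) {
        chomp $word;
        Set( $judy, $word, $count++ ); # string key => integer value
    }

    # Free() tears the array down and reports how much memory it held.
    my $bytes = Free( $judy );
    print "Judy::HS held $count keys in $bytes bytes\n";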

Re^4: Size of Judy::HS array: where is MemUsed()?
by kcott (Archbishop) on Apr 10, 2023 at 23:42 UTC

    G'day Mario,

    Thanks for the feedback. As mentioned earlier, this was put on hold for a family Easter event; I expect to be working on it again this week.

    My main concern with MemUsed() was the bug(s) reported by hv: if I were to present Judy::HS at $work as a buggy module that needed patching and appeared to be abandonware, it probably wouldn't be received too well. Using Memory::Usage instead of MemUsed() would circumvent this problem; other parts of Judy::HS seem solid (from what I've read).
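
    As a rough sketch of the Memory::Usage approach (the word list and file path are illustrative, not the actual benchmark code):

        use Memory::Usage;

        my $mu = Memory::Usage->new();
        $mu->record('baseline');

        open my $fh, '<', '/usr/share/dict/words' or die $!;
        chomp( my @words = <$fh> );
        $mu->record('word list loaded');

        my %hash;
        $hash{$_} = 1 for @words;
        $mu->record('after %hash');

        # The Judy::HS array would be populated and recorded here too.

        $mu->dump();    # prints one line of memory figures per record() call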

    Early results do show that Judy::HS uses a lot less memory than a %hash.

    I initially used /usr/share/dict/australian-english to populate the hash keys. I chose this because it was the largest of several files I have in /usr/share/dict/ (the fact that I'm an Aussie was only a secondary consideration); however, I found that this file has entries with characters outside the 7-bit ASCII range (e.g. Ångström). This required some encoding manipulation for Judy::HS; creating this data structure was slower than for a %hash.
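
    One plausible shape for that encoding step (a sketch only, not the code actually used; it assumes UTF-8 octets are an acceptable key representation and that Judy::HS exports Set as documented):

        use Encode qw( encode_utf8 );
        use Judy::HS qw( Set );

        my $judy;
        open my $fh, '<:encoding(UTF-8)', '/usr/share/dict/australian-english'
            or die $!;
        while ( my $word = <$fh> ) {
            chomp $word;
            # Judy::HS keys are byte strings, so wide characters such as
            # those in "Ångström" are encoded back to UTF-8 octets first.
            Set( $judy, encode_utf8($word), 1 );
        }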

    /usr/share/dict/linux.words is the smallest in that directory and, as far as I can tell, only uses 7-bit ASCII. I'll be giving that a try to see how Judy::HS fares against %hash when there's no encoding consideration.
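
    A build-speed comparison for that pure-ASCII case could be as simple as the Benchmark sketch below (not the actual test harness; the iteration count is illustrative and Set/Free are assumed to be exported as documented):

        use Benchmark qw( cmpthese );
        use Judy::HS qw( Set Free );

        open my $fh, '<', '/usr/share/dict/linux.words' or die $!;
        chomp( my @words = <$fh> );

        cmpthese( 10, {
            'native %hash' => sub {
                my %h;
                $h{$_} = 1 for @words;
            },
            'Judy::HS' => sub {
                my $judy;
                Set( $judy, $_, 1 ) for @words;
                Free( $judy );
            },
        });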

    There are other areas I intend to address, which will likely include: reading the data structures with and without encoding; non-integer values; and complex structures (e.g. HoH).

    All very interesting; there should be a Meditation somewhere down the track with results of this investigation.

    — Ken

        This raises many good questions; however, at this stage, answering most would require crystal ball gazing.

        In terms of memory vs. speed, the latter is, by far, the more important.

        Choosing the largest file (/usr/share/dict/australian-english) for testing, and then discovering its encoding requirements (details earlier), was probably fortuitous in that it alerted me to this issue. However, strings consisting only of A, C, G & T contain nothing but 7-bit ASCII characters and would not require encoding. Testing with /usr/share/dict/linux.words may have interesting results.

        Although I did see potential $work applications, this really just started out of interest and was an academic exercise. I'll probably still continue investigating the aspects mentioned earlier, even if unsuitable for $work.

        — Ken