in reply to Re^4: Size of Judy::HS array: where is MemUsed()?
in thread Size of Judy::HS array: where is MemUsed()?

if I were to present Judy::HS to $work, as a buggy module, which needed patching, and appeared to be abandonware, it probably wouldn't be received too well

Agreed. I find the topic of third-party dependencies a difficult and fascinating one -- see Re: Criteria for when to use a cpan module (Buy vs Build).

Early results do show that Judy::HS used a lot less memory than %hash

That agrees with the experiences of marioroy, tye and BrowserUk:

A supersearch for my name and judy arrays will turn up a compact, single file version of the Judy array code that compiled clean and worked very well for that application. Still slower than hashes, but far more compact. Just hope you don't find any bugs, because the Judy code is horribly complex.

-- BrowserUk in Re: Fastest way to lookup a point in a set
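
If you want a rough number for that saving on your own data, the sketch below measures resident memory growth for one structure per run (separate runs, so memory freed by one test cannot be reused by the other). It assumes Linux (/proc/self/status) and the Judy::HS Set/Free interface as documented on CPAN -- verify both on your system:

  # Build either %hash or Judy::HS and report resident memory growth.
  # Assumptions: Linux /proc/self/status; Judy::HS exports Set and Free.
  use strict;
  use warnings;
  use Judy::HS qw( Set Free );

  sub rss_kb {                        # resident set size in kB
      open my $fh, '<', '/proc/self/status' or die $!;
      while (<$fh>) { return $1 if /^VmRSS:\s+(\d+)/ }
      die 'no VmRSS line found';
  }

  my $which  = shift // 'hash';       # run once with 'hash', once with 'judy'
  my @words  = map { "word$_" } 1 .. 1_000_000;
  my $before = rss_kb();

  if ( $which eq 'hash' ) {
      my %hash;
      $hash{$_} = 1 for @words;
      printf "native hash: %d kB\n", rss_kb() - $before;
  }
  else {
      my $judy = 0;
      Set( $judy, $_, 1 ) for @words;
      printf "Judy::HS:    %d kB\n", rss_kb() - $before;
      Free( $judy );                  # Judy memory is not garbage collected
  }

Run it twice (perl memcmp.pl hash, then perl memcmp.pl judy) and compare the two figures.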

Assuming you see big savings in memory (but not speed), would you recommend Judy at $work? That would seem to depend on your company's business model.

If your software must run on thousands of (many different) customer machines, then buying more memory is not an option.

OTOH, if you're running high performance code for clients on machines that you own and control, it may be better to stick with Perl hashes and just buy more memory. After all, a DDR4 DIMM can hold up to 64 GB while DDR5 octuples that to 512 GB, so I expect the cost of buying more memory for multiple in-house machines would be dwarfed by your monthly salary bill.

It's also a low-risk solution: no code changes are required, and you needn't worry about Judy being abandoned, or about hitting a nasty bug in the notoriously complex Judy C code that you must then urgently fix yourself. There's also the opportunity cost of having your people work on Judy code instead of other projects.
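
A back-of-envelope sum can make that comparison concrete. Every figure below (machine count, RAM price, loaded developer cost) is a made-up placeholder to replace with your own numbers:

  # Hypothetical figures only -- substitute your own.
  use strict;
  use warnings;

  my $machines      = 10;      # in-house servers
  my $extra_gb      = 256;     # additional RAM per server (GB)
  my $usd_per_gb    = 5;       # assumed RAM price (USD/GB)
  my $dev_month_usd = 15_000;  # assumed loaded cost of one developer-month

  my $ram_cost = $machines * $extra_gb * $usd_per_gb;
  printf "one-off RAM upgrade: \$%d\n", $ram_cost;
  printf "developer-months it buys back: %.1f\n", $ram_cost / $dev_month_usd;

With those (made-up) numbers, upgrading all ten machines costs less than a single developer-month spent patching Judy.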

All very interesting; there should be a Meditation somewhere down the track with results of this investigation.

Looking forward to it. :)

Update: In Re^3: 32bit/64bit hash function: Use perls internal hash function? (Apr 2022), hv reports that a change proposed by Nicholas Clark to improve Perl hash performance in perl 5.38 reduced memory usage by 20% and gave substantial speed improvements in one of his test cases (perl 5.38 release date prediction market).
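
For anyone wanting to check how much of that improvement shows up in their own workload, the core Benchmark module makes a before/after comparison straightforward; the sketch below simply times hash build and lookup, to be run under each perl in turn:

  # Time hash build and lookup; run under e.g. perl 5.36 and perl 5.38.
  use strict;
  use warnings;
  use Benchmark qw( cmpthese );

  my @words = map { "word$_" } 1 .. 100_000;
  my %hash;
  $hash{$_} = 1 for @words;

  cmpthese( -3, {    # -3 means: run each sub for about 3 CPU seconds
      build  => sub { my %h; $h{$_} = 1 for @words },
      lookup => sub { my $n = grep exists $hash{$_}, @words },
  } );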

See also:

  • Update: Building Perl from Source References


Re^6: Size of Judy::HS array: where is MemUsed()?
by kcott (Archbishop) on Apr 11, 2023 at 19:34 UTC

    This raises many good questions; however, at this stage, answering most would require crystal ball gazing.

    In terms of memory vs. speed, the latter is, by far, the more important.

    Choosing the largest file (/usr/share/dict/australian-english) for testing, then finding the encoding requirements (details earlier), was probably fortuitous in that it alerted me to this issue. However, strings containing sequences of A, C, G & T are pure 7-bit ASCII and would not require encoding. Testing with /usr/share/dict/linux.words may yield interesting results.
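
    A small guard along those lines would let the same code handle both kinds of input. Encode is core; the assumption, per the earlier findings in this thread, is that Judy::HS keys want byte strings:

      # Encode a key to UTF-8 bytes only when it contains non-ASCII
      # characters; pure A/C/G/T strings pass through untouched.
      use Encode qw( encode );

      sub key_bytes {
          my ($str) = @_;
          return $str =~ /[^\x00-\x7F]/     # any non-7-bit character?
               ? encode( 'UTF-8', $str )    # yes: encode to bytes
               : $str;                      # no: already plain ASCII
      }

    A caller might then write Set( $judy, key_bytes($word), $n ) and skip the encoding cost entirely for ASCII-only data.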

    Although I did see potential $work applications, this really just started out of interest and was an academic exercise. I'll probably still continue investigating the aspects mentioned earlier, even if unsuitable for $work.

    — Ken