comment on

I've never heard of Judy Arrays before

Same for me. But from a first look at the Wikipedia article, it seems to me that Judy arrays offer a significant gain over hash tables only for some edge cases. Quoting Wikipedia:

Judy arrays are designed to minimize the number of expensive cache-line fills from RAM, and so the algorithm contains much complex logic to avoid cache misses as often as possible. Due to these cache optimizations, Judy arrays are fast, especially for very large datasets. On data sets that are sequential or nearly sequential, Judy arrays can even outperform hash tables, since, unlike hash tables, the internal tree structure of Judy arrays maintains the ordering of the keys.

And further:

Judy arrays are extremely complicated. The smallest implementations are thousands of lines of code. In addition, Judy arrays are optimized for machines with 64 byte cache lines, making them essentially unportable without a significant rewrite. In most applications the possible performance advantage is too small to justify the high complexity of the data structure implementation.

So, for me, it looks like a piece of code optimized for a very specific problem on a very specific CPU. To make things worse, there are very few independent sources. Wikipedia only links to a re-implementation found at code.google.com and one "independent performance comparison of Judy to Hash Tables". The latter does not look that promising. Quoting its summary:

As illustrated in this data, Judy's smaller size does not give it an enormous speed advantage over a traditional "trade size for speed" data structure. Judy has received countless man-hours developing and debugging 20,000 lines of code; I spent an hour or three writing a fairly standard 200-line hash table.

If your data is strictly sequential; you should use a regular array. If your data is often sequential, or approximately sequential (e.g. an arithmetic sequence stepping by 64), Judy might be the best data structure to use. If you need to keep space to a minimum--you have a huge number of associative arrays, or you're only storing very small values, Judy is probably a good idea. If you need an sorted iterator, go with Judy. Otherwise, a hash table may be just as effective, possibly faster, and much simpler.

Furthermore, there aren't many ways to improve Judy, but there's a lot of room to improve hash tables. The one tested here is just my standard "classic" hash table. I don't know what best practice really is these days. For example, Knuth describes numerous strategies for hashing to allow the table to be more full. [...]

I did not look at the code. But I flipped through the "Shop Manual" and a had a laugh at the "10-MINUTE DESCRIPTION OF HOW JUDY ARRAYS WORK AND WHY THEY ARE SO FAST" that ends with "Well I cannot describe Judy in 10 minutes". One thing that I've learned very early in my job is that you should be able to explain your product to anyone you meet in an elevator, within the time you spend in the elevator. That's basically within a minute or less. The shop manual also fails to sell Judy arrays to me.

Don't get me wrong, it's good to have a perl interface for Judy arrays, but I currently doubt that they are ready for production.

Code optimized for a very specific CPU, and further more, seemingly written so it can't be easily ported. An algorithm offering only slight advantages, if any, over existing solutions. Small performance and memory gains. It may be worth going that way if you have maxed out your multi-million dollar super computer. In all other cases, switching to a faster CPU, adding more memory, and use conventional algorithms looks easier and more maintainable to me.

Of course, I may be totally wrong and everyone should store everything in Judy arrays.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

In reply to Re^2: need help with Tie::Judy by afoken
in thread need help with Tie::Judy by expo1967

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.