in reply to Re^2: Netflix (or on handling large amounts of data efficiently in perl)
in thread Netflix (or on handling large amounts of data efficiently in perl)

Let's see if I'm following your reasoning correctly.

I'm essentially interested in three variables:
$movieid
$userid
$rating

Are you suggesting that I make a multi-dimensional Judy array of arrays? So for each movie create a Judy array using $userid as the index and $rating as the value, then put that into a Judy array as the value with $movieid as the index?

Apologies if I'm stating the obvious; I wouldn't classify myself as a programmer.

From a very, very rough test (I haven't even gone back to confirm the data is retrievable), this is looking very good indeed for memory consumption. I'll do some further testing tomorrow.


Re^4: Netflix (or on handling large amounts of data efficiently in perl)
by diotalevi (Canon) on Dec 29, 2008 at 22:58 UTC

    Sure, why not. tilly originally mentioned a bitmap, so I suggested something cheaper in memory. You can build multidimensional Judy arrays. In particular, JudyHS is implemented as a nested set of JudyL arrays. I've posted a snippet at Dump JudyHS which demonstrates dumping a JudyHS structure.

    That nesting is a hard requirement: JudyHS only works because Judy arrays can be nested.
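
To make the shape of that nesting concrete, here is a rough sketch of the layout described in the question: an outer map indexed by $movieid whose values are inner maps from $userid to $rating. It is only an illustration under stated assumptions. Plain Perl hashes stand in for the Judy arrays, so it shows the structure but not the memory savings, and the helper names set_rating and get_rating and the sample IDs are made up for the example.

```perl
use strict;
use warnings;

# ASSUMPTION: plain hashes stand in for nested Judy arrays.
# Outer level: $movieid => inner map. Inner level: $userid => $rating.
my %ratings;

# Hypothetical helper: store one rating.
sub set_rating {
    my ($movieid, $userid, $rating) = @_;
    $ratings{$movieid}{$userid} = $rating;
}

# Hypothetical helper: look up one rating, or undef if absent.
# The exists check avoids autovivifying an empty inner map on a miss.
sub get_rating {
    my ($movieid, $userid) = @_;
    return undef unless exists $ratings{$movieid};
    return $ratings{$movieid}{$userid};
}

# Made-up sample data, not taken from the Netflix set.
set_rating(1, 6,  3);
set_rating(1, 42, 5);
set_rating(2, 6,  4);

# Walk every rating stored for one movie.
for my $userid (sort { $a <=> $b } keys %{ $ratings{1} }) {
    print "movie 1, user $userid: $ratings{1}{$userid}\n";
}
```

With real Judy arrays the idea would be the same, except that the value stored under each $movieid would be the inner array itself rather than a hash reference.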

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊