Ordering objects using external index

kappa has asked for the wisdom of the Perl Monks concerning the following question:

Re: Ordering objects using external index by fergal (Chaplain) on Sep 06, 2004 at 19:04 UTC
You should look at maintaining the %uid2msg mapping as you go along. This shouldn't be too hard since $self appears to be an object. It just means you'll have to add some code to your insert and delete methods to keep `$self->{uid2msg}` up to date. This is exactly what a database does when you mark a column as indexed. In the benchmark below, this resulted in a speedup of about 9x. Of course you do have to pay a little for maintaining the %uid2msg index, but I'm assuming that in your case you do a lot more reading than inserting and deleting.

Once you've done that, you can further speed things up with a hash slice in the return, changing
`return [ map { $uid2msg{$_} } @$uids ];`
to
`return [ @uid2msg{ @$uids } ];`
This doesn't make much of a difference in the original version, but it more than doubles performance when %uid2msg is a precomputed index.

Here are the results of a benchmark for 1000 msg objects
`Benchmark: running hashslice, hashslice_pre, original, original_pre for at least 5 CPU seconds...
    hashslice:  5 wallclock secs ( 5.29 usr +  0.00 sys =  5.29 CPU) @ 97.35/s (n=515)
hashslice_pre:  5 wallclock secs ( 5.30 usr +  0.01 sys =  5.31 CPU) @ 1757.63/s (n=9333)
     original:  5 wallclock secs ( 5.30 usr +  0.00 sys =  5.30 CPU) @ 91.51/s (n=485)
 original_pre:  5 wallclock secs ( 5.33 usr +  0.00 sys =  5.33 CPU) @ 817.07/s (n=4355)`
and 10000 msg objects
`Benchmark: running hashslice, hashslice_pre, original, original_pre for at least 5 CPU seconds...
    hashslice:  5 wallclock secs ( 5.04 usr +  0.04 sys =  5.08 CPU) @ 7.28/s (n=37)
hashslice_pre:  5 wallclock secs ( 5.27 usr +  0.01 sys =  5.28 CPU) @ 93.56/s (n=494)
     original:  5 wallclock secs ( 5.08 usr +  0.01 sys =  5.09 CPU) @ 6.68/s (n=34)
 original_pre:  6 wallclock secs ( 5.37 usr +  0.00 sys =  5.37 CPU) @ 46.93/s (n=252)`
The _pre versions are hugely faster.
Code below:
`use Benchmark;

my $UID = 0;
my $uids = [];
my $msgs = [];

for (1..10000) {
    UO->new;
}

my %pre_uid2msg = map { $_->uid => $_ } @$msgs;

timethese(-5, {
    original => sub {
        my %uid2msg = map { $_->uid => $_ } @$msgs;
        return [ map { $uid2msg{$_} } @$uids ];
    },
    hashslice => sub {
        my %uid2msg = map { $_->uid => $_ } @$msgs;
        return [ @uid2msg{ @$uids } ];
    },
    original_pre => sub {
        return [ map { $pre_uid2msg{$_} } @$uids ];
    },
    hashslice_pre => sub {
        return [ @pre_uid2msg{ @$uids } ];
    },
});

package UO;

sub new {
    $UID += rand(1000);
    my $self = bless { uid => $UID }, shift();
    push(@$uids, $UID);
    push(@$msgs, $self);
    return $self;
}

sub uid {
    my $self = shift;
    return $self->{uid};
}`
edit (broquaint): changed `<pre>` tags to `<code>` tags
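The suggestion of keeping the index up to date inside the object could look roughly like the sketch below. The `MsgList` and `Msg` packages and the `insert`, `remove` and `ordered` method names are hypothetical, since kappa's actual class is not shown in the thread; `remove` plays the role of the delete method and is named that way only to avoid clashing with Perl's built-in `delete`.

```perl
use strict;
use warnings;

# Hypothetical message class with the uid accessor used in the thread.
{
    package Msg;
    sub new { my ($class, $uid) = @_; return bless { uid => $uid }, $class }
    sub uid { return $_[0]{uid} }
}

# Hypothetical container that owns both the array and the uid index.
{
    package MsgList;

    sub new {
        my $class = shift;
        return bless { msgs => [], uid2msg => {} }, $class;
    }

    # Every insert updates the array and the index together.
    sub insert {
        my ($self, $msg) = @_;
        push @{ $self->{msgs} }, $msg;
        $self->{uid2msg}{ $msg->uid } = $msg;
        return $msg;
    }

    # Removal drops the message from the index and from the array.
    sub remove {
        my ($self, $uid) = @_;
        my $msg = delete $self->{uid2msg}{$uid} or return;
        @{ $self->{msgs} } = grep { $_->uid != $uid } @{ $self->{msgs} };
        return $msg;
    }

    # Reads are a plain hash slice; the mapping is never rebuilt here.
    sub ordered {
        my ($self, $uids) = @_;
        return [ @{ $self->{uid2msg} }{ @$uids } ];
    }
}

my $list = MsgList->new;
$list->insert(Msg->new($_)) for 1, 4, 5674;

my $ordered = $list->ordered([ 5674, 1, 4 ]);
print join(', ', map { $_->uid } @$ordered), "\n";   # prints "5674, 1, 4"
```

The trade-off is the one described above: each write does a little more work so that each read can skip the O(n) rebuild of the mapping.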
Re^2: Ordering objects using external index by kappa (Chaplain) on Sep 06, 2004 at 21:41 UTC
Thanks for a comprehensive reply! I'll certainly incorporate some of these ideas as soon as I'm at work!

Your main suggestion is to keep the hash always up to date as I do something to the messages array. That is actually my next big problem :)) You see, the messages can be sorted by different criteria. Currently, there are only eight. So, on each write operation on the messages array I would need to update eight indices. That looks weird.

The main reason to separate the sorting order into another array was to be able to save lots of presorted indices (currently they are in `memcached`) for a big message list and then quickly retrieve messages in the order I need. So the actual events that take place in the script are these: load the big array, load the indices, try to sort the array in fewer than `n log(n)` ops using the indices. I hope this clarifies my intentions.

I can probably try to save both `$uids` and `%uid2msg` for each criterion.

Is there any other way to presort an array on different criteria and save the order for future reference? Seems like this is my *real* question :)
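As a rough sketch of the workflow kappa describes: each criterion only needs its own presorted list of uids, while a single uid-to-message hash, built in one O(n) pass, can serve all of them. The sample messages, the `from` and `size` fields and the `in_order` helper are made up for illustration, and plain hashes stand in for the message objects.

```perl
use strict;
use warnings;

# Illustrative data only; the real messages are objects with a uid method.
my @msgs = (
    { uid => 1,    from => 'carol', size => 150 },
    { uid => 4,    from => 'alice', size => 100 },
    { uid => 5674, from => 'bob',   size => 200 },
);

# Presort once per criterion and keep just the uid order. These small
# arrays are the part that would be cached between runs.
my %order = (
    from => [ map { $_->{uid} } sort { $a->{from} cmp $b->{from} } @msgs ],
    size => [ map { $_->{uid} } sort { $a->{size} <=> $b->{size} } @msgs ],
);

# One lookup hash, built in a single pass, is shared by every order.
my %uid2msg = map { $_->{uid} => $_ } @msgs;

# Reordering is a hash slice: O(n) lookups, with no comparison sort.
sub in_order {
    my ($criterion) = @_;
    return [ @uid2msg{ @{ $order{$criterion} } } ];
}

print join(' ', map { $_->{uid} } @{ in_order('from') }), "\n";  # prints "4 5674 1"
print join(' ', map { $_->{uid} } @{ in_order('size') }), "\n";  # prints "4 1 5674"
```

Under these assumptions only the uid lists have to be stored per criterion; `%uid2msg` does not depend on the sort order, so it need not be saved eight times.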
Re^3: Ordering objects using external index by fergal (Chaplain) on Sep 07, 2004 at 15:30 UTC
I replied to this already but something seems to have gone wrong and the reply didn't make it.

Basically, if you have 8 columns that you need to index then you need 8 indexes. No way around it. If you are only retrieving the sorted list once and then forgetting about it forever, then maintaining the indices only slows you down and it's not worth it. However, if you are going to retrieve it even just a few times, then it's probably a win.

You could also try DB_File with its DB_BTREE functionality to handle the sorting and storing of the arrays. This effectively gives you a sorted hash that persists on disk between calls to your program. You would maintain 8 of these and whenever you add a message, you would do
`use DB_File;
use Fcntl;

tie %index1, "DB_File", "index1", O_RDWR|O_CREAT, 0666, $DB_BTREE;
tie %index2, "DB_File", "index2", O_RDWR|O_CREAT, 0666, $DB_BTREE;
...

sub insert {
    my $msg = shift;
    $index1{$msg->key1} = $msg->uid;
    $index2{$msg->key2} = $msg->uid;
    ...
}

my @sorted_by_index1 = @uid2msg{values %index1};`
Unlike a normal hash, when you use a DB_BTREE, values will give you the values back in the correct order (sorted by their keys).

If you go down this route you are basically implementing your own database, and you may want to look at just using DBD::SQLite, which gives you a fast, direct-to-disk database.
Re: Ordering objects using external index by saintmike (Vicar) on Sep 06, 2004 at 17:45 UTC
Check out this thread, it seems like your requirements are very similar.
Re^2: Ordering objects using external index by kappa (Chaplain) on Sep 07, 2004 at 11:34 UTC
Exactly. Thanks! Sadly enough, that discussion didn't end up in anything efficient either :((
Re: Ordering objects using external index by Anonymous Monk on Sep 06, 2004 at 18:29 UTC
`return [ map { $uid2msg{$_} } @$uids ];`
can be written as:
`return [ @uid2msg{@$uids} ];`
But I don't know if that's any faster...
Re: Ordering objects using external index by BrowserUk (Patriarch) on Sep 06, 2004 at 19:23 UTC
A few examples of the data might have clarified your question. Do you mean something like this?
`#! perl -slw
use strict;

my $uids = [ 9, 1, 4, 7, 2, 0, 3, 6, 8, 5 ];
my $msgs = [ qw[ zero one two three four five six seven eight nine ] ];

my @msgsByUid = @$msgs[ @$uids ];

print for @msgsByUid;

__END__
P:\test>junk
nine
one
four
seven
two
zero
three
six
eight
five`

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re^2: Ordering objects using external index by kappa (Chaplain) on Sep 06, 2004 at 21:12 UTC
Ah, kind of. Look:
`my $uids = [ 5674, 1, 4 ];
my $msgs = [ $msg1, $msg4, $msg5674 ];`
Provided `$msgNN->uid == NN`, we'd like to have `[ $msg5674, $msg1, $msg4 ]`.
Re^3: Ordering objects using external index by BrowserUk (Patriarch) on Sep 06, 2004 at 21:36 UTC
That makes it look like you're using symbolic references, i.e. variable names that are (partially) made up from other variable names, e.g.
`$uid = 5674;
${'msg' . $uid } = ...;`
In which case, you should be making that a hash directly:
`push @uid, 5674;
$msgs{ $uid[ -1 ] } = ...;`
Then you wouldn't have the mapping problem later on. Producing your ordered array would then become a simple hash slice:
`@ordered = @msgs{ @uid };`
It's difficult to know without seeing how the variables and data in your snippets are being created.
Re^4: Ordering objects using external index by kappa (Chaplain) on Sep 07, 2004 at 11:14 UTC