in reply to Ordering objects using external index

You should look at maintaining the %uid2msg mapping as you go along. This shouldn't be too hard since $self appears to be an object. It just means you'll have to add some code to your insert and delete methods to keep $self->{uid2msg} up to date. This is exactly what a database does when you mark a column as indexed.
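A minimal sketch of keeping the index up to date on writes. The class names `Msg` and `MsgList`, the method names `insert`, `remove` and `by_uids`, and the hashref layout are all hypothetical stand-ins for the original poster's objects, not their actual code:

```perl
use strict;
use warnings;

# Hypothetical message class standing in for the poster's objects
{
    package Msg;
    sub new { my ($class, $uid) = @_; return bless { uid => $uid }, $class }
    sub uid { return $_[0]{uid} }
}

# Hypothetical container: insert/remove keep the uid2msg index in sync
{
    package MsgList;

    sub new {
        my ($class) = @_;
        return bless { msgs => [], uid2msg => {} }, $class;
    }

    sub insert {
        my ($self, $msg) = @_;
        push @{ $self->{msgs} }, $msg;
        $self->{uid2msg}{ $msg->uid } = $msg;    # index updated on every write
        return $msg;
    }

    sub remove {
        my ($self, $uid) = @_;
        my $msg = delete $self->{uid2msg}{$uid} or return;    # drop from index
        $self->{msgs} = [ grep { $_->uid ne $uid } @{ $self->{msgs} } ];
        return $msg;
    }

    # Reads need no rebuild: one hash slice over the external uid order
    sub by_uids {
        my ($self, $uids) = @_;
        return [ @{ $self->{uid2msg} }{@$uids} ];
    }
}

my $list = MsgList->new;
$list->insert(Msg->new($_)) for 10, 20, 30;
print join(' ', map { $_->uid } @{ $list->by_uids([30, 10]) }), "\n";    # prints "30 10"
```

The write methods pay a small constant cost each time so that `by_uids` never has to rebuild the hash.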

In the benchmark below, precomputing the index gave a speedup of about 9x with 1000 objects (original_pre vs. original). Of course you do pay a little for maintaining the %uid2msg index, but I'm assuming that in your case you do a lot more reading than inserting and deleting.

Once you've done that, you can speed things up further with a hash slice in the return, changing

return [ map { $uid2msg{$_} } @$uids ];
to
return [ @uid2msg{ @$uids } ];
This doesn't make much of a difference in the original version, but it roughly doubles performance when %uid2msg is a precomputed index.
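A small self-contained check that the two forms return the same list in the same order; plain strings stand in for message objects here, which is an assumption made for brevity:

```perl
use strict;
use warnings;

# Toy data: strings stand in for message objects
my %uid2msg = (101 => 'msg-a', 205 => 'msg-b', 333 => 'msg-c');
my $uids    = [333, 101, 205];

# Per-element lookup: the map block is evaluated once per uid
my $via_map = [ map { $uid2msg{$_} } @$uids ];

# Hash slice: a single slice operation fetches all the keys
my $via_slice = [ @uid2msg{@$uids} ];

print "@$via_slice\n";    # prints "msg-c msg-a msg-b"
```

Both forms yield undef for a uid that is missing from the hash, so switching to the slice does not change behaviour.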

Here are the results of a benchmark with 1000 msg objects:

Benchmark: running hashslice, hashslice_pre, original, original_pre for at least 5 CPU seconds...
    hashslice:  5 wallclock secs ( 5.29 usr +  0.00 sys =  5.29 CPU) @   97.35/s (n=515)
hashslice_pre:  5 wallclock secs ( 5.30 usr +  0.01 sys =  5.31 CPU) @ 1757.63/s (n=9333)
     original:  5 wallclock secs ( 5.30 usr +  0.00 sys =  5.30 CPU) @   91.51/s (n=485)
 original_pre:  5 wallclock secs ( 5.33 usr +  0.00 sys =  5.33 CPU) @  817.07/s (n=4355)
and with 10000 msg objects:
Benchmark: running hashslice, hashslice_pre, original, original_pre for at least 5 CPU seconds...
    hashslice:  5 wallclock secs ( 5.04 usr +  0.04 sys =  5.08 CPU) @  7.28/s (n=37)
hashslice_pre:  5 wallclock secs ( 5.27 usr +  0.01 sys =  5.28 CPU) @ 93.56/s (n=494)
     original:  5 wallclock secs ( 5.08 usr +  0.01 sys =  5.09 CPU) @  6.68/s (n=34)
 original_pre:  6 wallclock secs ( 5.37 usr +  0.00 sys =  5.37 CPU) @ 46.93/s (n=252)
The _pre versions are hugely faster. Here is the code:
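use strict;
use warnings;
use Benchmark;

my $UID  = 0;
my $uids = [];    # uids in the externally supplied order
my $msgs = [];    # message objects

# UO->new appends to both arrays (use 1000 here to reproduce the first run)
for (1..10000) { UO->new; }

# Index built once, outside the timed code (the "_pre" variants)
my %pre_uid2msg = map { $_->uid => $_ } @$msgs;

# -5: run each variant for at least 5 CPU seconds
timethese(-5, {
    # rebuild the index on every call, then look up with map
    original => sub {
        my %uid2msg = map { $_->uid => $_ } @$msgs;
        return [ map { $uid2msg{$_} } @$uids ];
    },
    # rebuild the index on every call, then look up with a hash slice
    hashslice => sub {
        my %uid2msg = map { $_->uid => $_ } @$msgs;
        return [ @uid2msg{ @$uids } ];
    },
    # reuse the prebuilt index
    original_pre => sub {
        return [ map { $pre_uid2msg{$_} } @$uids ];
    },
    hashslice_pre => sub {
        return [ @pre_uid2msg{ @$uids } ];
    },
});

package UO;

sub new {
    $UID += rand(1000);    # uids grow by a random step
    my $self = bless { uid => $UID }, shift();
    push(@$uids, $UID);
    push(@$msgs, $self);
    return $self;
}

sub uid {
    my $self = shift;
    return $self->{uid};
}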

Re^2: Ordering objects using external index
by kappa (Chaplain) on Sep 06, 2004 at 21:41 UTC

    Thanks for a comprehensive reply! I'll certainly incorporate some ideas as soon as I'm at work!

    Your main suggestion is to keep the hash up to date whenever I modify the messages array. That is actually my next big problem :)) You see, the messages can be sorted by different criteria; currently there are eight. So on each write operation on the messages array I would need to update eight indices. That looks weird.

    The main reason for keeping the sort order in a separate array was to be able to save lots of presorted indices (currently they are in memcached) for a big message list and then quickly retrieve the messages in whatever order I need. So the script actually does this: load the big array, load the indices, and try to sort the array in fewer than n*log(n) operations using the indices. I hope this clarifies my intentions. I could probably try saving both $uids and %uid2msg for each criterion.
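A minimal sketch of that workflow, assuming messages are hashrefs with hypothetical `uid`, `date` and `subject` fields. Each presorted order is computed once, serialized with Storable's `nfreeze` (the string you would store in memcached), and later applied with an O(n) hash slice instead of a fresh sort:

```perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

# Hypothetical messages: field names and criteria are assumptions
my @msgs = (
    { uid => 1, date => 20040903, subject => 'beta'  },
    { uid => 2, date => 20040901, subject => 'gamma' },
    { uid => 3, date => 20040902, subject => 'alpha' },
);

# One comparator per sort criterion (the thread mentions eight; two shown)
my %by = (
    date    => sub { $_[0]{date}    <=> $_[1]{date} },
    subject => sub { $_[0]{subject} cmp $_[1]{subject} },
);

# Precompute, once: for each criterion, the uids in sorted order
my %order;
for my $crit (keys %by) {
    my $cmp = $by{$crit};
    $order{$crit} = [ map { $_->{uid} } sort { $cmp->($a, $b) } @msgs ];
}

# Serialized orders: this string is what would go into memcached
my $frozen = nfreeze(\%order);

# Later run: restore the orders, build the uid index once (O(n)),
# then reorder with a hash slice; no comparisons are needed
my $restored   = thaw($frozen);
my %uid2msg    = map { $_->{uid} => $_ } @msgs;
my @by_subject = @uid2msg{ @{ $restored->{subject} } };

print join(',', map { $_->{uid} } @by_subject), "\n";    # prints "3,1,2"
```

Only the uid arrays are stored per criterion; a single %uid2msg built after loading the big array serves all of them.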

    Is there any other way to presort an array on different criteria and save the order for future reference? That seems to be my real question :)

      I replied to this already, but something seems to have gone wrong and the reply didn't make it. Basically, if you have 8 columns that you need to index, then you need 8 indexes. There's no way around it. If you retrieve the sorted list only once and then forget about it forever, maintaining the indexes only slows you down and isn't worth it. However, if you are going to retrieve it even just a few times, it's probably a win.
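One way to picture "8 columns, 8 indexes" in plain Perl: keep one uid array per criterion, sorted, and place each new uid with a binary search. This is a sketch under assumptions (hashref messages, hypothetical criteria and helper names `insert_pos` and `insert_msg`); each insert costs O(log n) comparisons per index plus a `splice`, which shifts array elements:

```perl
use strict;
use warnings;

# Hypothetical criteria; each index is an array of uids kept in sorted order
my %uid2msg;
my %by = (
    date    => sub { $_[0]{date}    <=> $_[1]{date} },
    subject => sub { $_[0]{subject} cmp $_[1]{subject} },
);
my %index = map { $_ => [] } keys %by;

# Binary search: first position whose message sorts after $msg
sub insert_pos {
    my ($uids, $msg, $cmp) = @_;
    my ($lo, $hi) = (0, scalar @$uids);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($cmp->($uid2msg{ $uids->[$mid] }, $msg) <= 0) { $lo = $mid + 1 }
        else                                               { $hi = $mid }
    }
    return $lo;
}

# Every write updates the uid index and all sorted indexes
sub insert_msg {
    my ($msg) = @_;
    $uid2msg{ $msg->{uid} } = $msg;
    for my $crit (keys %by) {
        my $uids = $index{$crit};
        splice @$uids, insert_pos($uids, $msg, $by{$crit}), 0, $msg->{uid};
    }
}

my @new = (
    { uid => 1, date => 3, subject => 'beta'  },
    { uid => 2, date => 1, subject => 'gamma' },
    { uid => 3, date => 2, subject => 'alpha' },
);
insert_msg($_) for @new;

print "@{ $index{date} } | @{ $index{subject} }\n";    # prints "2 3 1 | 3 1 2"
```

Reading any order is then just a hash slice over the matching uid array.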

      You could also try DB_File with its DB_BTREE functionality to handle the sorting and storing of the arrays. This effectively gives you a sorted hash that persists on disk between runs of your program. You would maintain 8 of these, and whenever you add a message you would do:

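use Fcntl;      # O_RDWR, O_CREAT
use DB_File;    # $DB_BTREE

tie %index1, "DB_File", "index1", O_RDWR|O_CREAT, 0666, $DB_BTREE
    or die "Cannot open index1: $!";
tie %index2, "DB_File", "index2", O_RDWR|O_CREAT, 0666, $DB_BTREE
    or die "Cannot open index2: $!";
...

sub insert {
    my $msg = shift;
    # key = sort key, value = uid (a repeated sort key overwrites the earlier uid)
    $index1{$msg->key1} = $msg->uid;
    $index2{$msg->key2} = $msg->uid;
    ...
}

my @sorted_by_index1 = @uid2msg{values %index1};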
      Unlike with a normal hash, when you use DB_BTREE, values gives you the values back in the correct order (sorted by their keys).
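A small sketch of that ordering behaviour. It uses an in-memory BTREE (filename `undef`) so nothing is written to disk, and a custom `compare` sub because the default BTREE ordering compares keys as strings, which would put `10` and `100` before `9`. The keys and values are made up for illustration. Note that the plain hash interface keeps one value per key; the DB_File documentation describes the `R_DUP` flag for sort keys that can repeat.

```perl
use strict;
use warnings;
use Fcntl;      # O_RDWR, O_CREAT
use DB_File;

# BTREE settings with a numeric key comparison instead of the default string one
my $numeric = DB_File::BTREEINFO->new;
$numeric->{'compare'} = sub { $_[0] <=> $_[1] };

# undef filename: in-memory database, nothing persisted
tie my %by_key, 'DB_File', undef, O_RDWR | O_CREAT, 0666, $numeric
    or die "Cannot tie: $!";

# key = sort key, value = uid (illustrative data, inserted out of order)
$by_key{100} = 'hundred';
$by_key{9}   = 'nine';
$by_key{10}  = 'ten';

my @uids_in_order = values %by_key;    # traversed in BTREE key order
print "@uids_in_order\n";              # prints "nine ten hundred"

untie %by_key;
```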

      If you go down this route you are basically implementing your own database, and you may want to just use DBD::SQLite instead, which gives you a fast, direct-to-disk database.