[OT] Persistent Object IDs

TedYoung has asked for the wisdom of the Perl Monks concerning the following question:

I have a bunch of Perl classes (packages) that extend a base class that knows how to load itself from and save itself to a database (general idea found in Class::DBI).

Each object is automatically given an ID during the save process. This ID is unique only to the class (each class has its own table). I have a separate table mapping classes to next IDs. When I need a new ID, I grab the current value for the given class, and then increment it. This is done in a single transaction for atomicity.⁽¹⁾
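The allocation step described above might look roughly like this with DBI. This is only a sketch: the `class_ids` table, its `class_name` and `next_id` columns, and the `next_id_for` helper are made-up names, and `$dbh` is assumed to be a connected handle with `RaiseError` on. Running the UPDATE before the SELECT is one way to have most engines take the row lock before the value is read.

```perl
use strict;
use warnings;

# Hypothetical table: class_ids(class_name PRIMARY KEY, next_id).
# $dbh is assumed to be a connected DBI handle with RaiseError => 1.
sub next_id_for {
    my ($dbh, $class) = @_;

    $dbh->begin_work;
    my $id = eval {
        # Increment first so the row is locked before it is read.
        $dbh->do(
            'UPDATE class_ids SET next_id = next_id + 1 WHERE class_name = ?',
            undef, $class);
        my ($next) = $dbh->selectrow_array(
            'SELECT next_id FROM class_ids WHERE class_name = ?',
            undef, $class);
        die "no class_ids row for $class\n" unless defined $next;
        $dbh->commit;
        $next - 1;    # the value as it was before our increment
    };
    if (!defined $id) {
        my $err = $@;
        eval { $dbh->rollback };
        die "ID allocation failed: $err";
    }
    return $id;
}

# Usage (assumes a row for 'My::Class' already exists):
#   my $id = next_id_for($dbh, 'My::Class');
```

Because the ID is returned before the object's own INSERT runs, it can be used to wire up relationships ahead of time, which is the property the footnote asks for.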

I want to support polymorphic storage (i.e. you can store instances of subclasses of class A in references/collections that hold A). To support this, I would like IDs that are unique across the whole database, not just within a table as I have now.

I can think of two ways to generate IDs that are unique across the db. The most obvious is to have the id table track a single db-level next-id instead of one next-id per table. The problem is that all inserts (across the db) would then have to be handled serially by the db engine, because each insert requires a lock on the single row in the id table. Looking back on it, inserts on a single table already have to be handled serially right now, because there is only one "record lock" per table.

A more ideal way would be to use an external generator, like a UUID generator. Then no id table locking has to go on at all, and inserts should be a lot faster and highly concurrent. The downside is that these id columns go from being a 64-bit integer to a 128-bit string. Since these id columns are all indexed and used for all table relationships (joins), I am concerned about performance on this end.
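To make the size trade-off concrete, here is a small pure-Perl sketch of a random (version 4) UUID. The helper names are invented for illustration, and `rand` is not a strong random source, so a real system would more likely use a dedicated UUID module. The point is only that the same ID takes 16 bytes in raw form but 36 characters in its usual text form, which matters for index and join width.

```perl
use strict;
use warnings;

# Build a random (version 4) UUID as 16 raw bytes.
# rand() is NOT cryptographically strong; this is a sketch only.
sub uuid_v4_bytes {
    my @bytes = map { int(rand(256)) } 1 .. 16;
    $bytes[6] = ($bytes[6] & 0x0F) | 0x40;    # set version to 4
    $bytes[8] = ($bytes[8] & 0x3F) | 0x80;    # set RFC 4122 variant
    return pack 'C16', @bytes;
}

# Canonical 8-4-4-4-12 hexadecimal text form.
sub uuid_to_string {
    my ($bytes) = @_;
    my $hex = unpack 'H*', $bytes;
    return join '-', unpack 'A8 A4 A4 A4 A12', $hex;
}

my $uuid = uuid_v4_bytes();
my $text = uuid_to_string($uuid);
printf "%d bytes raw, %d chars as text: %s\n",
    length($uuid), length($text), $text;
```

Storing the raw 16-byte form in a fixed-width binary column, where the database offers one, keeps the key at twice the size of a 64-bit integer rather than more than four times.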

I have looked to see what JPOX (a reference implementation of JDO) does. Unfortunately, since JPOX runs as a persistent service (and not as a library like mine), they can get away with caching options that I don't have. So they let you choose which strategy you want. They do mention that using a single table with one id row can cause scalability issues.

You might ask, how much performance do you need? Not a whole lot, but I want to do this correctly now and avoid having to change it later.

Has anyone had any experience along these lines that they could share? Off the top of my head, I would guess that 50% of my queries are selects (with joins), and only 25% are inserts (the remaining 25% would be updates and deletes). Is it worth using 128-bit strings in my pk indexes and through all joins/relationships just to avoid the contention of a single-row (db-level) id table?

⁽¹⁾ I use this strategy over autoincrement, identity, sequence rows to maintain db independence, and so I can easily fetch the ID of the record prior to insertion.

Thanks,

Ted Young

($$<<$$=>$$<=>$$<=$$>>$$) always returns 1. :-)


Replies are listed 'Best First'.

Re: [OT] Persistent Object IDs
by kyle (Abbot) on Apr 23, 2007 at 18:45 UTC

One strategy I've seen is to allocate blocks of IDs at a time. When you connect, grab the "next ID", increment by ten, and then those ten IDs are yours to use in that connection. Go back for more when they're exhausted.
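A rough sketch of the block idea, with the database replaced by a plain variable so it runs on its own. `reserve_block` and `make_allocator` are invented names; in a real version `reserve_block` would be the one transaction that reads the next ID and bumps it by the block size.

```perl
use strict;
use warnings;

# Stand-in for the shared next-ID row. A real version would do
# "read next_id, add $block_size, write back" in one transaction.
my $shared_next_id = 1;

sub reserve_block {
    my ($block_size) = @_;
    my $first = $shared_next_id;
    $shared_next_id += $block_size;
    return $first;
}

# Returns a closure that hands out IDs one at a time and only
# touches the shared counter once per block.
sub make_allocator {
    my ($block_size) = @_;
    my ($next, $limit) = (0, 0);
    return sub {
        if ($next >= $limit) {
            $next  = reserve_block($block_size);
            $limit = $next + $block_size;
        }
        return $next++;
    };
}

my $conn_a = make_allocator(10);
my $conn_b = make_allocator(10);
my @ids = ($conn_a->(), $conn_a->(), $conn_b->(), $conn_a->());
print "@ids\n";    # prints "1 2 11 3"
```

Any IDs left in a block when a connection goes away are simply never used, so gaps in the sequence are expected.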

Update: Since you already have a table of next IDs (each row is an ID, independent of the other rows), you could have multiple rows that you go to. Say you make 20 such rows, each with a different ID in it to start. Each process can go to one of those rows, get the ID, and then increment by 20. You can choose a row at random or according to your process ID or something.
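The multi-row variant can be sketched the same way, again with an in-memory array standing in for the ID table and `next_striped_id` as a made-up name. Row i starts at i and always advances by the number of rows, so every ID from row i is congruent to i modulo that count and two rows can never hand out the same value.

```perl
use strict;
use warnings;

my $stripes = 20;

# Stand-in for the ID table: row $i starts at $i (hypothetical layout).
my @next_id = (0 .. $stripes - 1);

sub next_striped_id {
    my ($row) = @_;    # e.g. $$ % $stripes, or int(rand($stripes))
    my $id = $next_id[$row];
    $next_id[$row] += $stripes;    # in a real DB: one UPDATE on that row only
    return $id;
}

print join(' ', next_striped_id(3), next_striped_id(3), next_striped_id(7)), "\n";
# prints "3 23 7"
```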

Both of these strategies wind up allocating IDs out of order. Also, they basically "put off" the scaling problem by a fixed factor. If you have more than 20 clients, they'll start waiting on each other.

If you want to be able to scale indefinitely, you have to generate your IDs some other way, without a shared counter. Since you have 64-bit integers, maybe you could generate them the old-fashioned way. Start with time and $$ (see perlvar) and mix in a per-process serial number.

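use English qw(-no_match_vars);
# Assumes 16-bit PID and 32-bit time; needs a Perl with 64-bit integers.
# The serial wraps after 65536 IDs, so stay under that per second.
my $serial_number = 0;    # per-process counter
my $unique_number
    = (($serial_number++ & 0xFFFF) << (16+32))
    | ((time() & 0xFFFF_FFFF) << 16)
    | ($PID & 0xFFFF);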
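For reference, the bit layout used above (16-bit serial, 32-bit time, 16-bit PID) can be taken apart again like this. `decode_id` is a made-up helper, the sample values are arbitrary, and the sketch assumes a Perl built with 64-bit integers.

```perl
use strict;
use warnings;

# Split a 64-bit ID laid out as serial(16) | time(32) | pid(16).
sub decode_id {
    my ($id) = @_;
    return (
        serial => ($id >> 48) & 0xFFFF,
        time   => ($id >> 16) & 0xFFFF_FFFF,
        pid    => $id & 0xFFFF,
    );
}

# Arbitrary sample values, packed the same way as above.
my $sample = (3 << 48) | (1_177_353_900 << 16) | 4242;
my %field  = decode_id($sample);
print "$field{serial} $field{time} $field{pid}\n";    # prints "3 1177353900 4242"
```

Note that once the serial reaches 32768 the top bit is set, so the value no longer fits a signed 64-bit column, and a PID wider than 16 bits would spill into the time field unless it is masked.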