Re: An improved technique for database primary keys

My application has a subroutine which will generate an 8-character random string of letters and numbers, prefixed by an optional string provided by the caller.

That can be quite a performance killer if your database stores rows in primary key order. Storing rows in primary key order has some advantages: it saves on the overhead size of the index, and new rows will always go on the end: no need to split pages, and less rebalancing of the index. Furthermore, I prefer to keep my rows as small as possible (smaller rows means more rows/page, more rows/page means less pages needed, less pages needed means less disk I/O, less disk I/O means better performance). If I can use a tinyint as primary key, I use a tinyint. Else, a mediumint. But even a 4-byte integer is less than an 8 character string. And even using a data column as primary key is acceptable.

The advantage of this approach is that “every integer looks exactly like every other integer, but monikers never, ever will.” If the wrong kind of moniker is in some column, you can immediately detect it by eye. Or, by a query, e.g.:
    select dataset_moniker from datasets where dataset_moniker not lik
+e 'dataset_%'
[download]

How does that rhyme with what you write in your first paragraph: It should be a column whose values are utterly devoid of business or other meaning. If your primary column is utterly devoid of any business or other meaning, you would never use literals in your queries. Your primary columns would only show up in joins. Your example would never be issued.

Comment on Re: An improved technique for database primary keys Download Code

Replies are listed 'Best First'.
Re^2: An improved technique for database primary keys by locked_user sundialsvc4 (Abbot) on Nov 05, 2010 at 13:37 UTC
The “literal” content of the moniker is expressly designed to make the string more sensible to humans. If the moniker simply consisted of a random string of characters, then it would effectively be “just like a large integer,” in the sense that, if you accidentally got one in the wrong place, you could never know it nor detect it. As you correctly and astutely observe, my phrase, utterly devoid of ... other meaning, is not strictly true, at least with one particular reading of the English word, “meaning.” The random string, itself, means nothing. The prefix is to be used only as a convention. It would be verboten to write logic that examined the value of the moniker and derived any sort of “actionable understanding” of the record’s content based on any prefix-string found there. That would be an egregious violation of normal-forms. But to make key-strings convey information that is useful for debugging and for automatic data-integrity verification ... that, I submit, is useful. I agree with your point about page-splits. You are absolutely correct that this strategy comes with costs, and this is one of them. (If your database supports hashed, vs. B-tree, indexes, and if you elect to use them due to reasons other than just this, then that cost can be somewhat relieved.)

Replies are listed 'Best First'.

Re^2: An improved technique for database primary keys
by locked_user sundialsvc4 (Abbot) on Nov 05, 2010 at 13:37 UTC

The “literal” content of the moniker is expressly designed to make the string more sensible to humans. If the moniker simply consisted of a random string of characters, then it would effectively be “just like a large integer,” in the sense that, if you accidentally got one in the wrong place, you could never know it nor detect it.

As you correctly and astutely observe, my phrase, utterly devoid of ... other meaning, is not strictly true, at least with one particular reading of the English word, “meaning.” The random string, itself, means nothing. The prefix is to be used only as a convention.

It would be verboten to write logic that examined the value of the moniker and derived any sort of “actionable understanding” of the record’s content based on any prefix-string found there. That would be an egregious violation of normal-forms. But to make key-strings convey information that is useful for debugging and for automatic data-integrity verification ... that, I submit, is useful.

I agree with your point about page-splits. You are absolutely correct that this strategy comes with costs, and this is one of them. (If your database supports hashed, vs. B-tree, indexes, and if you elect to use them due to reasons other than just this, then that cost can be somewhat relieved.)