in reply to Re^12: Architecture design for full stack development.
in thread Architecture design for full stack development.

... cron or autosys job

In view of the 'billions of records' you mentioned earlier, in combination with a busy site:

Keep in mind that during a CLUSTER operation the table is locked exclusively: it is blocked for both reads and writes until the operation finishes. For a large table this often simply means that CLUSTER is not feasible (unless you can afford the downtime).
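For reference, a CLUSTER run looks like this (the table and index names here are made up for illustration):

    -- CLUSTER takes an ACCESS EXCLUSIVE lock: no reads or writes
    -- on the table until it finishes.
    CLUSTER records USING records_pkey;

    -- A later re-run can omit the index; PostgreSQL remembers
    -- the clustering index from the previous run.
    CLUSTER records;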

I'd say, just forget about primary-key (or index) 'fragmentation' as a problem. It really isn't one, and it should not be an application-level concern.

I ran a comparison between a gapped-values table and a contiguous-values table: I started with 1 billion rows (=34 GB), deleted 90% of them, and then compared lookups against a same-size, contiguously numbered table. It turns out that there is indeed no difference in performance:
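The setup of such a test can be sketched roughly as follows (table names, the scaled-down row counts, and the lookup query are illustrative assumptions; the actual testfrag.sh may differ in detail):

    -- Gapped table: load N rows, then delete 90% to leave gaps
    -- in the key space (and in the index).
    CREATE TABLE frag (id bigint PRIMARY KEY, payload text);
    INSERT INTO frag
      SELECT g, md5(g::text) FROM generate_series(1, 1000000) g;
    DELETE FROM frag WHERE id % 10 <> 0;   -- keep every 10th row

    -- Contiguous table: same final row count, densely numbered keys.
    CREATE TABLE contig (id bigint PRIMARY KEY, payload text);
    INSERT INTO contig
      SELECT g, md5(g::text) FROM generate_series(1, 100000) g;

    VACUUM ANALYZE frag;
    VACUUM ANALYZE contig;

    -- Then time random single-key lookups against both, e.g.:
    -- EXPLAIN (ANALYZE) SELECT * FROM frag WHERE id = 4200;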

$ ./testfrag.sh
-- rowcount1 [1_000_000_000]
-- rowcount2 [100000000]
-- gap_size  [10]
-- running tests...
fragmented: average (100 searches): 23.754 ms
contiguous: average (100 searches): 26.553 ms

(The small difference seen here is coincidental; after another CLUSTER it might just as well be the other way round.)

The test was done on my old desktop, and the above are cold-cache timings. With a warm cache, both go down to ~2.0 ms, again with no difference.