in reply to Re^9: Architecture design for full stack development.
in thread Architecture design for full stack development.

OK I see the problem - the syntax in Postgres doesn't have a CLUSTERED attribute but uses a different syntax for that, namely CLUSTER ON in the create table/alter table syntax. Kind of makes sense in a way, because it affects the whole table persistingly to make an index clustered. The syntax for the current Postgres version for the case of ALTER TABLE can be found in https://www.postgresql.org/docs/9.6/static/sql-altertable.html

In other words you make the table CLUSTER ON <index-name> rather than making the index clustered at index creating time like you do in most other DBMS's, whereas as I said before, the CLUSTER sql command doesn't do that at all - it only works per use without persistence.

So yes, Postgres DOES have clustered indexes, it's just that the documentation isn't as clear as it could be and the syntax is a bit different from other popular DBMS's.

Update: it doesn;t work as expected. The table will have to have CLUSTER applied periodically although it does have a clustered index in the sense of what happens with CLUSTER table (as opposed to CLUSTER index). So that is yet another problem for me to address after all. And you were right that PgAdmin led me to believe it had clustered indexes just like Sybase - it looked that way in terms of how the interface works.

One world, one people

  • Comment on Re^10: Architecture design for full stack development.

Replies are listed 'Best First'.
Re^11: Architecture design for full stack development.
by erix (Prior) on Jun 25, 2017 at 06:30 UTC

    Update: Ah, didn't see your update in time. Good, we seem to (more or less) agree then :) /Update

    No, you are still mistaken (or, just possibly, do not put forward your point clearly enough).

    Let's read what's on the page that you mentioned:

    https://www.postgresql.org/docs/current/static/sql-altertable.html

    It says:

    CLUSTER ON This form selects the default index for future CLUSTER operations. + It does not actually re-cluster the table.

    So it says: 'future CLUSTER operations'. That means the next CLUSTER command. That is the CLUSTER ON command. On the html-page, that CLUSTER is actually a link into sql-cluster.html. This is the CLUSTER I was talking about all along (naturally, as there is no other CLUSTERing in postgres).

    So no, Postgres does not have clustered indexes, except in the sense of the CLUSTER ON command (which re-organises the underlying table on the basis of a /single/ index; and which is non-persistent (as you rightly say))

      Yes and I am grateful to you for putting me right - I could have gone on thinking that the clustered index could be relied upon to persist rather than requiring scheduled cluster operations (e.g. a cron or autosys job) instead.

      One world, one people

        ... cron or autosys job

        In view of the 'billions of records' you mentioned earlier, in combination with a busy site:

        Keep in mind that a table is completely blocked for both read and write during a CLUSTER operation. This often simply means that it is not feasible to use CLUSTER (unless you can suffer the 'downtime').

        I'd say, just forget about primary key- (or index-) 'fragmentation' as a problem. It is not, really. It should not be an application level concern.

        I did a comparison between gapped-values versus contiguous-values tables (starting out with 1 billion rows (=34GB), deleting 90%, then comparing to same-size contiguous-numbered table), and it turns out that indeed, there is no difference in performance.

        $ ./testfrag.sh -- rowcount1 [1_000_000_000] -- rowcount2 [100000000] -- gap_size [10] -- running tests... fragmented: average (100 searches): 23.754 ms contiguous: average (100 searches): 26.553 ms

        (The difference seen here is coincidence -- after another CLUSTER it might be the other way round)

        The test was done on my old desktop, and above are cold-cache timings (cached, both go down to ~2.0 ms (again no difference)).