in reply to Re: Architecture design for full stack development.
in thread Architecture design for full stack development.

There is always a first requirement that comes along for choosing an architectural feature. It is purely a matter of chance what the requirement was. As it happens for this case, where the index is clustered (true for some of the join tables which will grow to billions of records over time) contiguity of values has a direct effect on how well the index performs because of the way clustering works. I already stated in the OP why this prompted an ORM.

One world, one people

  • Comment on Re^2: Architecture design for full stack development.

Replies are listed 'Best First'.
Re^3: Architecture design for full stack development.
by erix (Prior) on Jun 23, 2017 at 16:26 UTC

    CLUSTER dependent on the contiguity of /values/? I have to say, I find this a bit hard to believe. Do you perhaps have any recent postgres-based evidence that explains or supports this idea? Or, if you could perhaps outline an approach that demonstrates it, so that I can implement that, and see how it turns out? Is it measurable at all? I would really like to see this in action.

      I don't and won't have evidence unfortunately, only theory. The theory I am applying is that provided the values are contigious, the mapping used in the btree algorithm can use this to calculate the physical location of the page more quickly than if not and that page seeking will slow down proportionately according to how often it cannot calculate the page location based on the value and has to go walkies looking for it. OK I kind of have evidence because I have subjectively noticed a performance improvement in operating on large tables where the values of the PK are contiguous relative to those where they werent. In fact where PK values were contiguous Postgres crunched through 4 mil records in seconds but slept soundly for same record count but non contiguousPK values. Please don't ask me to submit test results -- I have a system to build!

      One world, one people

        CLUSTER [1] rewrites a table in the order of an index (to wit, the index you pass to the CLUSTER command). Access of singular values does not change; but access of values that are now (after CLUSTER) 'near' each other can become faster (because there is a higher chance they are already read). It has to do with /physical/ nearness (or proximity) of the table data, not of the values.

        At least, that is my theory :)

        Please don't ask me to submit test results -- I have a system to build!

        Fair enough :) But I wouldn't undertake any work on the basis of your theory - I don't think it is valid.

        I'll try to devise a test myself, and see if one of our theories can be falsified.

        [1] CLUSTER command (PostgreSQL docs)