comment on

While more algorithmic rather than Perl specific, I have a programming problem that could use some insight.

I'm storing a set of objects in a database, where the rows are tagged by object id for easy retrieval. The properties themselves may be any kind of textual data from short strings, to numbers to dates to large blocks of text. In order to not be pulling more data than is necessary from the database, not all properties may be pulled for a given object when the object is instantiated in the program. Instead the program will ask for a set of properties it expects to use for the task at hand.

The problem is that later on another part of the program may want other properties that may or may not overlap with the properties that have already been retrieved, and so those properties have to be pulled from the database (without re-pulling the already retrieved properties, of course).

Requesting objects from the database singly, this doesn't pose a problem. But generally objects are pulled from the database in batches to reduce the number of queries, e.g.:


retrieveByID(
    properties => [ 'foo','bar','baz' ],
    ids => [ 4,5,62 ]
);

#later....
retrieveByID(
    properties => [ 'a','b','c' ],
    ids => [ 7,8,89 ]
);

#later...
retrieveByID(
    properties => [ 'foo','c','d' ],
    ids => [ 4,8,102 ]
);
[download]

We don't want to re retrieve the whole object from the database if it already exists in the system, but we do want to get any missing properties without pulling already loaded properties a second time. And since queries are generally time-expensive, we want to do it in as few queries as possible

It occurred to me that what I've got is something of a matrix-structure:

   o1  o2  o3  o4
p1 x       x
p2     x   x
p3     x       x
p4         x
p5 x           x
[download]

Now I know very little about matrix math, and perhaps an answer to my question does not lie there. But given a situation similar to the above, where objects o1-o4 have properties loaded from amongst p1-p5 (at the x's), is there a way to discern a near optimal or optimal number of queries to retrieve all of the missing data while minimising the amount of re-retrieved data? I also need to consider that an algorithm to come up with these queries also needs to account for situations where there are a great many more properties than objects, or vice versa.

The clincher of course is that the time taken to determine an optimal or near-optimal set of queries plus the time to run said queries has to be faster than a full re-retrieval or a one-query-per-id scheme, or it is rendered useless.

In reply to algorithm help for determining efficient db data retrieval by AidanLee

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.