comment on

The "with Perl" part is not as gratuitous as it might seem, although the question is more db-centric. I am planning a rather large database table that I will query with DBD::Pg. The table itself is rather simple; just about half a dozen, mixed INT and REAL values as shown below

CREATE TABLE t {
    t_id BIGSERIAL NOT NULL,
    year SMALLINT,
    yday, SMALLINT,
    a INT,
    b INT,
    c REAL,
    d INT,
    e REAL,
    f REAL,
    point_id INT
);
[download]

These are daily values per year, and they repeat for each point on land. So, for one point I have one row for each day of the year for 20 years, for the next point I have one row for each day of the year for 20 years, and so on.

The expected table size is going to be in 100 billion rows plus.

A typical search would be to retrieve all the values of a <variable> for or for a particular year+yday combination. In other words, besides the automatic INDEX on t_id, the PRIMARY KEY, there will be INDEXes on year, yday, and point_id. The INDEX on year, yday might be a composite INDEX because they will be searched together.

Anyway, as I said, 100 billion+ rows totaling about 10 TB without the extra INDEXes (that is, the INDEXes other than the PK).

Any suggestions if there are any better ways to do this? Should I try to save space by converting the REALs into INTs? Or, who cares about space... disks are cheap.

Should I combine the different values into a single blob using some kind of data serialization technique and then store them in a single column? To be sure, that sounds like a silly idea to begin with, because I would have to extract the individual variables out of that blob on every query.

Should I partition the table into smaller tables using the technique discussed at http://http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html, or is it too much trouble as Pg will handle this data very well anyway.

Once the db is built, it will be relatively static. In other words, it will be readonly.

Fishing for ideas here.

--

when small people start casting long shadows, it is time to go to bed

In reply to Strategy for managing a very large database with Perl by punkish

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.