comment on

The two primary problems with flat file (text) databases are:

Fields are not usually indexed.
If the flat file uses variable length records, updates to existing records that require the record length to change require a re-write of all records below the record being updated.

Fields in a flat file database can be indexed. Either the index can be stored in the flat file database, or in an adjacent file in a known location. However, this is not usually done, as the effort of writing routines to index a flat file database can likely be better spent using an existing binary database format, or toolset.

An additional issue with the requirement that much of the database be re-written if updates are necessary, is the problem of contention. What if multiple processes need to update the database? If process A needs to update the database, it must lock the database, re-write a large section of the database, and unlock the database. Process B needs to lock the database, re-read the database to ensure that it incorporates Process A's change, then re-write a large section of the database including Process B's change, and so on.

In general, flat file databases should only be used if they are very small (I would say less than 100Kbyte, not less than 1Mbyte), and if they only need to be updated by one process, and read by others. The 100Kbyte number is a question of fitting the database content into cache memory, as even having to fallback to RAM can be quite expensive.

I have had a great deal of success with one process writing common data to a text file, and many (500+) clients reading from the text file. The trick here is to ensure that the operation is atomic from the perspective of the client. The client must always see a complete file at the configured location. Also, the client should not rely on the data being fully up-to-date. From a network perspective, for a small amount of data, it is far more efficient to use a network protocol such as NFS to serve a static 'flat file database' to clients, than to serve the data from a generalized database server such as MySQL.

In reply to Re: Where is the bottleneck in flat file databases? by MarkM
in thread Where is the bottleneck in flat file databases? by Wysardry

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.