comment on

The number of columns should not be a problem so don't reject option one out of hand. Serializing data, OTOH, is a problem if you intend to do any searching on the serialized column - you force yourself into a situation in which half your search is a SQL query and half is a perl query and you have to serialize and unserialize at every step. So I'd say option #3 is probably not your best bet.

We don't really have enough information to decide between options 1 and 2 - it really depends on the structure of the data. This is the kind of situation where the Entity-Relationship model can be helpful - what are your entities and how do the relate to each other? In the ER model, columns are attributes of entities so ask yourself if the attributes apply to the entities and base your decision on that. For example if the columns are attributes that apply to all rows, but just happen to be missing, then you want one table. But if some attributes only apply to some rows, you want several tables. If your table is vehicles and you only know the mileage for some vehicles, the mileage attribute applies to all rows but is missing for some. But if some of your vehicles are firetrucks while some are sports cars and an attribute is "ladder_length", that attribute only applies to the firetruck rows and you should have separate tables for firetrucks and other vehicles.

The other thing that ER can help you think about is the retlationship between entities (one-to-many, many-to-many, has-a, is-a, etc.). For example, if your entities are people and you have columns for "speaks_English", "speaks_French", and your missing values indicate people that don't speak one of the languages, that is a many-to-many relationship and you should have a separate lookup table with an id for the person and a single column "language_spoken" with values "French", "English" etc. - in this case a single person can have multiple values for the attribute "language_spoken" so the multiple table approach is indicated. Yes, this will mean more complex queries that require joins, but any decent database will optimize for such queries with indexes and the complexity will relate more to the SQL you need to know than to the database itself, which will actually be less complex when normalized into multiple tables.

Your decision should not usually be based on the absolute number of columns or even the absolute number of tables, rather on the structure of the data itself.

In reply to Re: (OT) Database design: omitted partially filled fields. by jZed
in thread (OT) Database design: omitted partially filled fields. by BUU

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.