in reply to database design issue: one big table vs hundred of smaller tables

Are you perhaps looking for a data dictionary sort of application, one that tracks the locations, names, and field structures of other tables?

To that end, perhaps the following table structures would help:

    DIRPATHS
        PathID    - INT, NOT NULL, AUTOINC
        PathName  - VARCHAR(16)    # Adjust to taste
        ParentDir - INT            # PathID of the parent directory, NULL for /

    TABLELIST
        TableID   - INT, NOT NULL, AUTOINC
        TableName - VARCHAR(16)    # Adjust to taste
        PathID    - INT            # Linking field pointing to DIRPATHS
        # Add'l fields, e.g. Description, LastUpdate, BackedUpOn, etc.

    FIELDNAMES
        TableID   - INT, NOT NULL  # Link to match in TABLELIST
        FieldNo   - INT, NOT NULL  # Field position
        Name      - VARCHAR(32)    # Adjust to taste
        # Description and other fields as needed

    ENTITIES
        LocalID    - INT, NOT NULL, AUTOINC
        EntityName - VARCHAR(32)   # Adjust to need/taste
        EntityType - ENUMERATED SET (Note: I prefer lookup tables):
            0 - directory
            1 - table
            2 - field
            3 - Etc.
        EntityID   - INT           # ID from the appropriate table listed earlier
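
If it's any help, here's a minimal DBI sketch of creating two of those tables. The DSN, credentials, and MySQL-ish types (AUTO_INCREMENT and friends) are my assumptions; adjust all of them to your setup:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Hypothetical DSN/credentials; swap in your own.
    my $dbh = DBI->connect( 'dbi:mysql:datadict', 'user', 'pass',
                            { RaiseError => 1 } );

    $dbh->do(q{
        CREATE TABLE DIRPATHS (
            PathID    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
            PathName  VARCHAR(16),
            ParentDir INT             -- PathID of the parent; NULL for /
        )
    });

    $dbh->do(q{
        CREATE TABLE ENTITIES (
            LocalID    INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
            EntityName VARCHAR(32),
            EntityType INT,           -- 0=directory, 1=table, 2=field, ...
            EntityID   INT            -- ID in whichever table EntityType names
        )
    });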

In this scenario, you let the user search ENTITIES for the stuff they're looking for, use the EntityType field to find out what kind of object matched, and then take the EntityID back to the appropriate table. You'll need custom code to trace a matching ID back to its original data and then handle that; however, it can be a very flexible scheme.
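
A minimal sketch of that search-and-dispatch step, assuming the $dbh handle from the sketch above; the %table_for dispatch map is hypothetical:

    # Map each EntityType code to its home table and key column.
    my %table_for = (
        0 => [ 'DIRPATHS',   'PathID'  ],
        1 => [ 'TABLELIST',  'TableID' ],
        2 => [ 'FIELDNAMES', 'TableID' ],  # FIELDNAMES also needs FieldNo
    );

    my $search_term = 'customer';          # whatever the user typed

    # Query 1: find every entity whose name matches.
    my $hits = $dbh->selectall_arrayref(
        'SELECT EntityType, EntityID FROM ENTITIES WHERE EntityName LIKE ?',
        { Slice => {} },
        "%$search_term%",
    );

    # Queries 2..n: chase each hit back to its original table.
    for my $hit (@$hits) {
        my ($table, $key) = @{ $table_for{ $hit->{EntityType} } };
        my $row = $dbh->selectrow_hashref(
            "SELECT * FROM $table WHERE $key = ?",
            undef, $hit->{EntityID},
        );
        # ... hand $row to whatever displays the match ...
    }

One wrinkle: a row in FIELDNAMES isn't uniquely keyed by TableID alone, so you'd either give fields a surrogate key or record FieldNo in ENTITIES as well.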

The main problem is that you essentially force yourself into a maintenance problem should something change, such as the directory structure or the storage scheme. Typically, this is a BAD THING®, as anything that gets hard-coded can make life hellish for your maintenance programmer. I avoid this whenever possible.

Perhaps a better approach is to create one table mapping tables (and paths) to arbitrary IDs, and then a second table listing all fields of every table mapped in the first. The fields are larger, but the scheme is easier to maintain in the long run. If a table gets deleted, you only need to DELETE two sets of data. And so on.
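
A sketch of that flatter layout, using the hypothetical names TABLEMAP and FIELDMAP; note the full path stored inline (hence the larger fields), and that dropping a table really is just two DELETEs:

    # One table maps path + table name to an arbitrary ID ...
    $dbh->do(q{
        CREATE TABLE TABLEMAP (
            MapID     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
            DirPath   VARCHAR(128),   -- full path stored inline
            TableName VARCHAR(16)
        )
    });

    # ... and a second lists every field of every mapped table.
    $dbh->do(q{
        CREATE TABLE FIELDMAP (
            MapID   INT NOT NULL,     -- points back to TABLEMAP
            FieldNo INT NOT NULL,
            Name    VARCHAR(32)
        )
    });

    # If a table gets deleted, remove exactly two sets of data:
    my $gone = 42;                    # hypothetical MapID of the dead table
    $dbh->do('DELETE FROM FIELDMAP WHERE MapID = ?', undef, $gone);
    $dbh->do('DELETE FROM TABLEMAP WHERE MapID = ?', undef, $gone);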

In either case, you'll need multiple queries to get the user to the final entity that matches. However, since you're dealing with subsets after the first one, it shouldn't be that much slower than other approaches.
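
For example, once the first query has narrowed things to a single table, each follow-up query only walks that table's small subset of rows:

    # Query 1: pin the match down to a single table.
    my ($table_id) = $dbh->selectrow_array(
        'SELECT TableID FROM TABLELIST WHERE TableName = ?',
        undef, 'customers',           # hypothetical table name
    );

    # Query 2: search only within that table's fields.
    my $fields = $dbh->selectcol_arrayref(
        'SELECT Name FROM FIELDNAMES WHERE TableID = ? ORDER BY FieldNo',
        undef, $table_id,
    );
    print "$_\n" for @$fields;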

In the end, it doesn't seem that much different than creating a good search engine[1] for a web site. Instead of words in a document, you're trying to capture--and then search--the location of various types of data. The specifics change, but perhaps not the overall techniques. If we can record it, we can search it.[2]

As far as interesting ways to do this in Perl go, I find that someone has written an interesting column or few. (Note: They may not seem topical at first glance, but I think the techniques may help.)

--f

[1] - I do recommend getting the latest examples, though. Things have changed since this was published.

[2] - Or, as some guy named Dutch once said, "If it bleeds, we can kill it."
