in reply to Community Teaching Project II - the Call to Arms

First off, I love this idea and I am eager to help out in any way that I can. That being said, I hope this doesn't come across as a criticism.

Should we scale down the project a little? An "Anything Indexer" faces the problem of normalization. I would assume that the user would enter all of the relevant fields for the data they wish to collect, and then the program would have to ask further questions to determine how those fields relate to one another. The program would use those relationships to normalize the data. The problem is that, as far as I know, there is no way to generate normalization on the fly for complex data. The more information the user wishes to track, the more difficult the normalization becomes.

For instance, assume that we're tracking CDs. Let's say that the user wants to track the record label. This has to go in a separate table as one record label will be on many CDs. This is a one-to-many relationship. This seems fairly straightforward. We create a CD table and a record_label table and the CD table has a field identifying the record_label ID.
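
A rough sketch of that one-to-many layout, using Python's built-in sqlite3 module purely for illustration. The column names, the sample rows, and the helper functions are my own assumptions, not part of any proposed design.

```python
import sqlite3


def build_one_to_many():
    """Create an in-memory database where each CD points at one label."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE record_label (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE cd (
            id              INTEGER PRIMARY KEY,
            title           TEXT NOT NULL,
            -- one label per CD: the "many" side holds the label's ID
            record_label_id INTEGER NOT NULL REFERENCES record_label(id)
        );
    """)
    # Hypothetical sample data: one label, two CDs on it.
    conn.execute("INSERT INTO record_label (id, name) VALUES (1, 'Label A')")
    conn.executemany(
        "INSERT INTO cd (title, record_label_id) VALUES (?, ?)",
        [("First Album", 1), ("Second Album", 1)],
    )
    return conn


def titles_for_label(conn, label_name):
    """Return the titles of all CDs on the named label, sorted by title."""
    rows = conn.execute(
        """
        SELECT cd.title
        FROM cd
        JOIN record_label ON record_label.id = cd.record_label_id
        WHERE record_label.name = ?
        ORDER BY cd.title
        """,
        (label_name,),
    )
    return [title for (title,) in rows]
```

Calling titles_for_label(build_one_to_many(), "Label A") lists both CDs, while the label name itself is stored only once.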

What happens when the exact same CD is issued under another label? Then we realize that we have a many-to-many relationship and we should have a junction table tying the CD and record_label fields together. Oops. If the database isn't set up that way in the first place, the user may be forced to create a second CD record for the new label, and we wind up with duplicate data in the database and, with it, the potential for modification anomalies.
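
For comparison, here is a minimal sketch of the junction-table version, again in Python with sqlite3. The junction table's name (cd_record_label), the sample rows, and the helper functions are assumptions made for the example.

```python
import sqlite3


def build_many_to_many():
    """Create an in-memory database where a CD can appear on many labels."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cd (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE record_label (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        -- Junction table: one row per (CD, label) pairing.
        CREATE TABLE cd_record_label (
            cd_id           INTEGER NOT NULL REFERENCES cd(id),
            record_label_id INTEGER NOT NULL REFERENCES record_label(id),
            PRIMARY KEY (cd_id, record_label_id)
        );
    """)
    # Hypothetical sample data: one CD issued under two labels.
    conn.execute("INSERT INTO cd (id, title) VALUES (1, 'First Album')")
    conn.executemany(
        "INSERT INTO record_label (id, name) VALUES (?, ?)",
        [(1, "Label A"), (2, "Label B")],
    )
    conn.executemany(
        "INSERT INTO cd_record_label (cd_id, record_label_id) VALUES (?, ?)",
        [(1, 1), (1, 2)],
    )
    return conn


def labels_for_cd(conn, title):
    """Return the names of all labels a CD was issued under, sorted."""
    rows = conn.execute(
        """
        SELECT record_label.name
        FROM cd
        JOIN cd_record_label ON cd_record_label.cd_id = cd.id
        JOIN record_label ON record_label.id = cd_record_label.record_label_id
        WHERE cd.title = ?
        ORDER BY record_label.name
        """,
        (title,),
    )
    return [name for (name,) in rows]
```

The CD's title lives in exactly one row here, so reissuing it under another label only adds a row to the junction table.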

That may not be the best example, but the point holds. If we target the Indexer at a specific use, we can address these issues up front.

Update: A modification anomaly (as mentioned above) occurs when an update leaves the data inconsistent. In this case, if one CD is stored under two different labels and the user got the name of the CD wrong, he or she may correct the name in one record without realizing that the error exists in two places. We wind up with one CD with two different names.
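
To make the anomaly concrete, here is a small sketch, assuming the one-label-per-CD layout and a CD that has been entered twice to cover its two labels. The misspelled title and the function name are invented for the demonstration.

```python
import sqlite3


def demo_modification_anomaly():
    """Show how fixing a typo in one duplicate row leaves two titles."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE record_label (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE cd (
            id              INTEGER PRIMARY KEY,
            title           TEXT NOT NULL,
            record_label_id INTEGER NOT NULL REFERENCES record_label(id)
        );
        INSERT INTO record_label (id, name)
            VALUES (1, 'Label A'), (2, 'Label B');
        -- The same CD entered twice, once per label, with the same typo.
        INSERT INTO cd (id, title, record_label_id)
            VALUES (1, 'Frist Album', 1), (2, 'Frist Album', 2);
    """)
    # The user corrects the typo in the one record they happen to be viewing.
    conn.execute("UPDATE cd SET title = 'First Album' WHERE id = 1")
    # The "same" CD now has two different titles in the database.
    rows = conn.execute("SELECT DISTINCT title FROM cd ORDER BY title")
    return [title for (title,) in rows]
```

After the single UPDATE, the function returns both the corrected and the misspelled title, which is exactly the inconsistency described above.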

RE: RE: Community Teaching Project II - the Call to Arms
by Ozymandias (Hermit) on Jun 16, 2000 at 00:50 UTC
    Unlike swiftone, I'm not interested in realistic goals. <G>

    Seriously, the points you raise are good ones - but I don't think they're a serious problem. The idea behind the Anything Indexer is that the Indexer itself is just an engine. I'm sure we'll create multiple modules for the initial release, along with an RFC-style document for modules; but each module is responsible for those issues. You have to be careful defining your fields and how they inter-relate, but that really shouldn't be much of a problem.

    - Ozymandias

RE:(2) Community Teaching Project II - the Call to Arms
by swiftone (Curate) on Jun 16, 2000 at 00:44 UTC
    What happens when the exact same CD is issued under another label? Then we realize that we have a many-to-many relationship and we should have a junction table tying the CD and record_label fields together. Oops.

    Well then, it's time to rerun the data through the Indexer. :) Seriously though, I'm all in favor of realistic goals... but would anything less ambitious be worth doing? (Otherwise, one could use any of the dozens of CD systems on Freshmeat.net.)