in reply to Re^3: Conditional many to many relationships with Class::DBI
in thread Conditional many to many relationships with Class::DBI

In a perfect world, I would agree with you.
The application layer should be where all changes are handled and tracked, but too often that is not the case.
At my company, the source system has been through several different applications, with data from each still being used by some users. Also, there are folks with direct access to the data (via sql) who perform updates directly without benefit of an application.
That leaves me (and my teammates in the data warehouse) with 2 choices: rely on the change/update dates in the source system and just accept that there are changes we won't track, or use some kind of Changed Data Capture (CDC). We have survived for 3 years on the former, and are now trying to institute the latter.
CDC can take several forms, but the best, in my opinion, is one that uses the database engine itself. The database will generally do logging of transactions, and scanning that transaction log for the tables you are interested in tracking is the best way to capture all changes.
We have gone from full table scans of the last 3 days worth of changes, and still missing a good percentage of them, to capturing 100% of the changes we are interested in and loading 25% of the data we did before.

Plus, we took the burden off the source system and put it on our newer servers.
  • Comment on Re^4: Conditional many to many relationships with Class::DBI