in reply to Archiving data in a database
A lot will depend upon which type of database you are using (Berkeley type or RDBMS, etc.) and which particular one of whichever type.
With a non-relational DB, there are two approaches:
The first is to take a periodic snapshot (copy) of the whole DB. You would need to do this during a "regular maintenance period", during which the DB is disabled from use. If your DB isn't very big, a few seconds or even minutes of downtime at 3:00 am local time (on Sundays?) is probably acceptable for most purposes.
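For example, if the DB is a single file accessed through a tied hash, the snapshot can be as simple as untying, copying the file, and retying. A minimal sketch, assuming DB_File; the path, archive name, and schedule are illustrative:

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;
use File::Copy qw( copy );
use POSIX qw( strftime );

my $dbfile = 'data.db';    # illustrative path

tie my %db, 'DB_File', $dbfile, O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $dbfile: $!";

# ... normal operation ...

# During the maintenance window: untie (which flushes and closes the
# file), copy it to the archive, then retie and carry on.
untie %db;

my $stamp = strftime '%Y%m%d-%H%M%S', localtime;
copy $dbfile, "$dbfile.$stamp"
    or die "Snapshot copy failed: $!";

tie %db, 'DB_File', $dbfile, O_RDWR, 0644, $DB_HASH
    or die "Cannot retie $dbfile: $!";
```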
The second is to log the changes as they are made. This isn't too hard to do if you are using a tied hash interface to this type of DB. You only need to write a fairly simple tied hash proxy module that intercepts and forwards each access to the DB, and logs those operations (STORE/DELETE etc.) that modify it.
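A minimal sketch of such a proxy; the class name and log format are illustrative, and you would extend the forwarded methods (SCALAR, UNTIE, ...) to match whatever your underlying tie supports:

```perl
package Tie::LoggedHash;

use strict;
use warnings;
use IO::Handle;

sub TIEHASH {
    my( $class, $realhash, $logfile ) = @_;
    open my $log, '>>', $logfile or die "Cannot open $logfile: $!";
    $log->autoflush( 1 );
    return bless { db => $realhash, log => $log }, $class;
}

## Read-only operations are forwarded untouched.
sub FETCH    { my( $self, $key ) = @_; return $self->{db}{ $key } }
sub EXISTS   { my( $self, $key ) = @_; return exists $self->{db}{ $key } }
sub FIRSTKEY { my $self = shift; my $a = scalar keys %{ $self->{db} }; return each %{ $self->{db} } }
sub NEXTKEY  { my $self = shift; return each %{ $self->{db} } }

## Modifying operations are forwarded *and* logged with a timestamp.
sub STORE {
    my( $self, $key, $value ) = @_;
    printf { $self->{log} } "%d STORE %s %s\n", time, $key, $value;
    return $self->{db}{ $key } = $value;
}

sub DELETE {
    my( $self, $key ) = @_;
    printf { $self->{log} } "%d DELETE %s\n", time, $key;
    return delete $self->{db}{ $key };
}

sub CLEAR {
    my $self = shift;
    printf { $self->{log} } "%d CLEAR\n", time;
    %{ $self->{db} } = ();
}

1;

## Usage: interpose the proxy between your code and the real tied hash.
# tie my %real, 'DB_File', 'data.db', O_RDWR, 0644, $DB_HASH;
# tie my %db,   'Tie::LoggedHash', \%real, 'changes.log';
# $db{fred} = 'bloggs';    # forwarded to %real and appended to the log
```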
With an RDBMS, most will have facilities for logging record changes built in. It is just a case of reading the docs and learning how to enable them (if they are not enabled by default), and then how to examine them and, if necessary, recover information from them.
For example: MySQL has the --log-update[=file_name] (now deprecated) and the --log-bin[=file_name] (preferred) options that you can add to the mysqld startup command line. See the docs for more info.
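For instance, something along these lines; the log base name is illustrative, and mysqlbinlog is MySQL's standard utility for examining the binary log:

```
# In my.cnf (or as --log-bin on the mysqld command line):
[mysqld]
log-bin=mysql-bin

# Later, to inspect the logged changes:
mysqlbinlog mysql-bin.000001 | less
```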
The problem then becomes one of deciding:

How long to retain the logs. This is mostly a basic question of "How much disc space am I willing to use for this?". Though in some circumstances (legal & financial records etc.) the law may dictate how long you must keep such information.

Where to archive them. This is much simpler these days: given the low cost of optical storage media (CDs and DVDs), it is quite easy and very cheap to simply offload your backups on a regular basis and keep them pretty much forever.

How to use them when the time comes. This is one of those questions that you will only be able to answer once you actually have a need for it.
In theory, if you have had logging enabled from day one of using a new database, and you have kept all the logs, it is possible to "replay" the logs from the beginning, in order, against a restored copy of the bare schema and retrace history to any point.
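Using MySQL's binary logs as the example again, the replay itself is just a pipe (the file names and cut-off date here are illustrative):

```
# Restore the bare schema or the last full backup first:
mysql -u root -p mydb < full_backup.sql

# Then replay the binary logs, in order:
mysqlbinlog mysql-bin.000001 mysql-bin.000002 | mysql -u root -p

# --stop-datetime halts the replay at any chosen point in history:
mysqlbinlog --stop-datetime="2005-04-03 03:00:00" mysql-bin.000003 | mysql -u root -p
```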
In practice, the longer the database has been in use, the longer it will take to get back to a point in the recent past, and the more likely it is that a log has been lost or corrupted, leaving you dead in the water.
To avoid this, it is best to take regular (incremental) backups of the database itself and archive these along with the logs. It can save a great deal of time when you need to retrace things to the state of a few days ago.
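With MySQL, for example, mysqldump can rotate the binary log at the instant the dump is taken, so each backup marks a clean boundary from which to start a replay. The options are standard mysqldump flags; the names are illustrative:

```
# A consistent dump that also closes the current binary log and opens a new one:
mysqldump --single-transaction --flush-logs mydb > mydb-backup.sql

# Recovery later: restore the newest dump, then replay only the logs written since.
```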
Finally, the two most important things about backups (DB or otherwise) are simplicity and verification.
E.g. it's no good having a really sophisticated backup naming scheme if, a year from now, when the guy that invented it has moved on and you really need to recover the transactions from a month ago last Thursday, whoever is charged with the task cannot work out how to do it.
I won't recount the whole story, but I was working for a large UK retailer that was spending large sums of money on tape rotation and off-site storage of multi-generational backup tapes for each of its 300+ stores. The day came that disaster struck, and it was necessary to replace a store's hardware and recover the DB from tape. The tapes were blank. All of them. A single minuscule difference meant that whilst the rigorously tested system worked just fine in the test store environment, it wrote completely useless tapes in the production environment. A sum of £5 million wasted over 18 months was mentioned.
Tests, test harnesses and test environments are fine things to have, but cross verification in a live environment is paramount.