comment on

What you do about your script keeling over depends on a number of things: what stage it's at when the error occurs, how critical the target data consistency is, whether you're implementing deletions, etc.

Two examples of things that can go horribly, horribly wrong:

You have an application which assumes that field X in a source file (e.g. email address) is unique. Your application therefore uses that as a primary key in your consolidation and transformations. Your application also (for safeties sake) doesn't delete entries that are no longer in the file. The supplier of the file makes a mistake in generation and supplies you with a file with an extra field at the beginning of the record. Bang! Your data is now all over the place (this has happened on several projects I've been on).

Conversely (and this has happened to me several times as well) you have implemented record deletion - if a record is no longer in the file, you delete it. So one day the application generating the source file dies partway through, you get a partial file, and you end up deleting a chunk of the data in the target. The cold thrill you get when you suddenly notice your target database is a tenth the size it should be is quite memorable.

The way I've got around the former in the next version of Data::Sync is to implement a 'validation' method. You can specify a pattern match for every field - before writing, it goes through the candidate record set, checking that the every field matches it's pattern match in every record. If it fails, it doesn't write. Implementing a check like that is one of the reasons I suggested pulling everything together into a single consolidated view before writing to the target(s).

$DBI::AutoCommit=0 might be useful for this - if your script fails anywhere short of the line $dh->commit() it auto rollbacks and nothing is damaged.

Another approach to record deletion/mangling is to allow a configured maximum number of changes in an operation. If the number of changes is greater than x% of the total record set it stops with an error. You may want to implement something similar.

How critical an error is can depend on all sorts of things, but is probably best assessed with a (semi) formal risk assessment. E.g. If an error renders the data in your company phone directory missing or incorrect (I'm using user information as an example because identity management systems are what I'm most familiar with) it makes you look foolish, and causes some annoyance. OTOH, if you push that out to the login database, you suddenly have x% of the company sat twiddling their thumbs, unable to login until you fix it. This costs the company a lot of money, and can cause severe bosses shoe induced bruising on your behind.

There's lots of information around on risk assessment, and really it's the job of the (project) manager, but if you find yourself doing it, you should take into account at least the following:

Likelihood of a problem happening
Probable impact on the system if it does happen
Criticality of the system
Likely time to fix (can be included in impact)
Cost (in terms of extra coding in your case) of mitigating the risk

--------------------------------------------------------------

"If there is such a phenomenon as absolute evil, it consists in treating another human being as a thing."

John Brunner, "The Shockwave Rider".

In reply to Re^3: Automating data loading into Oracle by g0n
in thread Automating data loading into Oracle by punkish

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.