I do this sort of thing quite a lot, but generally using perl/java/javascript as a component within a framework like Maxware DSE or Critical Path IMD. One thing you might want to consider (depending on your speed requirements) is converging the data into a single consolidated location before pushing it out to the various tables in Oracle.
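That consolidation step can be as simple as merging each feed into one structure keyed on a record ID, with source order deciding which feed wins a contested column. A minimal Perl sketch - the feed names and fields here are invented for illustration:

```perl
use strict;
use warnings;

# Invented example feeds: an HR feed and a phone-list feed, both keyed
# on the same record id, disagreeing about the 'dept' column.
my @hr_feed    = ( { id => 1, name => "O'Connor", dept => 'Sales' } );
my @phone_feed = ( { id => 1, phone => '555-0101', dept => 'SALES?' } );

my %consolidated;

# Lowest-priority source first: later (higher-priority) sources simply
# overwrite any columns they share with earlier ones.
for my $record ( @phone_feed, @hr_feed ) {
    my %copy = %$record;
    my $id   = delete $copy{id};
    $consolidated{$id}{$_} = $copy{$_} for keys %copy;
}

# HR wins the contested 'dept' column; the phone number survives.
print "$consolidated{1}{dept} / $consolidated{1}{phone}\n";   # Sales / 555-0101
```

Which source "wins" is exactly the authority question in the first gotcha below, so make that ordering an explicit, documented decision rather than an accident of loop order.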
Gotchas to look out for:
- Authority of data. If you are writing to a single column or record from several datasources, which one has priority?
- Mismatch in column length between data source & data target - get a list of columns for source and target and compare them
- Data type mismatches - perhaps a column that has been defined in the target as NUM, but there's some rogue data in the source containing strings, or the target is BOOL and the source is a single character VARCHAR that's been populated with a mixture of Y/y/N/n/0/1 etc.
- How to detect and handle deletion of records/column entries
- Logging: make sure your day-to-day logging contains enough information to identify a problem record in the source data.
- What happens if your script(s) barf in the middle of the job and end unexpectedly?
- Character encoding & international character sets: do you need to handle them, and if so which ones and how
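The VARCHAR-to-BOOL case is worth a concrete illustration: a small lookup map converts the clean values, and anything else gets logged with enough context to trace the problem record back to the source (the logging point above). The map and log format below are just one possible choice:

```perl
use strict;
use warnings;

# One possible normalisation map for a Y/y/N/n/0/1 source column that
# feeds a BOOL target column.
my %bool_map = ( Y => 1, y => 1, 1 => 1, N => 0, n => 0, 0 => 0 );

sub to_bool {
    my ( $value, $row ) = @_;
    return $bool_map{$value} if defined $value && exists $bool_map{$value};

    # Log the row number and the raw value so the offending record can
    # be found in the source data later.
    my $shown = defined $value ? "'$value'" : 'NULL';
    warn "row $row: unrecognised boolean value $shown\n";
    return undef;    # caller decides whether to skip or abort
}

print to_bool( 'y', 1 ), "\n";    # 1
```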
Testing:
- I'd suggest using the Test:: modules (Test::More and friends), comparing the resulting output data with what you are expecting. Use something like Devel::Cover to make sure your tests exercise all possible transformations.
- If at all possible, take a copy of live data to test against, so that when you come to run in live there are no nasty surprises waiting for you. When constructing artificial test data it's always tempting to use 'test user 1', 'test user 2' etc., which may not exercise every scenario. To give a simple example, tidy test data could leave character-quoting problems lurking if you have live users with names like "O'Connor". Probably the easiest approach, since you are replacing an existing procedure, is to take a copy of live source and target data, run your scripts to transform the copied source, and compare the results with the live target data.
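To make the point about awkward live-like data concrete, here is a hedged sketch using Test::More: quote_name is a stand-in for whatever transformation your scripts actually perform, and the interesting test is the one that 'test user 1' style data would never have caught:

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Stand-in transformation: quote a name for use in a SQL statement,
# doubling any embedded single quotes.
sub quote_name {
    my $name = shift;
    ( my $quoted = $name ) =~ s/'/''/g;
    return "'$quoted'";
}

is( quote_name('test user 1'), "'test user 1'", 'plain name passes through' );
is( quote_name("O'Connor"),    "'O''Connor'",   'embedded quote is escaped' );
```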
Data::Sync was written as an alternative to commercial tools that do this sort of job, but it's quite new and might as yet be too basic for your needs.
One trick I've occasionally used for complex transformations is the following:
read from source to temporary database
for (defined operations)
{
    read from temp database x
    transform
    write to temp database y
}
write to target
Although obviously that has a significant impact on the speed of the whole operation.
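As a self-contained rendering of that loop - with the temporary databases stood in for by in-memory arrays so the sketch runs on its own; in a real job each stage would read and write temp tables, e.g. via DBI:

```perl
use strict;
use warnings;

# Invented source data and transformations, purely for illustration.
my @source = ( { name => ' alice ' }, { name => 'BOB' } );

# Each defined operation takes one record and returns a transformed copy.
my @operations = (
    sub { my %r = %{ $_[0] }; $r{name} =~ s/^\s+|\s+$//g;     \%r },  # trim
    sub { my %r = %{ $_[0] }; $r{name} = ucfirst lc $r{name}; \%r },  # case
);

my @stage = @source;                      # read from source to temp database
for my $op (@operations) {
    @stage = map { $op->($_) } @stage;    # temp x -> transform -> temp y
}
# @stage is now ready to write to the target
print join( ', ', map { $_->{name} } @stage ), "\n";   # Alice, Bob
```

Keeping one transformation per stage makes it much easier to inspect the intermediate data when something goes wrong mid-pipeline.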
--------------------------------------------------------------
"If there is such a phenomenon as absolute evil, it consists in treating another human being as a thing."
John Brunner, "The Shockwave Rider".