in reply to Re: using stdin with sqlldr
in thread using stdin with sqlldr

Yes, loading a flat file with many rows seems to be a good way to use sqlldr. But in some cases we load a file with millions of rows, and that still takes sqlldr a moderate amount of time ~ 1-3 hours depending on how much text we insert into clob fields. So we first have to spend the time writing out the flat sqlldr files, and then wait while we load them. If I instead compute each row and feed it to sqlldr over stdin as it is built, that should save an hour or two of loading time, because we no longer wait for the whole file to be written before the load starts. I would like my application/script to work in this order:

1. get the file with the text
2. compute the sqlldr values into a delimited string
3. load the values with sqlldr

Normally I would use an insert statement with DBI, but we have millions of rows to load at a time, so plain inserts would take days or weeks with this much data. I'm looking for a way to insert like that, only with sqlldr, and using STDIN seems like a viable option.
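
Roughly, what I have in mind looks like this (just a sketch; the connect string, control file name, delimiter, and compute_fields() are placeholders, and it assumes the control file's INFILE points at /dev/stdin or a named pipe so that sqlldr can read the piped data -- I haven't verified that every sqlldr version/platform accepts that):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Assumption: rows.ctl names /dev/stdin (or a named pipe) as its INFILE,
    # so sqlldr reads whatever this script prints down the pipe.
    # The userid/control/log values are placeholders.
    open my $sqlldr, '|-', 'sqlldr', 'userid=scott/tiger',
        'control=rows.ctl', 'log=rows.log'
        or die "can't start sqlldr: $!";

    while (my $line = <>) {                     # 1. read the source text
        chomp $line;
        my @values = compute_fields($line);     # 2. compute the sqlldr values
        print {$sqlldr} join('|', @values), "\n";   # 3. stream the delimited row
    }

    close $sqlldr or die "sqlldr failed (exit status $?)";

    # stand-in for the real transformation of a source line into column values
    sub compute_fields {
        my ($text) = @_;
        return split /\t/, $text;
    }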

Replies are listed 'Best First'.
Re^3: using stdin with sqlldr
by graff (Chancellor) on Nov 19, 2008 at 02:08 UTC
    ...in some cases we load a file with millions of rows and that still takes sqlldr a moderate amount of time ~ 1-3 hours depending on how much text we insert into clob fields.

    The loading of that quantity of data into Oracle is not going to take any less time no matter how you do it. I can appreciate the desire to speed things up, and it does seem likely that if you avoid writing that much data to disk and instead pipe it directly to sqlldr, you will save the time it takes to write it out and read it back from disk.

    In other words, by streaming data directly into sqlldr, neither the data creation nor the database loading goes any faster -- you just eliminate the intermediate delay of waiting until all the data is on disk before the load begins.

    That said, you might want to use a "tee" in your pipeline: have the data go to a file while it's also going directly to sqlldr, so you have something to look at and work from if anything goes awry.
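
    For example, something along these lines (only a sketch, with placeholder file names and sqlldr arguments, and it assumes the same stdin-readable control file as above; the single-string pipe open goes through the shell, so tee can sit between your generator and sqlldr):

        # every row goes to a backup file on disk *and* straight into sqlldr,
        # so you can inspect or replay the data if the load goes wrong
        open my $loader, '|-',
            'tee backup_rows.dat | sqlldr userid=scott/tiger control=rows.ctl log=rows.log'
            or die "can't start pipeline: $!";

        print {$loader} "some|delimited|row\n";   # rows built as in the sketch above

        close $loader or die "pipeline failed (exit status $?)";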

    You are certainly right that using DBI "insert" statements would be orders of magnitude slower.