this tool will be useful for a number of DBs, and there is a good chance that there will be pattern matching later as a sort of "pre-filter", and the occasional manipulation of data before insertion.

Even so, it is generally a common feature shared by all the major DB servers that each has an "import-from-data-file" tool or function, which is normally quite easy to apply, and will be highly optimized for loading data into the given server. This will always beat the pants off doing inserts with Perl/DBI on really large data sets, even when you throw in the time it takes to read the man page for the DB's import tool (which shouldn't take more than 15 minutes).

As for any pre-filtering or editing of the data, setting up a perl script to do just this step will generally be a better solution (cleaner, quicker, easier to debug/validate/update), as opposed to having this kind of logic in the same app with the relatively complicated i/o methods you're looking at (along with DBI stuff too). In other words, given a huge CVS CSV stream that isn't exactly what you'd want to feed into the DB, write a script that just creates the exact CVS CSV stream that you do want to import. Then use the DB server's native import tool to load the filtered/edited data file.

If you've got plenty time and CPU cycles to burn, go ahead and experiment with trying to approximate what DB-specific loaders already do, but if you're in a hurry and/or have a lot of different instances (data sets and/or DB servers) to worry about, keep it simple by using the existing tools for the job, and only invent the ones that don't exist yet.

update: In case it helps, bear in mind that it would be both easy and efficient to develop a perl wrapper for each of the DB-native import tools. This might not seem worthwhile until you have at least two different DB servers to work with, but once you see what the similarities and differences are in the DB loader tools, I think you could get a lot of mileage and satisfaction out of "objectifying" and "modularizing" that aspect of the problem -- Perl could provide a handy layer of abstraction to hide the differences among DB loaders, and your development time (and run time) will be much better spent in that direction.


In reply to Re^3: Implementing a buffered read-and-insert algorithm by graff
in thread Implementing a buffered read-and-insert algorithm by radiantmatrix

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.