in reply to Re: Implementing a buffered read-and-insert algorithm
in thread Implementing a buffered read-and-insert algorithm

> The reason is that both your CSV files and your database are on the same disk drive, but at different disk locations.

Nope. The CSV files and the OS are on one volume, and the database is on a separate RAID volume set.

> You may also want to see if your chosen database has a "loader" program that is purpose built to quickly import large amounts of data into a database.

On its face, that would seem to make the most sense. However, this tool will be useful for a number of DBs, and there is a good chance it will later need pattern matching as a sort of "pre-filter", plus the occasional manipulation of data before insertion.

radiantmatrix
require General::Disclaimer;
s//2fde04abe76c036c9074586c1/; while(m/(.)/g){print substr(' ,JPacehklnorstu',hex($1),1)}


Re^3: Implementing a buffered read-and-insert algorithm
by graff (Chancellor) on Dec 14, 2004 at 00:57 UTC
    this tool will be useful for a number of DBs, and there is a good chance that there will be pattern matching later as a sort of "pre-filter", and the occasional manipulation of data before insertion.

    Even so, all the major DB servers share a common feature: each has an "import-from-data-file" tool or function, which is normally quite easy to apply and is highly optimized for loading data into that server. This will always beat the pants off doing inserts with Perl/DBI on really large data sets, even when you throw in the time it takes to read the man page for the DB's import tool (which shouldn't take more than 15 minutes).
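
    For instance, here's a minimal sketch of what that can look like with MySQL's LOAD DATA statement, driven from Perl just to kick off the load (database name, table name, and file path are placeholders, and LOCAL INFILE has to be enabled on your server; Postgres's COPY, Oracle's SQL*Loader, etc. play the same role):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBI;

        # Hypothetical connection details -- adjust for your own server.
        my $dbh = DBI->connect( 'dbi:mysql:database=testdb;host=localhost',
                                'user', 'password', { RaiseError => 1 } );

        # LOAD DATA is MySQL's native bulk loader: the server parses and
        # inserts the whole file itself, far faster than row-by-row inserts.
        $dbh->do(q{
            LOAD DATA LOCAL INFILE '/tmp/filtered.csv'
            INTO TABLE log_entries
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
            LINES TERMINATED BY '\n'
        });

        $dbh->disconnect;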

    As for any pre-filtering or editing of the data, setting up a perl script to do just this step will generally be a better solution (cleaner, quicker, easier to debug/validate/update) than putting that logic in the same app as the relatively complicated i/o methods you're looking at (along with the DBI stuff too). In other words, given a huge CSV stream that isn't exactly what you'd want to feed into the DB, write a script that just creates the exact CSV stream that you do want to import. Then use the DB server's native import tool to load the filtered/edited data file.
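
    A rough sketch of the kind of stand-alone filter I mean (the column layout and the filter/edit rules are made up for illustration; Text::CSV from CPAN handles quoting and embedded commas, or a plain split will do if the data is simple):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Text::CSV;

        my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );

        # Read raw CSV on STDIN, write cleaned CSV on STDOUT; the output
        # file is then handed to the DB's native import tool.
        while ( my $row = $csv->getline( \*STDIN ) ) {
            next unless $row->[0] =~ /^\d+$/;    # pre-filter: keep rows with a numeric id
            $row->[2] = uc $row->[2];            # example edit before insertion
            $csv->print( \*STDOUT, $row );
        }

    Run it as "perl filter.pl < raw.csv > clean.csv", then point the loader at clean.csv.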

    If you've got plenty of time and CPU cycles to burn, go ahead and experiment with trying to approximate what DB-specific loaders already do; but if you're in a hurry and/or have a lot of different instances (data sets and/or DB servers) to worry about, keep it simple by using the existing tools for the job, and only invent the ones that don't exist yet.

    update: In case it helps, bear in mind that it would be both easy and efficient to develop a perl wrapper for each of the DB-native import tools. This might not seem worthwhile until you have at least two different DB servers to work with, but once you see what the similarities and differences are in the DB loader tools, I think you could get a lot of mileage and satisfaction out of "objectifying" and "modularizing" that aspect of the problem -- Perl could provide a handy layer of abstraction to hide the differences among DB loaders, and your development time (and run time) will be much better spent in that direction.
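
    To make the "objectifying" idea concrete, a rough sketch of such a wrapper (the module name, option choices, and the assumption that each loader can be driven from a shell command are all just placeholders for whatever your servers actually provide):

        package DB::BulkLoader;    # hypothetical wrapper module
        use strict;
        use warnings;

        # One command builder per server type; each returns the shell command
        # that invokes that server's native bulk loader for a given file.
        my %build_cmd = (
            mysql => sub {
                my ( $db, $table, $file ) = @_;
                # mysqlimport infers the table name from the file name
                return "mysqlimport --local --fields-terminated-by=',' $db $file";
            },
            pgsql => sub {
                my ( $db, $table, $file ) = @_;
                return qq{psql -d $db -c "\\copy $table from '$file' with csv"};
            },
        );

        sub new {
            my ( $class, %args ) = @_;    # expects driver, database, table
            return bless {%args}, $class;
        }

        sub load_file {
            my ( $self, $file ) = @_;
            my $builder = $build_cmd{ $self->{driver} }
                or die "no native loader configured for '$self->{driver}'\n";
            my $cmd = $builder->( @{$self}{qw(database table)}, $file );
            system($cmd) == 0 or die "bulk load failed: $?\n";
        }

        1;

    The pre-filter script and a wrapper like this stay independent of each other, so you can swap servers (or filters) without touching the other half.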