Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I am parsing a big XML document that has to go into a database; I have the parsing portion of it working fine, but my question now is about INSERTing these values into the database.
I am thinking of doing the inserts as I go through the XML, to avoid having to push each value onto its own array, since this XML is about 300 lines long. Is there a way of doing this, or a better way altogether?
Can someone help me with that? I have some code at the bottom of this post that I think would help, if it could be made to work with what I have. Any thoughts?
Thank you!

#!/perl/bin/perl
use strict;
use CGI qw/:standard/;
use CGI::Carp qw(fatalsToBrowser);
use DBI;
use XML::XPath;
use XML::XPath::XMLParser;

print header();

# placeholder connection -- adjust the DSN and credentials for your database
my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'password',
                       { RaiseError => 1 });

my ($replication, $part_no, $add_sample_ref, $fname, $carr, $carr_nr,
    $overlay, $billa, $billb, $percent, $rref);
my (@replication, @part_no, @add_sample_ref, @fname, @carr, @carr_nr,
    @overlay, @billa, @billb, @percent, @rref);

my $xp = XML::XPath->new(filename => 'template.xml');

foreach my $row ($xp->findnodes('/CITY/replication')) {
    # extracting from the XML
    $replication = $row->find('//@REFA')->string_value;
    $part_no     = $row->find('Part_No')->string_value;

    # Start AdCar
    my $nodeset = $row->find('AdCar');
    foreach my $node ($nodeset->get_nodelist) {
        $add_sample_ref = $node->find('@Ref')->string_value;
        $fname          = $node->find('FName')->string_value;
        push @add_sample_ref, $add_sample_ref;
        #push @
    }
    # End AdCar

    # Start AbCarr
    my $node_carr = $row->find('AbCarr');
    foreach my $node ($node_carr->get_nodelist) {
        $carr    = $node->find('@RefD')->string_value;
        $carr_nr = $node->find('Lic')->string_value;
        #push @
        #push @

        # Start Warr
        my $node_overlay = $row->find('AbCarr/Warr');
        foreach my $node ($node_overlay->get_nodelist) {
            $overlay = $node->find('@VD')->string_value;
            $billa   = $node->find('Billa1')->string_value;
            $billb   = $node->find('Billb2')->string_value;
            #push @
            #push @
            #push @
        }
        # End Warr

        # Start ADcar
        my $node_percent = $row->find('AbCarr/ADcar');
        foreach my $node ($node_percent->get_nodelist) {
            $percent = $node->find('@REFI')->string_value;
            $rref    = $node->find('Carr')->string_value;
            #push @
            #push @
        }
        # End ADcar
    }
    # End AbCarr

    push @replication, $replication;
}
# End replication

# Get all results from the parsing and input all of it into a DB.
my $table  = 'test';
my @fields = qw( replication part_no add_sample_ref fname carr carr_nr
                 overlay billa billb percent rref );
my %all_my_data_here;   # placeholder: fill this from the parsed values above
my $fields = join(', ', @fields);
my $values = join(', ', map { $dbh->quote($_) } @all_my_data_here{@fields});
my $sql = "INSERT into $table ($fields) values ($values)";
my $sth = $dbh->prepare($sql);
$sth->execute();
$sth->finish();

Re: Inserting Multiple Records in DB.
by doom (Deacon) on Feb 29, 2008 at 22:09 UTC
    I'm not sure I entirely understand your problem... 300 records doesn't actually sound like a lot of data. If that's the scale of what you need to do, it probably won't really matter how you do it; anything that works will work well enough.

    From skimming through your code, I gather you're reading in the data from an XML file, stashing it in arrays, then building up a single INSERT SQL command to put it in the database all in one step. And you're worried about the need to read all of the data into memory? In that case it would be easy enough to do the INSERTs one row at a time. If you're worried about the efficiency of that, here are a few thoughts off the top of my head:

    There are usually database-specific tools available to do bulk import of data into a database. PostgreSQL has a COPY command that imports data from files in a few standard formats, e.g. CSV or tab-delimited. There are problems with handling XML in the general case, unfortunately (nested trees of data don't map neatly to tabular formats), but converting the data from XML to tab-delimited might be a good way to avoid overflowing memory or getting bogged down in single-row inserts.
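
    For instance, a minimal Perl sketch (the file name and field list are invented for illustration) of dumping parsed rows as tab-delimited text that a bulk loader such as COPY can read:

        # assume @parsed_rows holds hashrefs built while walking the XML
        open my $out, '>', 'rows.tsv' or die "can't write rows.tsv: $!";
        for my $rec (@parsed_rows) {
            print {$out} join("\t", @{$rec}{qw(replication part_no fname)}), "\n";
        }
        close $out or die "close failed: $!";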

    The PostgreSQL documentation also has a page with some hints on efficiently populating a database. And for other databases, it's not difficult to craft a web search to look for the tricks; e.g. "bulk import mysql" and "bulk import mysql xml" both look like they turn up useful information.

    In any case, you might look through that page of PostgreSQL hints, because it recommends a number of tricks that are portable:

    • disable autocommit (see the sketch after this list)
    • use "prepare" (which you do)
    • temporarily remove
      • indexes (DROP INDEX)
      • foreign key constraints (ALTER TABLE) -- but: are you sure your data is good enough? Really, really sure?
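
    For example, a minimal DBI sketch of the autocommit trick, batching all the inserts into one transaction (the DSN, table, and column names here are placeholders, not the OP's schema):

        my $dbh = DBI->connect('dbi:Pg:dbname=test', 'user', 'password',
                               { RaiseError => 1, AutoCommit => 0 });   # autocommit off
        my $sth = $dbh->prepare('INSERT INTO test (col_a, col_b) VALUES (?, ?)');
        for my $rec (@records) {        # assume @records holds hashrefs of parsed rows
            $sth->execute($rec->{col_a}, $rec->{col_b});
        }
        $dbh->commit;                   # one commit for the whole batch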

Re: Inserting Multiple Records in DB.
by Narveson (Chaplain) on Feb 29, 2008 at 22:04 UTC
    $sql = "INSERT into $table ($fields) values ($values)";

    An INSERT ... VALUES statement inserts only one record per execution. You're collecting your data from a structured file using nested loops, so I doubt that it can be modeled very well with a single database record.

    For efficiency you can prepare an INSERT statement once, using question marks as placeholders for the individual values, then loop through all the records. On each pass through the loop, bind a new set of values to the placeholders, then call execute() on the statement handle.

    In fact you need multiple INSERT statements, one for each destination table.

    You may wish to prepare all your INSERT statements before you begin reading the data, then call bind and execute at all the places you have marked with # push
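
    A minimal sketch of that pattern for one of the nested loops (the table and column names are illustrative, not taken from the OP's schema):

        # prepared once, before the XML is read
        my $sth_warr = $dbh->prepare(
            'INSERT INTO warr (overlay, billa, billb) VALUES (?, ?, ?)'
        );

        # ... then inside the AbCarr/Warr foreach loop, at a "#push @" marker:
        $sth_warr->execute($overlay, $billa, $billb);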

      An INSERT ... VALUES statement inserts only one record per execution. You're collecting your data from a structured file using nested loops, so I doubt that it can be modeled very well with a single database record.

      This isn't true at all. In standard SQL, you can insert multiple rows of data with a single INSERT statement, and if you look at what the "joins" are doing in the OP's code, you'll see that that's what he's doing.

      Your other advice sounds about right, though. Using bind parameters (question marks) with prepare is a good habit to get into, since it guards against SQL-injection attacks.

        In standard SQL, you can insert multiple rows of data with a single INSERT statement.

        Thank you! This was news to me. I've done my due-diligence Googling now, and I see it's true.

        But when I look at the OP's code, I can't see any syntactic markers for multiple records. The multirow INSERT ... VALUES statement looks like

        INSERT INTO phone_book VALUES ('John Doe', '555-1212'), ('Peter Doe', '555-2323');

        with a pair of parentheses for each record to be inserted.

        Thanks also to pc88mxer below for the diligent analysis of the parsing routine, confirming that there should be multiple target tables.
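
        If you did want to build such a statement from Perl, here is a hedged sketch with placeholders (reusing the phone_book example above):

            my @rows = ( ['John Doe', '555-1212'], ['Peter Doe', '555-2323'] );
            my $placeholders = join ', ', ('(?, ?)') x @rows;   # one "(?, ?)" per row
            my $sth = $dbh->prepare("INSERT INTO phone_book VALUES $placeholders");
            $sth->execute(map { @$_ } @rows);   # flatten the rows into one argument list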

      OK, but having each foreach loop do one row at a time with
      $dbh->do("INSERT INTO test (overlay,billa,billb) VALUES ('$overlay','$billa','$billb')");
      duplicates the records in the database, because of the nesting of the foreach loops.
        Well, you could set up the database with appropriate uniqueness constraints to prevent you from accidentally inserting duplicate rows...

      Any code sample?
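
      A minimal sketch of that suggestion, assuming the single test table from the original code (pick whichever columns should be unique for your data):

          # one-time schema change; ADD CONSTRAINT ... UNIQUE is standard SQL,
          # supported by PostgreSQL and MySQL among others
          $dbh->do('ALTER TABLE test
                    ADD CONSTRAINT test_uniq UNIQUE (replication, part_no, overlay)');

      With the constraint in place, a duplicate insert raises an error instead of silently adding a second copy, so with RaiseError on you can trap it in an eval block.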
Re: Inserting Multiple Records in DB.
by pc88mxer (Vicar) on Mar 01, 2008 at 17:11 UTC
    Here's how I understand your problem. The XML you are interested in seems to have the following structure:
    • There are many (zero or more) /CITY/replication nodes.
    • Each /CITY/replication node has a replication and a partno as well as many AbCar and AbCarr nodes.
    • Each AbCar node has a Ref and an Fname.
    • Each AbCarr node has a RefD, a Lic and many AbCarr/Warr and AbCarr/ADcar nodes.
    • Each AbCarr/Warr node has a VD, a Billa1 and a Billa2.
    • Each AbCarr/ADcar node has a REFI and a Carr.

    You want to store this data into a database by 'flattening' it. The pseudo-code would then go something like this:

    for each CITY/replication node:
        my %v = ();
        $v{replication} = parsed replication value;
        $v{partno} = parsed Part_No value;
        for each AbCar node:
            my %v1 = %v;
            $v1{Ref} = parsed ref value;
            $v1{Fname} = parsed Fname value;
            # save %v1 here (to a db table or to a list)
        end for
        for each AbCarr node:
            my %v1 = %v;
            $v1{RefD} = parsed RefD value;
            $v1{Lic} = parsed Lic value;
            for each AbCarr/Warr node:
                my %v2 = %v1;
                $v2{VD} = parsed VD value;
                $v2{Billa1} = parsed Billa1 value;
                $v2{Billa2} = parsed Billa2 value;
                # save %v2 here
            end for
            for each AbCarr/ADCar node:
                my %v2 = %v1;
                $v2{REFI} = parsed REFI value;
                $v2{Carr} = parsed Carr value;
                # save %v2 here
            end for
        end for
    end for

    You have three kinds of records, so I would expect there to be three different database tables.
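
    A hedged Perl sketch of those "save" steps, with one prepared INSERT per record kind (the three table names and their columns are invented for illustration; the XPath paths follow the original code, and the inner lookups are made relative to the AbCarr node, which also avoids the duplicate rows complained about above):

        my $sth_adcar = $dbh->prepare(
            'INSERT INTO adcar (replication, partno, ref, fname)
             VALUES (?, ?, ?, ?)');
        my $sth_warr = $dbh->prepare(
            'INSERT INTO warr (replication, partno, refd, lic, vd, billa, billb)
             VALUES (?, ?, ?, ?, ?, ?, ?)');
        my $sth_abcarr_adcar = $dbh->prepare(
            'INSERT INTO abcarr_adcar (replication, partno, refd, lic, refi, carr)
             VALUES (?, ?, ?, ?, ?, ?)');

        foreach my $row ($xp->findnodes('/CITY/replication')) {
            # '@REFA' is relative to this row; the original '//@REFA' searches
            # the whole document every time
            my $replication = $row->find('@REFA')->string_value;
            my $part_no     = $row->find('Part_No')->string_value;

            foreach my $node ($row->find('AdCar')->get_nodelist) {
                $sth_adcar->execute($replication, $part_no,
                    $node->find('@Ref')->string_value,
                    $node->find('FName')->string_value);
            }
            foreach my $node ($row->find('AbCarr')->get_nodelist) {
                my $carr    = $node->find('@RefD')->string_value;
                my $carr_nr = $node->find('Lic')->string_value;

                foreach my $w ($node->find('Warr')->get_nodelist) {
                    $sth_warr->execute($replication, $part_no, $carr, $carr_nr,
                        $w->find('@VD')->string_value,
                        $w->find('Billa1')->string_value,
                        $w->find('Billb2')->string_value);
                }
                foreach my $a ($node->find('ADcar')->get_nodelist) {
                    $sth_abcarr_adcar->execute($replication, $part_no, $carr, $carr_nr,
                        $a->find('@REFI')->string_value,
                        $a->find('Carr')->string_value);
                }
            }
        }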