Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I am parsing a big XML document that has to go into a database; I have the parsing portion of it working fine, but my question now is about INSERTing these values into the database.
I am thinking of doing the inserts as I go through the XML, to avoid having to push each value onto its own array, since this XML is about 300 lines long. Is there a way of doing this, or a better way altogether?
Can someone help me with that? I have some code at the bottom of this post that I think would help, if it could be made to work with what I have. Any thoughts?
Thank you!

#!/perl/bin/perl
use strict;
use CGI qw/:standard/;
use CGI::Carp qw(fatalsToBrowser);
use DBI;
use XML::XPath;
use XML::XPath::XMLParser;

print header();

# placeholder connection -- adjust the DSN and credentials for your database
my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'password',
                       { RaiseError => 1 });

my ($replication, $part_no, $add_sample_ref, $fname, $carr, $carr_nr,
    $overlay, $billa, $billb, $percent, $rref);
my (@replication, @part_no, @add_sample_ref, @fname, @carr, @carr_nr,
    @overlay, @billa, @billb, @percent, @rref);

my $xp = XML::XPath->new(filename => 'template.xml');

foreach my $row ($xp->findnodes('/CITY/replication')) {
    # extracting from the XML
    $replication = $row->find('//@REFA')->string_value;
    $part_no     = $row->find('Part_No')->string_value;

    # Start AdCar
    my $nodeset = $row->find('AdCar');
    foreach my $node ($nodeset->get_nodelist) {
        $add_sample_ref = $node->find('@Ref')->string_value;
        $fname          = $node->find('FName')->string_value;
        push @add_sample_ref, $add_sample_ref;
        #push @
    }
    # End AdCar

    # Start AbCarr
    my $node_carr = $row->find('AbCarr');
    foreach my $node ($node_carr->get_nodelist) {
        $carr    = $node->find('@RefD')->string_value;
        $carr_nr = $node->find('Lic')->string_value;
        #push @
        #push @

        # Start Warr
        my $node_overlay = $row->find('AbCarr/Warr');
        foreach my $node ($node_overlay->get_nodelist) {
            $overlay = $node->find('@VD')->string_value;
            $billa   = $node->find('Billa1')->string_value;
            $billb   = $node->find('Billb2')->string_value;
            #push @
            #push @
            #push @
        }
        # End Warr

        # Start ADcar
        my $node_percent = $row->find('AbCarr/ADcar');
        foreach my $node ($node_percent->get_nodelist) {
            $percent = $node->find('@REFI')->string_value;
            $rref    = $node->find('Carr')->string_value;
            #push @
            #push @
        }
        # End ADcar
    }
    # End AbCarr

    push @replication, $replication;
}
# End replication

# Get all results from the parsing and input all of it into a DB.
my $table  = 'test';
my @fields = qw( replication part_no add_sample_ref fname carr carr_nr
                 overlay billa billb percent rref );
my %all_my_data_here;   # placeholder: fill this from the parsed values above
my $fields = join(', ', @fields);
my $values = join(', ', map { $dbh->quote($_) } @all_my_data_here{@fields});
my $sql = "INSERT into $table ($fields) values ($values)";
my $sth = $dbh->prepare($sql);
$sth->execute();
$sth->finish();

Re: Inserting Multiple Records in DB.
by doom (Deacon) on Feb 29, 2008 at 22:09 UTC
    I'm not sure I entirely understand your problem... 300 records doesn't actually sound like a lot of data. If that's the scale of what you need to do, it probably won't really matter how you do it; anything that works will work well enough.

    From skimming through your code, I gather you're reading in the data from an XML file, stashing it in arrays, then building up a single INSERT SQL command to put it in the database all in one step. And you're worried about the need to read all of the data into memory? In that case it would be easy enough to do the INSERTs one row at a time. If you're worried about the efficiency of that, here are a few thoughts off the top of my head:

    There are usually database-specific tools available to do bulk import of data into a database. PostgreSQL has a COPY command that imports data from files in a few standard formats, e.g. CSV or tab-delimited. There are problems with handling XML in the general case, unfortunately (nested trees of data don't map neatly to tabular formats), but converting the data from XML to tab-delimited might be a good way to avoid overflowing memory or getting bogged down in single-row inserts.
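
    For instance, a minimal Perl sketch (the file name and field list are invented for illustration) of dumping parsed rows as tab-delimited text that a bulk loader such as COPY can read:

        # assume @parsed_rows holds hashrefs built while walking the XML
        open my $out, '>', 'rows.tsv' or die "can't write rows.tsv: $!";
        for my $rec (@parsed_rows) {
            print {$out} join("\t", @{$rec}{qw(replication part_no fname)}), "\n";
        }
        close $out or die "close failed: $!";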

    The PostgreSQL documentation also has a page with some hints on efficiently populating a database. And for other databases, it's not difficult to craft a web search to look for the tricks; e.g. "bulk import mysql" and "bulk import mysql xml" both look like they turn up useful information.

    In any case, you might look through that page of PostgreSQL hints, because it recommends a number of tricks that are portable:

    • disable autocommit (see the sketch after this list)
    • use "prepare" (which you do)
    • temporarily remove
      • indexes (DROP INDEX)
      • foreign key constraints (ALTER TABLE) -- but: are you sure your data is good enough? Really, really sure?
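
    For example, a minimal DBI sketch of the autocommit trick, batching all the inserts into one transaction (the DSN, table, and column names here are placeholders, not the OP's schema):

        my $dbh = DBI->connect('dbi:Pg:dbname=test', 'user', 'password',
                               { RaiseError => 1, AutoCommit => 0 });   # autocommit off
        my $sth = $dbh->prepare('INSERT INTO test (col_a, col_b) VALUES (?, ?)');
        for my $rec (@records) {        # assume @records holds hashrefs of parsed rows
            $sth->execute($rec->{col_a}, $rec->{col_b});
        }
        $dbh->commit;                   # one commit for the whole batch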

Re: Inserting Multiple Records in DB.
by Narveson (Chaplain) on Feb 29, 2008 at 22:04 UTC
    $sql = "INSERT into $table ($fields) values ($values)";

    An INSERT ... VALUES statement inserts only one record per execution. You're collecting your data from a structured file using nested loops, so I doubt that it can be modeled very well with a single database record.

    For efficiency you can prepare an INSERT statement once, using question marks as placeholders for the individual values, then loop through all the records. On each pass through the loop, bind a new set of values to the placeholders, then call execute() on the statement handle.

    In fact you need multiple INSERT statements, one for each destination table.

    You may wish to prepare all your INSERT statements before you begin reading the data, then call bind and execute at all the places you have marked with # push
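
    A minimal sketch of that pattern for one of the nested loops (the table and column names are illustrative, not taken from the OP's schema):

        # prepared once, before the XML is read
        my $sth_warr = $dbh->prepare(
            'INSERT INTO warr (overlay, billa, billb) VALUES (?, ?, ?)'
        );

        # ... then inside the AbCarr/Warr foreach loop, at a "#push @" marker:
        $sth_warr->execute($overlay, $billa, $billb);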

      An INSERT ... VALUES statement inserts only one record per execution. You're collecting your data from a structured file using nested loops, so I doubt that it can be modeled very well with a single database record.

      This isn't true at all. In standard SQL, you can insert multiple rows of data with a single INSERT statement, and if you look at what the "joins" are doing in the OP's code, you'll see that that's what he's doing.

      Your other advice sounds about right, though. Using bind parameters (question marks) with prepare is a good habit to get into, since it guards against SQL-injection attacks.

        In standard SQL, you can insert multiple rows of data with a single INSERT statement.

        Thank you! This was news to me. I've done my due-diligence Googling now, and I see it's true.

        But when I look at the OP's code, I can't see any syntactic markers for multiple records. The multirow INSERT ... VALUES statement looks like

        INSERT INTO phone_book VALUES ('John Doe', '555-1212'), ('Peter Doe', '555-2323');

        with a pair of parentheses for each record to be inserted.

        Thanks also to pc88mxer below for the diligent analysis of the parsing routine, confirming that there should be multiple target tables.
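
        If you did want to build such a statement from Perl, here is a hedged sketch with placeholders (reusing the phone_book example above):

            my @rows = ( ['John Doe', '555-1212'], ['Peter Doe', '555-2323'] );
            my $placeholders = join ', ', ('(?, ?)') x @rows;   # one "(?, ?)" per row
            my $sth = $dbh->prepare("INSERT INTO phone_book VALUES $placeholders");
            $sth->execute(map { @$_ } @rows);   # flatten the rows into one argument list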

      OK, but having each foreach loop do one row at a time with
      $dbh->do("INSERT INTO test (overlay,billa,billb) VALUES ('$overlay','$billa','$billb')");
      duplicates the records in the database, because of the nesting of the foreach loops.
        Well, you could set up the database with appropriate uniqueness constraints to prevent you from accidentally inserting duplicate rows...

      Any code sample?
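
      A minimal sketch of that suggestion, assuming the single test table from the original code (pick whichever columns should be unique for your data):

          # one-time schema change; ADD CONSTRAINT ... UNIQUE is standard SQL,
          # supported by PostgreSQL and MySQL among others
          $dbh->do('ALTER TABLE test
                    ADD CONSTRAINT test_uniq UNIQUE (replication, part_no, overlay)');

      With the constraint in place, a duplicate insert raises an error instead of silently adding a second copy, so with RaiseError on you can trap it in an eval block.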
Re: Inserting Multiple Records in DB.
by pc88mxer (Vicar) on Mar 01, 2008 at 17:11 UTC
    Here's how I understand your problem. The XML you are interested in seems to have the following structure:
    • There are many (zero or more) /CITY/replication nodes.
    • Each /CITY/replication node has a replication and a partno as well as many AbCar and AbCarr nodes.
    • Each AbCar node has a Ref and an Fname.
    • Each AbCarr node has a RefD, a Lic and many AbCarr/Warr and AbCarr/ADcar nodes.
    • Each AbCarr/Warr node has a VD, a Billa1 and a Billa2.
    • Each AbCarr/ADcar node has a REFI and a Carr.

    You want to store this data into a database by 'flattening' it. The pseudo-code would then go something like this:

    for each CITY/replication node:
        my %v = ();
        $v{replication} = parsed replication value;
        $v{partno} = parsed Part_No value;
        for each AbCar node:
            my %v1 = %v;
            $v1{Ref} = parsed ref value;
            $v1{Fname} = parsed Fname value;
            # save %v1 here (to a db table or to a list)
        end for
        for each AbCarr node:
            my %v1 = %v;
            $v1{RefD} = parsed RefD value;
            $v1{Lic} = parsed Lic value;
            for each AbCarr/Warr node:
                my %v2 = %v1;
                $v2{VD} = parsed VD value;
                $v2{Billa1} = parsed Billa1 value;
                $v2{Billa2} = parsed Billa2 value;
                # save %v2 here
            end for
            for each AbCarr/ADCar node:
                my %v2 = %v1;
                $v2{REFI} = parsed REFI value;
                $v2{Carr} = parsed Carr value;
                # save %v2 here
            end for
        end for
    end for

    You have three kinds of records, so I would expect there to be three different database tables.
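
    A hedged Perl sketch of those "save" steps, with one prepared INSERT per record kind (the three table names and their columns are invented for illustration; the XPath paths follow the original code, and the inner lookups are made relative to the AbCarr node, which also avoids the duplicate rows complained about above):

        my $sth_adcar = $dbh->prepare(
            'INSERT INTO adcar (replication, partno, ref, fname)
             VALUES (?, ?, ?, ?)');
        my $sth_warr = $dbh->prepare(
            'INSERT INTO warr (replication, partno, refd, lic, vd, billa, billb)
             VALUES (?, ?, ?, ?, ?, ?, ?)');
        my $sth_abcarr_adcar = $dbh->prepare(
            'INSERT INTO abcarr_adcar (replication, partno, refd, lic, refi, carr)
             VALUES (?, ?, ?, ?, ?, ?)');

        foreach my $row ($xp->findnodes('/CITY/replication')) {
            # '@REFA' is relative to this row; the original '//@REFA' searches
            # the whole document every time
            my $replication = $row->find('@REFA')->string_value;
            my $part_no     = $row->find('Part_No')->string_value;

            foreach my $node ($row->find('AdCar')->get_nodelist) {
                $sth_adcar->execute($replication, $part_no,
                    $node->find('@Ref')->string_value,
                    $node->find('FName')->string_value);
            }
            foreach my $node ($row->find('AbCarr')->get_nodelist) {
                my $carr    = $node->find('@RefD')->string_value;
                my $carr_nr = $node->find('Lic')->string_value;

                foreach my $w ($node->find('Warr')->get_nodelist) {
                    $sth_warr->execute($replication, $part_no, $carr, $carr_nr,
                        $w->find('@VD')->string_value,
                        $w->find('Billa1')->string_value,
                        $w->find('Billb2')->string_value);
                }
                foreach my $a ($node->find('ADcar')->get_nodelist) {
                    $sth_abcarr_adcar->execute($replication, $part_no, $carr, $carr_nr,
                        $a->find('@REFI')->string_value,
                        $a->find('Carr')->string_value);
                }
            }
        }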