ericlee has asked for the wisdom of the Perl Monks concerning the following question:

I've used Perl on and off for some 9 years now, but have not had the chance to use it together with MySQL until now. I have a script which reads through a flat ASCII database and adds records using INSERT to a MySQL database. The problem is, it misses some of the records (maybe 20% of them). The same records are skipped each time I try to do this. And the records being skipped are not duplicates; they have a unique key which isn't in the MySQL database.

Here are the relevant lines of code in my Perl script:

$db->do("INSERT INTO news (url, header, country, country2, date, priority, userid, source, formoreinfo, language, regreqd, keywords, image, actnowcampaigncode, state) VALUES ('$url', '$header', '$country1', '$country2', $date, $priority, '$correspondent', '$source', '$formoreinfo', '$language', '$regreqd', '$keywords', '$image', '$actnowcampaigncode', '$state')");

The unique key in table 'news' is the first field, url.

I cannot figure out why certain records are being skipped. And looking over and over at the records being skipped, I cannot see how they differ from other records which are being added.

For example, here are three records in the flat ASCII database -- the middle one is skipped, the other two are successfully added:

http://montreal.cbc.ca/regional/servlet/View?filename=qc_hema20030513| +Quebec blood workers threaten to strike|Canada||15-May-2003|2|dblacka +dder|CBC||English|No||| http://www.izf.net/izf/AFP/francais/topics/togo/030513165556.mvvte7lp. +html|Libération d'un journaliste après quatre mois de détention|Togo| +|15-May-2003|2|jacquot|izf||French|No||| http://pei.cbc.ca/regional/servlet/View?filename=pe_uniontalks20030414 +|PEI union and Polar Foods end contract talks|Canada||15-May-2003|2|d +blackadder|CBC||English|No|||

Any ideas? Thanks.

Replies are listed 'Best First'.
Re: Perl DBI adding some, not all, records to MySQL database
by jettero (Monsignor) on Jul 06, 2007 at 10:39 UTC
    jeanluca's tip will probably help, but you might also consider using the or die and bind features as well.

    Some of your inserts may be failing because of meta characters in your field values. Try something more like this:

    my $sth = $dbo->prepare("insert into table(field,field,field) values(? +,?,?)"); my $oth = $dbo->prepare("insert into table set field=?, field=?, field +=?"); # this second form is unique to mysql, but more readable... while( $something ) { $sth->execute( $value, $value, $value ) or die $sth->errstr; }

    -Paul

Re: Perl DBI adding some, not all, records to MySQL database
by jeanluca (Deacon) on Jul 06, 2007 at 10:29 UTC
    Hello

    Turn PrintError on, that might show what goes wrong
    $db = DBI->connect(...... { PrintError => 1} ) ;
    Furthermore you could try to insert that record manually, it will probably tell you what goes wrong!

    LuCa
Re: Perl DBI adding some, not all, records to MySQL database
by naikonta (Curate) on Jul 06, 2007 at 10:45 UTC
    Hi ericlee,

    It seems to me that you're doing the insertion within a loop to extract data from the flat file source and convert them into database via do method. If this is the case, the code you show might incomplete because we need to know how the loop goes. As for the statement itself, there should be nothing that can cause the insert to skip certain records.

    Apart from that, I don't know what modules you're using in your code, I just want to let you know that the common technique is to use DBI module, and use the placeholder facility.

    use strict; use warnings; use DBI; # some info my $driver = 'mysql'; # this requires DBD::mysql my $db_name = 'mydb'; my $db_user = 'user'; my $db_pass = 'pass'; my $dsn = "DBI:$driver:$db_name"; # get the source file from command line my $source_file = shift or die "Please supply source filename\n"; # connect to the db, we rely on the default "on" # of AutoCommit and PrintError my $dbh = DBI->connect($dsn, $db_user, $db_pass, {ShowErrorStatement = +> 1}); die $DBI::errstr unless $dbh; # prepare the statement, the number of question marks # must match the number of column to be filled in my $sql = 'INSERT INTO news (url, header, ...., state) values (?, ?, . +.., ?)'; my $sth = $dbh->prepare($sql); # enter the loop after opening the source file open my $fh, '<', $source_file or die "Can't open $source_ile: $!\n"; my $i = 0; while (<$fh>) { chomp; my @column_data = split /\|/; $sth->execute(@column_data) or die "Failed to insert data at line $. (data: [@column_data]): ", $ +sth->errstr; $i++; } close $fh;

    By using placeholders (the question mark, "?"), we don't have to worry about quoting the values. We simply call the execute method on the prepared statement handler with a bunch of data for the columns. We check for the result each time we execute the statement, and die with error message of the last execution. As we set ShowErrorStatement attribute to true value, the actual SQL statement will be shown as well.

    See the DBI manual for more information, also pay a visit to the Tutorials section on Database Programming.


    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

Re: Perl DBI adding some, not all, records to MySQL database
by Moron (Curate) on Jul 06, 2007 at 10:51 UTC
    Nevertheless, if the ASCII data MAY contain duplicates, you have to do something about it. Ny general approach in designing a script for this type of requirement is rather different:

    1) keeps sorted backup of the flat input files used per run

    2) use the unix comm utility (twice per file) to create a forward and backward difference extracts based on the latest ASCII file and the previous one

    3) compare with the previous run and execute batched deletion SQL from the backward differences

    4) and then for the insert case, take advantage of MySql's bulk loader, using Perl to translate the flat format only for the "insert" case.

    This means that updates are translated into a delete plus an insert per case, but overall in this context that performs better anyway. Unchanged records in the flat file are filtered out by comm. This package of measures overall performs many times faster than the more "obvious" per line design and was proved to be the most reliable.

    Update: this is assuming there are no manual changes made in the meantime, in which case you have to back-generate an updated version of the "previous" (now simulated) ASCII file from your own data rather than use the one from the previous run.

    __________________________________________________________________________________

    ^M Free your mind!

Re: Perl DBI adding some, not all, records to MySQL database
by snopal (Pilgrim) on Jul 06, 2007 at 13:58 UTC

    The method you are using to split the fields may be at issue.

    My experience with Text::CSV_XS is that it is necessary to use binary mode or some records will be null when characters in the line are out of ascii range.

    It may also be possible that your EOL character is getting in the way. Properly chomping the lines as you parse them can help alot.

    These suggestions are merely conjecture since we don't have this part of the code to evaluate.

Re: Perl DBI adding some, not all, records to MySQL database
by webfiend (Vicar) on Jul 06, 2007 at 19:20 UTC
    jettero's solution will fix you right up. You are doing a straight string substitution in your SQL, which is bad. Using prepare() and binding your column values will magically take care of things like escaping that pesky single quote in the record being skipped.

      It took a while, but this all eventually worked. The problem was almost certainly the single apostrophe, which I hadn't noticed. The code I'm using now reads:

      my $sth = $db->prepare("insert into news(url, header, country, country +2, date, priority, userid, source, formoreinfo, language, regreqd, ke +ywords, image, actnowcampaigncode, state) values(?,?,?,?,?,?,?,?,?,?, +?,?,?,?,?)"); foreach $record (@records) { $n++; &splitupintofields; if ($priority > 0) { $y++; &convdate; &convlang; $sth->execute($url, $header, $country1, $country2, $date, $pri +ority, $correspondent, $source, $formoreinfo, $language, $regreqd, $k +eywords, $image, $actnowcampaigncode, $state); } } $db->disconnect();