Mjpaddy has asked for the wisdom of the Perl Monks concerning the following question:

Hello Mongers

I'm attempting to dump records into a table from a txt file, with validation and substring extraction. The file has 170,568 records.

Reading the file normally, taking a substring out of each line, and preparing an insert statement to dump the records took me around ~14400 sec, i.e. ~4 hours or more.

Is there any way to optimize this process, e.g. with multithreading or something similar? I have never used it before.

For the time being I have not included my real data; instead I have built a scenario.

The input file looks like this:

00032463000000001000000010000010014110662004071664629003128182 O#
00032463000000002000000010000010014110662004071664729003070909 O#
00032463000000003000000010000010014110662004071664829003861044 O#
00032463000000004000000010000010014110662004071664929003005555 O#

Sample Code

use strict;
use warnings;
use DBI;

my $start = time();
my $data;
my ($version, $jobnumber);
my $file = "C:/temp/dump.txt";
open (FH, "$file") or die "$!";
while (<FH>) {
    $data .= $_;
}
$version   = substr($data, 0, 3);
$jobnumber = substr($data, 4, 5);

my $databaseName = "basedb";
my $connectHost  = "127.0.0.1";
my $DSN = "DBI:mysql:dbname=$databaseName;host=$connectHost";
my $db_username = 'root';
my $db_password = 'root';
my $dbh = DBI->connect($DSN, $db_username, $db_password,
    { RaiseError => 1, PrintWarn => 0, AutoCommit => 0 })
    || die "error connecting: $!";

my $dump_sql = "INSERT INTO temp (version,jobnumber) VALUES(?,?);";
my $dump_sth = $dbh->prepare($dump_sql) or die "$DBI::errstr";
eval {
    $dump_sth->execute($version, $jobnumber) or die "$DBI::errstr";
    $dbh->commit;
};
if ($@) {
    $dbh->rollback();
    print "$@\n";
}
my $end = time();
my $elapsed = $end - $start;
print "Job took $elapsed sec to execute.\n";

I think I have made my question clear. Please give me suggestions.

Replies are listed 'Best First'.
Re: How to optimized mysql insert into table
by huck (Prior) on Jan 17, 2017 at 14:44 UTC

    Are you running "$dbh->commit;" after every single insert (execute)? That's where your problem would be. You need to batch up stuff before the commit. Try inserting 1000 records, then doing the commit.
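
    A minimal sketch of that batching idea (untested; it assumes one record per line and reuses the $dbh and $dump_sth handles from the posted code):

    my $count = 0;
    while (my $line = <FH>) {
        $dump_sth->execute(substr($line, 0, 3), substr($line, 4, 5));
        $dbh->commit unless ++$count % 1000;   # commit once per 1000 rows
    }
    $dbh->commit;   # commit whatever is left in the final partial batch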

Re: How to optimized mysql insert into table
by Corion (Patriarch) on Jan 17, 2017 at 14:41 UTC

    The code you show cannot be used for loading the data from your file, as it has several errors.

    Please show us the actual, reduced code you are using, not something that cannot run without errors.

    For solving your problem, consider looking at the LOAD DATA INFILE command of MySQL.
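
    A sketch of that route (untested; it assumes one record per line and that LOCAL infile loading is enabled on the server and in DBD::mysql, e.g. mysql_local_infile=1 in the DSN):

    # Note: MySQL's SUBSTR() is 1-based, unlike Perl's substr().
    $dbh->do(q{
        LOAD DATA LOCAL INFILE 'C:/temp/dump.txt'
        INTO TABLE temp
        LINES TERMINATED BY '\n'
        (@line)
        SET version   = SUBSTR(@line, 1, 3),
            jobnumber = SUBSTR(@line, 5, 5)
    });
    $dbh->commit;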

    As an alternative, look at the execute_for_fetch method of DBI.
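
    A rough sketch of the execute_for_fetch route (untested; it reuses the prepared $dump_sth and $dbh from the posted code):

    open my $in, '<', 'C:/temp/dump.txt' or die $!;
    my @tuple_status;
    $dump_sth->execute_for_fetch(
        sub {
            defined(my $line = <$in>) or return;   # stop at end of file
            return [ substr($line, 0, 3), substr($line, 4, 5) ];
        },
        \@tuple_status,
    );
    $dbh->commit;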

Re: How to optimized mysql insert into table
by Marshall (Canon) on Jan 18, 2017 at 10:00 UTC
    Your code is not clear at all.

    This looks like complete nonsense:

    while(<FH>){ $data .= $_; }
    There is no need to concatenate all of the lines into a single string. Process each line of the input file separately because each line of the input file will presumably become a new record in the DB. Please review what substr does. From your substr expressions, I don't see anything changing about version or jobnumber.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $data = <<EOF;
    00032463000000001000000010000010014110662004071664629003128182 O#
    00032463000000002000000010000010014110662004071664729003070909 O#
    00032463000000003000000010000010014110662004071664829003861044 O#
    00032463000000004000000010000010014110662004071664929003005555 O#
    EOF

    open (my $IN, '<', \$data) or die "unable to open input file";
    while (my $line = <$IN>)
    {
        next if $line =~ /^\s*$/;  # skip blank lines, common infile fault
                                   # usually at the beginning or at the end
        my $version   = substr($line,0,3);  # substr() from your code...
        my $jobnumber = substr($line,4,5);
        print "version=$version jobnumber=$jobnumber\n";
        # once the above works, DBI insert code goes here...
    }
    __END__
    Code above prints:
    version=000 jobnumber=24630
    version=000 jobnumber=24630
    version=000 jobnumber=24630
    version=000 jobnumber=24630
    As a first step, I would recommend that you write some code that correctly parses your input file and just prints the jobnumber and version for each line. If you cannot extract the version and jobnumber for each line in the file, you are doomed - no amount of DBI help will matter.
Re: How to optimized mysql insert into table
by Mjpaddy (Acolyte) on Jan 17, 2017 at 15:01 UTC

    I edited my code; I forgot to check it. This scenario is just one part of my script.

    I want suggestions on how to apply multithreading in this case to reduce the execution time; that's the main goal for me.

    I really appreciate any help.

      Where in your program do you loop over your input file?

      You are talking about 100,000 inserts but your program is only ever doing one insert.

      Maybe you can show us a real program that does more than one insert.

      My suggestion for doing more than one insert is to use a loop and to prepare the statement handles outside of the loop and to do the ->commit calls every 1000 inserts or something like that.
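
      A skeleton of that structure (untested, reusing the names from your posted code) might look like this:

      open my $fh, '<', $file or die "$file: $!";
      my $dump_sth = $dbh->prepare($dump_sql);   # prepare once, outside the loop
      my $count = 0;
      while (my $line = <$fh>) {
          next if $line =~ /^\s*$/;              # skip blank lines
          $dump_sth->execute(substr($line, 0, 3), substr($line, 4, 5));
          $dbh->commit unless ++$count % 1000;   # commit every 1000 inserts
      }
      $dbh->commit;                              # commit the final partial batch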

      Also, consider using the native data loading functionality of MySQL instead of hand-feeding the data to your database.

        OK I put that part back in:
        use strict;
        use warnings;
        use DBI;

        my $start = time();
        my $data;
        my ($version,$jobnumber);
        my $file = "C:/temp/dump.txt";
        open (FH, "$file") or die "$!";
        while(<FH>){ $data .= $_; }
        $version   = substr($data,0,3);
        $jobnumber = substr($data,4,5);

        my $databaseName = "basedb";
        my $connectHost  = "127.0.0.1";
        my $DSN = "DBI:mysql:dbname=$databaseName;host=$connectHost";
        my $db_username = 'root';
        my $db_password = 'root';
        my $dbh = DBI->connect($DSN,$db_username,$db_password,
            {RaiseError=>1,PrintWarn=>0,AutoCommit=>0})
            || die "error connecting: $!";

        my $dump_sql = "INSERT INTO temp (version,jobnumber) VALUES(?,?);";
        my $dump_sth = $dbh->prepare($dump_sql) or die "$DBI::errstr";
        eval{
            $dump_sth->execute($version,$jobnumber) or die "$DBI::errstr";
            $dump_sth->execute($version,$jobnumber) or die "$DBI::errstr";
            $dump_sth->execute($version,$jobnumber) or die "$DBI::errstr";
            $dump_sth->execute($version,$jobnumber) or die "$DBI::errstr";
            $dump_sth->execute($version,$jobnumber) or die "$DBI::errstr";
            $dump_sth->execute($version,$jobnumber) or die "$DBI::errstr";
            $dump_sth->execute($version,$jobnumber) or die "$DBI::errstr";
            $dump_sth->execute($version,$jobnumber) or die "$DBI::errstr";
            $dump_sth->execute($version,$jobnumber) or die "$DBI::errstr";
            $dump_sth->execute($version,$jobnumber) or die "$DBI::errstr";
            ...
        }
      Please pay attention to what huck and Corion have said. I gave similar advice at Re^4: Perl DBI to MySQL LOAD_FILE. Inserting even all 170,568 records as a single transaction is no big deal - that is just fine. A commit operation is expensive, very expensive. When done properly, simple single-threaded code will take maybe a minute vs. 4 hours.

      Update: Performance Benchmark

      To demonstrate my point that running a single transaction for all of the inserts will make a HUGE difference, I offer the following benchmark code that makes a test DB table with 170,568 records.

      I used SQLite so that this can be run by anybody. If anything MySQL will be faster. I am using an ancient WinXP laptop.

      Results:
      Doing a commit after every insert = 4.9 hours!
      Doing a single commit after all 170,568 inserts = 3 seconds!!!
      That is what I call a HUGE difference!

      Since my laptop takes about 5 hours, the OP's reported time of 4 hours sounds plausible. My guess is that a table creation like I show below will reduce the OP's execution time from 4 hours to 1-2 seconds. Nothing really fancy required, just some solid code.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;
      use DBI qw(:sql_types);

      my $dbfile = "./TestingInsert.sqlite";
      if (-e $dbfile) {unlink $dbfile or die "Delete of $dbfile failed! $!\n";}

      my $start_seconds = time();
      my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","",{RaiseError => 1})
          or die "Couldn't connect to database: " . DBI->errstr;

      $dbh->do ("CREATE TABLE test (
                   id integer PRIMARY KEY AUTOINCREMENT,
                   version integer,
                   jobnumber integer
                 );
               ");

      $dbh->do("BEGIN");   # running one transaction is needed for speed!!!
      import_data();
      $dbh->do("COMMIT");

      my $end_time = time();
      print "Total elapsed time: ", $end_time-$start_seconds, "\n";

      sub import_data
      {
          my $jobnumber = 1;
          my $add = $dbh->prepare("INSERT INTO test ( version, jobnumber )
                                   VALUES(?,?)");
          while ($jobnumber < 170568)
          {
              $add->execute($jobnumber, $jobnumber++);  # DBI insert
          }
          return;
      }
      __END__
      Using a commit for each insert:
        Total elapsed time: 17833 => 297 min => 4.9 hours
      Using a single transaction for 170,568 inserts:
        Total elapsed time: 3 seconds

      Do you want the load to be all or nothing, i.e. if one record fails then roll back all the previous inserts?

      poj
        Our requirement is only that the data be there; we are not concerned with what the data is, only that it is present.