JulioRD has asked for the wisdom of the Perl Monks concerning the following question:
Hello and thanks in advance for any help. I am a perl amateur and created a script that needs optimization. The script uses a control file containing the accounts that will be processed. Then will go through different .csv files and if it finds one or many matches it will write it to the corresponding worksheets.
The first couple of files created take 1 - 2 minutes then performance degrades and files 7 and 8 are taking 5 minutes and so on. The sample data is about 40K rows per csv and control file. Can you please let me know what optimization could be done to speed up the process or make it more consistent?
Business needWe're trying to migrate data from legacy core banking application to a Cloud Based one. The cloud solution has an interface that uses excel files as a way to import the data. The excel files need to be a certain size or less. We have tested and about 500 accounts in the excel file is a good size to import into the Cloud solution.
use warnings; use strict; use diagnostics; use Excel::Writer::XLSX; use Text::CSV; # Control File my $AccountControlFile = "bamboo_AccountControlFile.txt"; my @limit_accounts_per_file = 500; my @printoutput = 0; # Loan transaction file #my $LoanTransactionsFileName = "bamboo.loan_transactions.csv"; my $LoanTransactionsFileName = "bamboo.loan_transactions.csv"; my $LoanAccountFileName = "bamboo.loan_accounts.csv"; my $LoanAccountPaymentCalendarFileName = "bamboo.repayment_calendar.cs +v"; # csv class initialization my $TransaccionesPrestamoCSV = Text::CSV->new({ sep_char => ',' }); my $CuentasPrestamoCSV = Text::CSV->new({ sep_char => ',' }); my $CalendarioDePagosCSV = Text::CSV->new({ sep_char => ',' }); if ( $printoutput[0] == 1 ) { print "before file handle \n"; } # Use different files # bamboo control file open (my $bambooControlFileHandle, '<:encoding(UTF-8)', $AccountContro +lFile) or die "Could not open file '$AccountControlFile' $!"; # loan transaction open (my $LoanTransactionFileHandle, '<:encoding(UTF-8)', $LoanTransac +tionsFileName) or die "Could not open file '$LoanTransactionsFileName +' $!"; # loan account open (my $LoanAccountFileHandle, '<:encoding(UTF-8)', $LoanAccountFile +Name) or die "Could not open file '$LoanAccountFileName' $!"; # payment calendar open (my $LoanPaymentCalendarFileHandle, '<:encoding(UTF-8)', $LoanAcc +ountPaymentCalendarFileName) or die "Could not open file '$LoanAccoun +tPaymentCalendarFileName' $!"; if ( $printoutput[0] == 1 ) { print "after file handle \n"; } # counters everywhere my @counter_tran_prestamo = 0; my @counter_cuentas_prestamo = 0; my @counter_calendario_pago = 0; my @prev_account = '0'; my @counter_accounts_per_file = 0; my @counter_files = 0; my @found_calendario_pago = 0; my @found_trans = 0; my @file_seek_tran_prestamo; my @file_seek_cuentas_prestamo; my @file_seek_calendario_pago; my $CuentasPrestamoWS; my $TransaccionesPrestamoWS; my $CalendarioPagosWS; my $workbook; while ( my $AccountInProcess = <$bambooControlFileHandle> ) { #if ( $printoutput[0] == 1 ) { print "Account" . $AccountInProcess; #} # Create another excel file if not writing a new file or the max n +umber of accounts has been reached if ( $prev_account[0] eq "0" or $counter_accounts_per_file[0] >= $ +limit_accounts_per_file[0]) { # Another excel file $counter_files[0] += 1; # new excel file name my @filename = 'ExcelDoc' . $counter_files[0] . '.xlsx' ; $workbook = Excel::Writer::XLSX->new( @filename ); # creating the worksheets $CuentasPrestamoWS = $workbook->add_worksheet( 'Cuentas de P +restamo' ); $TransaccionesPrestamoWS = $workbook->add_worksheet( 'Transac +ciones de Prestamo' ); $CalendarioPagosWS = $workbook->add_worksheet( 'Calendario d +e Pagos' ); $counter_accounts_per_file[0] = 0; # first account was processed $prev_account[0] = "1"; # Reset counters $counter_cuentas_prestamo[0] = 0; $counter_tran_prestamo[0] = 0; $counter_calendario_pago[0] = 0; if ( $printoutput[0] == 1 ) { print "File:" . @filename; print "\n"; } } # clear any white space chomp $AccountInProcess; # go through each loan while ( my $row = $CuentasPrestamoCSV->getline( $LoanAccountFileHa +ndle )) { # Found a match for the loan if ( $row->[0] eq $AccountInProcess ) { # write from from csv file to excel $CuentasPrestamoWS->write_row( $counter_cuentas_prestamo[0 +], 0, $row ); # keep track of line to write in excel $counter_cuentas_prestamo[0] += 1; # Found one account to process $counter_accounts_per_file[0] += 1; last; } } # Go throuh all the payment calendar rows while ( my $row = $CalendarioDePagosCSV->getline( $LoanPaymentCale +ndarFileHandle )) { # found a match with the account in process if ( $row->[0] eq $AccountInProcess ) { # write into the excel shee the row $CalendarioPagosWS->write_row( $counter_calendario_pago[0] +, 0, $row); # keep track of the row in the excel sheet $counter_calendario_pago[0] += 1; if ( $printoutput[0] == 1 ) { @found_calendario_pago = 1; print "did find!!!!!!!!"; } # keep track that we found one line $found_calendario_pago[0] = 1; # keep track of line location. Go back a line if we switch + from found to not found. $file_seek_calendario_pago[0] = tell $LoanPaymentCalendarF +ileHandle; } # did not find an account match with the loan payment calendar +. But were we in a block of "found accounts"? elsif ( $found_calendario_pago[0] == 1){ if ( $printoutput[0] == 1 ) { print @found_calendario_pago . "\n"; print "did not find??? \n"; } # go back to the previous line seek $LoanPaymentCalendarFileHandle, -$file_seek_calendari +o_pago[0], 1; # flag off to calendario de pago found $found_calendario_pago[0] = 0; # get out of loop and process the next one last; } } # Go through all the rows for the loan transaction file while (my $row = $TransaccionesPrestamoCSV->getline( $LoanTransact +ionFileHandle ) ) { # if the row of the csv file is the same as the loan account f +rom the control file if ( $row->[0] eq $AccountInProcess ){ #and $row->[0] ne unde +f and $row->[0] ne "\n") { # write row in worksheet $TransaccionesPrestamoWS->write_row( $counter_tran_prestam +o[0], 0, $row ); # move to next row on excel sheet for tran prestamos $counter_tran_prestamo[0] += 1; # found the transaction, next one not found stop the file +handle on that line $found_trans[0] = 1; # File is ordered by account number. Whole block of loan t +ransactions. # need position of filehandle to go back a line if the nex +t line is a different account. $file_seek_tran_prestamo[0] = tell $LoanTransactionFileHan +dle; } # account not found to go back a line elsif ( $found_trans[0] == 1){ # reset flag to not found for next account in control file $found_trans[0]= 0; # go back a line, leave filehandle in same position seek $LoanTransactionFileHandle, -$file_seek_tran_prestamo +[0], 1; # get out of loop last; } # end while loan payments } # End while for excel files } close $LoanTransactionFileHandle; close $LoanAccountFileHandle; print "final";
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Optimization of script
by ww (Archbishop) on Aug 18, 2016 at 15:40 UTC | |
by JulioRD (Initiate) on Aug 18, 2016 at 19:56 UTC | |
|
Re: Optimization of script
by dcmertens (Scribe) on Aug 18, 2016 at 16:00 UTC | |
by dcmertens (Scribe) on Aug 18, 2016 at 18:03 UTC | |
|
Re: Optimization of script
by Cristoforo (Curate) on Aug 18, 2016 at 20:48 UTC | |
|
Re: Optimization of script
by GotToBTru (Prior) on Aug 18, 2016 at 21:37 UTC | |
|
Re: Optimization of script
by Marshall (Canon) on Aug 19, 2016 at 08:49 UTC |