thanos1983 has asked for the wisdom of the Perl Monks concerning the following question:
Hello everyone,
I build a script that is reading and writing XLSX files customized on my requirements. The script seems to be working fine, but I would appreciate if someone could take a look and suggest any minor/major modifications in order to make it more efficient.
Most people will be thinking make more efficient a script that takes milliseconds to execute does it makes sense? I am trying to optimize the script not only in terms of speed but also regarding resources management. I am reading a file (primary file) that contains almost 4500 lines, secondary file contains approximately 2500 lines. I am comparing them and then the output is printed on a third file.
I am trying to optimize the script because both files are growing day by day and they are not going to get any smaller. I have never worked with so big files so maybe the size in reality is really small and the optimization is not necessary.
Enough said let's go the coding staff. Sample of code below:
#!/usr/bin/perl use utf8; use strict; use warnings; use Data::Dumper; use File::MMagic; use Number::Latin; use Excel::Writer::XLSX; use List::Util qw( first ); use Spreadsheet::ParseXLSX; use Spreadsheet::ParseExcel; die "Usage: $0 Primary (excel sheet) Secondary (excel sheet) Name of n +ew (excel sheet)\n" if (scalar @ARGV != 3); sub _checkFilesExtensions { my ( $fileInput ) = @_; foreach my $file (@$fileInput) { # Match a dot, followed by any number of non-dots until the # end of the line. my ($input, $ext) = split('\.', $file); die "Please use striclty excel files format .xls or .xlsx '$file'\ +n" unless defined $ext; die "Please use striclty excel files format .xls or .xlsx '$file'\ +n" unless ( $ext eq 'xls' or $ext eq 'xlsx' ); } return; } sub _getObjectSpreadSheet { my ( $FileToOpen ) = @_; my $ft = File::MMagic->new(); my $type = $ft->checktype_filename( $FileToOpen ); my $parser; if ($type =~ /zip/){ $parser = Spreadsheet::ParseXLSX->new(); } else { $parser = Spreadsheet::ParseExcel->new(); } my $workbook = $parser->parse( $FileToOpen ); if ( !defined $workbook ) { die $parser->error(), ".\n"; } return $workbook->worksheet( 0 ); } sub _getDataFromSpreadSheet { my ( $worksheet , $RouterIDColumn , $WanInterfaceColumn ) = @_; my ( $row_min, $row_max ) = $worksheet->row_range(); my $name = $worksheet->get_name(); my %HoH; my %hash; foreach my $row ( $row_min .. $row_max ) { my $cellWanInterface = $worksheet->get_cell( $row, $WanInterfaceCo +lumn ); next unless $cellWanInterface; if ($cellWanInterface->value()) { my $cellRouteID = $worksheet->get_cell( $row, $RouterIDColumn +); if ($WanInterfaceColumn) { $hash{$cellRouteID->value()} = $cellWanInterface->value(); $row = $row+1; # Plus one because row starts from 0 and counti +ng from 1 $HoH{uc int2latin($RouterIDColumn +1).$row} = { 'RouterID' => $cellRouteID->value(), 'Wan Interdace' => $cellWanInterface->value(), }; } else { $hash{$cellWanInterface->value()} = undef; $row = $row+1; # Plus one because row starts from 0 and counti +ng from 1 $HoH{uc int2latin($WanInterfaceColumn +1).$row} = { 'RouterID' => $cellWanInterface->value(), 'Wan Interdace' => undef, }; } } next; } return \%HoH, \%hash; } sub _writeIntoCell { my ( $key, $hashOfHashesWanInterfaceDataValue, $hashTechnicalReportDataValue, $worksheet, $workbook ) = @_; my %formatHash = ( -font => 'Arial', -size => 8, -border => 7, -align => 'center', ); # Add and define a format my $format = $workbook->add_format( %formatHash ); # Write some text $row, $column, $text, $format $worksheet->write( $key, $hashOfHashesWanInterfaceDataValue, $format ); my @characters = split(//, $key); my @columnArray = grep /^[[:alpha:]]+$/, @characters; my @rowArray = grep /^[[:digit:]]+$/, @characters; my $column = join('', @columnArray); my $row = join('', @rowArray); $column++; $key = $column . $row; $worksheet->write( $key, $hashTechnicalReportDataValue, $format ); return; } sub getDataTechicalReport { my ( $FileToOpen ) = @_; my $worksheet = _getObjectSpreadSheet( $FileToOpen ); my $RouterIDColumn = 1; # RouterID my $WanInterfaceColumn = 25; # Wan Interface return _getDataFromSpreadSheet( $worksheet, $RouterIDColumn, $WanInterfaceColumn ); } sub getDataWanInterface { my ( $FileToOpen ) = @_; my $worksheet = _getObjectSpreadSheet( $FileToOpen ); my $RouterIDColumn = undef; # We do not care my $WanInterfaceColumn = 0; # A column return _getDataFromSpreadSheet( $worksheet, $RouterIDColumn, $WanInterfaceColumn ); } sub spreadsheetToWrite { my ( $hashOfHashesWanInterfaceData, $hashTechnicalReportData ) = @_; # Create a new Excel workbook my $workbook = Excel::Writer::XLSX->new( $ARGV[2] ); die "Problems creating new Excel file: $!" unless defined $workboo +k; # Add a worksheet my $worksheet = $workbook->add_worksheet(); foreach my $key ( keys %$hashOfHashesWanInterfaceData ) { my $RouterID = first { /$hashOfHashesWanInterfaceData->{$key}{'RouterID'}/ } keys %$hashTechnicalReportData; if ( $RouterID ) { _writeIntoCell( $key, $hashOfHashesWanInterfaceData->{$key}{'RouterID'}, $hashTechnicalReportData->{$RouterID}, $worksheet, $workbook ); } else { _writeIntoCell( $key, $hashOfHashesWanInterfaceData->{$key}{'RouterID'}, "undef", $worksheet, $workbook ); } #} } my %headerFormatHash = ( -font => 'Calibri', -size => 11, -bg_color => '#09c8eb', -bold => 1, -border => 7, -align => 'center', ); my $headerFormat = $workbook->add_format( %headerFormatHash ); $worksheet->write( 'A1', 'DNS Name', $headerFormat ); $worksheet->write( 'B1', 'WAN Interface', $headerFormat ); return; } #### main() #### _checkFilesExtensions(\@ARGV); my ( $hashOfHashesTechikalReport, $hashTechnicalReport ) = getDataTechicalReport( $ARGV[0] ); # print Dumper $hashOfHashesTechikalReport; # print Dumper %$hashOfHashesTechikalReport{'B1376'}; my @keysTechicalReport = keys %$hashTechnicalReport; # print Dumper \@keysTechicalReport; my ( $hashOfHashesWanInterface, $hashWanInterface ) = getDataWanInterface( $ARGV[1] ); # print Dumper $hashOfHashesWanInterface; # print Dumper $hashWanInterface; # print Dumper %$hashOfHashesWanInterface{'A13'}; my @keysWanInterface = keys %$hashWanInterface; # print Dumper \@keysWanInterface; spreadsheetToWrite( $hashOfHashesWanInterface, $hashTechnicalReport ); __END__ $ perl test.pl prime.xlsx secondary.xlsx sample.xlsx
In order to be able to test / run the script you need to create two files e.g. prime.xlsx and secondary.xlsx. On the prime file you need to add on B column a few lines of data e.g. see below:
B (column) Z (column) Line 1 (or what ever number) node1 GigabitEthernet0/1 Line 2 (or what ever number) node2 GigabitEthernet0/2 Line 3 (or what ever number) node3 GigabitEthernet0/3
On the secondary file you need to have data only in column 1 e.g.:
A (column) node1 node2 node3 node4 (extra on purpose)
The output XLSX file e.g. sample.xlsx should look like:
Column A Column B DNS Name WAN Interface node1 GigabitEthernet0/1 node2 GigabitEthernet0/2 node3 GigabitEthernet0/3 node4 undef
One minor note to make here just in case that someone knows to get this done. I am trying to find a way to modify the cell size on the output file. For example I would like to make the output cells size width 4 cm (for example) and height 3 cm (for example). Any ideas if this is possible or I am just looking for something that it is not yet implemented?
Thanks in advance for time and effort trying to assist me.
BR / Thanos
|
|---|