TheStig has asked for the wisdom of the Perl Monks concerning the following question:

The title makes this problem seem much easier than it is. I know what I want to do programmatically but I am still new to Perl and putting what I want to do into Perl code is where I am getting hung up. The following code takes an Excel and PDF version of one file and takes what I need and outputs a CSV file.

#!/usr/bin/perl -w use warnings; use strict; use PDF::API2; use CAM::PDF; use CAM::PDF::PageText; use String::Util ':all'; use Spreadsheet::ParseXLSX; use Text::CSV; use Date::Format; use DateTime::Format::DateParse; my $pdfFilename = "<insert PDF file name>"; my $pdf = CAM::PDF->new($pdfFilename); my $text = $pdf->getPageText(1); my $endString = "DELIVERY"; my $headIndex = index($text, $endString); my $newText = substr($text, 0, $headIndex); my $header = trim($newText); my $amdNumStart = "NO. "; my $amdNumEnd = "GAS"; my $amdNumStartIndex = index($header, $amdNumStart)+3; my $amdNumEndIndex = index($header, $amdNumEnd); my $amdNumString = substr($header, $amdNumStartIndex, $amdNumEndIndex +- $amdNumStartIndex); my $amdNum = trim($amdNumString); my $poolNumStart = "30202"; my $poolNumEnd = "TIGER"; my $poolNumStartIndex = index($header, $poolNumStart); my $poolNumEndIndex = index($header, $poolNumEnd); my $poolNumString = substr($header, $poolNumStartIndex, $poolNumEndInd +ex - $poolNumStartIndex); my $poolNum = trim($poolNumString); my $dateStart = "DATED "; my $dateStartIndex = index($header, $dateStart)+6; my $dateEndIndex = index($header, $dateStart); my $dateString = substr($header, $dateStartIndex, $dateEndIndex - $dat +eStartIndex); my $reportDate = trim($dateString); my @date = split(' ', $reportDate); my $month = $date[0]; my $day = $date[1]; my $comma = chop($day); my $year = $date[2]; my $dateJoin = join(' ', $month,$day,$year); my $parsedDate = DateTime::Format::DateParse->parse_datetime($dateJoin +); my $billMonth = $parsedDate->strftime('%m%d%Y'); my $fileName = "<insert Excel file name>"; my $parser = Spreadsheet::ParseXLSX->new(); my $workbook = $parser->parse($fileName); my $outfile = "NMGC_Snapshot_$billMonth.csv"; die $parser->error(), ".\n" if (!defined $workbook); open(my $out, '>', $outfile) or die "Could not open $outfile' $!\n"; my $csv = Text::CSV->new(); my @fields = ('Customer Name','Street Address','City','Station Num','C +ustomer Num','Premise Num','Point MDQ','Priority Class','Rate Schedul +e','Consolidated Station Num'); $csv->combine($amdNum, $poolNum, $billMonth,'','','','','','',''); print $out $csv->string, "\n"; $csv->combine(@fields); print $out $csv->string, "\r\n\r\n"; for my $worksheet($workbook->worksheet(0)){ my($firstRow, $lastRow) = $worksheet->row_range(); for my $row(6 .. $lastRow){ #array of cell values my $RowCellValues = []; for (my $col = 1; $col < 10; $col++) { my $cell = $worksheet->get_cell($row, $col); next unless $cell; push($RowCellValues, $cell->value()); } my $status = $csv->print ($out, $RowCellValues); print $out "\r\n"; } }

Every once in awhile in the file there is a blank line. Once my code hits the blank line it needs to go back up one row and see if it is a consolidated station. If that is true it takes the number from that line and needs to write that number onto the end of every line up to the previous blank line. If the line right above the blank line is not a consolidated station i need to write the station number from each individual account onto the end of each line up to the previous blank line. This first example is a piece of what is currently being output to my CSV file. The first two lines are header lines.

87,30202,05012014,,,,,,, "Customer Name","Street Address",City,"Station Num","Customer Num","Pr +emise Num","Point MDQ","Priority Class","Rate Schedule","Consolidated + Station Num" "Customer1","123 Main St.",City A,48,801,6,5,2,54 "Customer2","123 Main St.","City S",22,614,79,3,2,54 "Customer3","123 Main St.","City S",23,615,95,4,2,54 "Customer4","123 Main St.","City S",24,616,66,1,2,54 "Customer5","123 Main St.","City S",25,617,14,2,2,54 "Customer6","123 Main St.","City S",26,618,22,2,2,54 "CUSTOMER CONSOLIDATED STATION",,,20,,,, "Customer7","123 Main St.",City A,16,305,16,94,1,54 "Customer8","123 Main St.","City P",28,214,43,1,2,54 "Customer9","123 Main St.","City P",29,215,99,143,2,54 "Customer10","123 Main St.","City P",30,216,39,21,2,54 "Customer Consolidated Station",31 "Customer11","123 Main St.",City A,70,902,25,7,2,54 "Customer12","123 Main St.",City A,71,903,72,6,2,54

Here is what I am needing the final output to be. You will see the consolidated station numbers added to the end of each customer record.

87,30202,05012014,,,,,,, "Customer Name","Street Address",City,"Station Num","Customer Num","Pr +emise Num","Point MDQ","Priority Class","Rate Schedule","Consolidated + Station Num" "Customer1","123 Main St.",City A,48,801,6,5,2,54,48 "Customer2","123 Main St.","City S",22,614,79,3,2,54,20 "Customer3","123 Main St.","City S",23,615,95,4,2,54,20 "Customer4","123 Main St.","City S",24,616,66,1,2,54,20 "Customer5","123 Main St.","City S",25,617,14,2,2,54,20 "Customer6","123 Main St.","City S",26,618,22,2,2,54,20 "CUSTOMER CONSOLIDATED STATION",,,20,,,, "Customer7","123 Main St.",City A,16,305,16,94,1,54,16 "Customer8","123 Main St.","City P",28,214,43,1,2,54,31 "Customer9","123 Main St.","City P",29,215,99,143,2,54,31 "Customer10","123 Main St.","City P",30,216,39,21,2,54,31 "Customer Consolidated Station",31 "Customer11","123 Main St.",City A,70,902,25,7,2,54,70 "Customer12","123 Main St.",City A,71,903,72,6,2,54,71

I have a feeling i need to read back up the array that is helping to write the CSV every time i hit a blank line, but I am not sure how to do this. Also if anyone has a better idea that would be very helpful.

Replies are listed 'Best First'.
Re: Reading back up through an array
by wjw (Priest) on Jul 11, 2014 at 16:36 UTC

    As an alternative approach:

    Break problem into per/consolidation station sized chunks..

    • Read rows into an hash where the key is the customer number and the values are the rest of record values(address, city, etc...)
    • Use a regex to determine if the first field is "CUSTOMER CONSOLIDATION STATION" and if so, assign the previous hashes to a hash with a key of conslidation station number.

      You now have a hash of hashes where the outer hash has station numbers and the inner hash has all the individual customers as keys with the rest of their values

    • From there is it a matter of outputting to csv from the hash-of-hashes, which you can find examples of in perldsc.
    Your program gets split into two parts. Open and read the file(s) followed by outputting the h-o-h to csv.

    Assuming your records resemble your output, you will be able to read customer records until you find a consolidation station record at which point the customer records are assigned to the consolidation stations record.

    psudo-code:

    until m/consolidation/ { $cust_hash->{'city'} = "city"; $cust_hash->{'cust_num'} = "cust_num"; ....etc... } $outer-hash->{$consolidation_station} = $cust_hash; #end read loop #output to csv is clearly within your capabilities...
    Hope that is helpful...

    ...the majority is always wrong, and always the last to know about it...

    Insanity: Doing the same thing over and over again and expecting different results...

    A solution is nothing more than a clearly stated problem...otherwise, the problem is not a problem, it is a facct