kshitij has asked for the wisdom of the Perl Monks concerning the following question:

Hi All ,

This is in reference to one requirement in which I need to convert one XLS file into CSV file with the help of Perl script. I have written below stated script . It is doing the job but the problem is the generated CSV file is having mismanaged data since the script is not taking care of Text wrap which was there in the XLS and the long strings of some cells are affecting other cells and CSV file is not getting generated as per my expectation.

#!/usr/bin/perl use Text::CSV_XS; use strict; use Spreadsheet::ParseExcel; my $sourcename = shift @ARGV or die "invocation: $0 <source file>\n"; my $source_excel = new Spreadsheet::ParseExcel; my $source_book = $source_excel->Parse($sourcename) or die "Could not +open source Excel file $sourcename: $!"; my $storage_book; foreach my $source_sheet_number (0 .. $source_book->{SheetCount}-1) { my $source_sheet = $source_book->{Worksheet}[$source_sheet_number]; print "--------- SHEET:", $source_sheet->{Name}, "\n"; next unless defined $source_sheet->{MaxRow}; next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow}; next unless defined $source_sheet->{MaxCol}; next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol}; foreach my $row_index ($source_sheet->{MinRow} .. $source_sheet->{Max +Row}) { foreach my $col_index ($source_sheet->{MinCol} .. $source_sheet->{Ma +xCol}) { my $source_cell = $source_sheet->{Cells}[$row_index][$col_index]; if ($source_cell) { #print "( $row_index , $col_index ) =>", $source_cell->Value, ","; print $source_cell->Value, ","; } } print "\n"; } print "done!\n";

Replies are listed 'Best First'.
Re: XLS to CSV file with Text wrap
by Tux (Canon) on Aug 31, 2018 at 06:48 UTC

    Note that a text-wrap in Excel does not mean that the data in the cell will contain new-lines when you read it with Spreadsheet::ParseExcel. It is a display-feature in Excel. When you want the CSV to do something similar, you need to read the column-width and wrap the text yourself with something like Text::Wrap. Note however that the default font in Excel is most likely a variable width font and not a mono-spaced font like on a terminal, so that your wrapped text might wrap on completely different points in the field content.

    corion's solution is already helpful (but it skips undefined fields), and could correctly deal with wrapped text, but if the sheet is huge, you might want to stream

    my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\r\n", auto_diag = +> 1 }); open my $fh, ">", "file.csv" or die "file.csv: $!"; foreach my $row_index ($source_sheet->row_range) { my @row; foreach my $col_index ($source_sheet->col_range) { my $source_cell = $source_sheet->{Cells}[$row_index][$col_inde +x]; my $value; if ($source_cell) { local $Text::Wrap::columns = 20; $value = wrap ("", "", $source_cell->Value); } push @row, $value; } $csv->print ($fh, \@row); } close $fh;

    Enjoy, Have FUN! H.Merijn
      Hi Tux ,

      Thanks for providing the solution. I have tried it out but the generated CSV file is not getting generated properly. It is splitting most of the content in the newlines in the CSV file and coming in a single column .

      Please find my script .
      #!/usr/bin/perl -w #use Text::CSV_XS; use strict; use warnings; use lib '/home/users/kulshrk/perl_module/lib64/perl5'; use Spreadsheet::ParseExcel; use Text::Wrap; use Text::CSV_XS; #use Text::CSV_XS qw( csv ); my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\r\n", auto_diag = +> 1 }); open my $fh, ">", "Project_status_tracking_DFT.csv" or die "Project_st +atus_tracking_DFT.csv: $!"; my $sourcename = shift @ARGV or die "invocation: $0 <source file>\n"; my $source_excel = new Spreadsheet::ParseExcel; my $source_book = $source_excel->Parse($sourcename) or die "Could not +open source Excel file $sourcename: $!"; my $storage_book; #my $sheet = []; foreach my $source_sheet_number (0 .. $source_book->{SheetCount}-1) { # my $source_sheet = $source_book->{Worksheet}[$source_sheet_number]; print "--------- SHEET:", $source_sheet->{Name}, "\n"; next unless defined $source_sheet->{MaxRow}; next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow}; next unless defined $source_sheet->{MaxCol}; next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol}; # my $sheet = []; foreach my $row_index ($source_sheet->{MinRow} .. $source_sheet->{Max +Row}) { my @row; foreach my $col_index ($source_sheet->{MinCol} .. $source_sheet->{Ma +xCol}) { my $source_cell = $source_sheet->{Cells}[$row_index][$col_index]; my $value; if ($source_cell) { local $Text::Wrap::columns = 20; $value = wrap ("", "", $source_cell->Value); } push @row , $value; } $csv->print ($fh, \@row); } } close $fh;

      Could you suggest something here ? I played with local $Text::Wrap::columns by changing the width but still issue is there .

      Thanks Kshitij
        Thanks for providing the solution. I have tried it out but the generated CSV file is not getting generated properly. It is splitting most of the content in the newlines in the CSV file and coming in a single column.

        And that is what it should look like when viewing as a plain text file. Now take a closer look at where all the quotes are located. CSV allows embedden newlines, which is what you asked for. CSV doe not support display properties, like OpenOffice, LibreOffice, Excel and other Spreadsheet GUI's do. It is plain text with control characters: EOL/EOR (end-of-line/end-of-record), SEP (separator), QUO (Quotation), and ESC (escape). Read the specs if you do not (yet) understand.

        A CSV parser (also builtin into spreadsheet GUI's) should be able to understand the structure and show you the data as you intended it to be.

        I start to suspect that you do not want CSV, but pre-formatted output in columns. That is not what CSV is for.


        Enjoy, Have FUN! H.Merijn

        Your script appears to be working correctly for me. Try this test program

        #!/usr/bin/perl use strict; use warnings; use Text::Wrap; use Text::CSV_XS; my $csv = Text::CSV_XS->new ( { binary => 1, eol => "\r\n", auto_diag => 1 }); open my $fh, ">", "wraptest.csv" or die "$!"; my @data = ( ['1a this is more than 20 characters', '', '1c and another one that is more than 20 characters'], ['2a this is more than 20 characters', '', '2c and another one that is more than 20 characters'], ); for my $row_index (0..1){ my @row; for my $col_index (0..2){ my $source_cell = $data[$row_index][$col_index]; my $value; if ($source_cell){ local $Text::Wrap::columns = 20; $value = wrap ("", "", $source_cell); } push @row , $value; } $csv->print ($fh, \@row); } close $fh;
        poj
Re: XLS to CSV file with Text wrap
by Corion (Patriarch) on Aug 31, 2018 at 06:14 UTC

    When writing your CSV, you should use Text::CSV_XS instead of using print to write the data.

    My approach to writing a CSV file from data would be to convert the Excel data to an array of arrays and then write that out through the csv function of Text::CSV_XS. See the reply by Tux for better code, but this is the basic idea:

    ... my $sheet = []; foreach my $row_index ($source_sheet->row_range) { my $row = []; foreach my $col_index ($source_sheet->col_range) { my $source_cell = $source_sheet->{Cells}[$row_index][$col_index]; if ($source_cell) { #print "( $row_index , $col_index ) =>", $source_cell->Value, ","; push @$row, $source_cell->Value; } } push @$sheet, $row; } csv( in => $sheet, out => 'file.csv' );

    See the documentation of Text::CSV_XS on the csv function.

      HI Corion ,

      I am using below code as per your inputs but getting an error.

      Error

      -------- SHEET:Projects Undefined subroutine &main::csv called at XLS2CSV.pl line 40.
      My script
      #!/usr/bin/perl -w #use Text::CSV_XS; use strict; use Spreadsheet::ParseExcel; use Text::CSV_XS; my $csv = Text::CSV_XS->new ({ binary => 1, eol => "\r\n", auto_diag = +> 1 }); open my $fh, ">", "file.csv" or die "file.csv: $!"; my $sourcename = shift @ARGV or die "invocation: $0 <source file>\n"; my $source_excel = new Spreadsheet::ParseExcel; my $source_book = $source_excel->Parse($sourcename) or die "Could not +open source Excel file $sourcename: $!"; my $storage_book; foreach my $source_sheet_number (0 .. $source_book->{SheetCount}-1) { my $source_sheet = $source_book->{Worksheet}[$source_sheet_number]; print "--------- SHEET:", $source_sheet->{Name}, "\n"; next unless defined $source_sheet->{MaxRow}; next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow}; next unless defined $source_sheet->{MaxCol}; next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol}; my @sheet = (); foreach my $row_index ($source_sheet->{MinRow} .. $source_sheet->{Max +Row}) { my @row= (); foreach my $col_index ($source_sheet->{MinCol} .. $source_sheet->{Ma +xCol}) { my $source_cell = $source_sheet->{Cells}[$row_index][$col_index]; if ($source_cell) { push @row, $source_cell->Value; #print "( $row_index , $col_index ) =>", $source_cell->Value, ","; # print $source_cell->Value, ","; } } push @sheet, @row; } csv-> (in => @sheet, out => 'file.csv'); #print "\n"; }
      Could you provide your valuable inputs ?

        You need to import the csv subroutine from Text::CSV_XS, just like the documentation says.