AALB has asked for the wisdom of the Perl Monks concerning the following question:
I'm a BI developer working with perl scripts as my ETL - I receive data over email, take the file, parse it and push it into the DB. Most of the files are CSV, but occasionally I have an XLSX file.
I've been using Spreadsheet::XLSX to convert, but I've noticed that the CSV output comes out with the wrong encoding (needs to be UTF8, because accents and foreign languages).
That's the sub I'm using ($input_file is an Excel file), but I keep getting the data with the wrong characters.
WHAT am I missing?
Thanks a lot all!
sub convert_to_csv { my $input_file = $_[0]; my ( $filename, $extension ) = split( '\.', $input_file ); open( format_file, ">:**encoding(utf-8)**", "$filename.csv" ) or d +ie "could not open out file $!\n"; my $excel = Spreadsheet::XLSX->new($input_file); my $line; foreach my $sheet ( @{ $excel->{Worksheet} } ) { #printf( "Sheet: %s\n", $sheet->{Name} ); $sheet->{MaxRow} ||= $sheet->{MinRow}; foreach my $row ( $sheet->{MinRow} .. $sheet->{MaxRow} ) { $sheet->{MaxCol} ||= $sheet->{MinCol}; foreach my $col ( $sheet->{MinCol} .. $sheet->{MaxCol} ) { my $cell = $sheet->{Cells}[$row][$col]; if ($cell) { my $trimcell; $trimcell = $cell->value(); print STDERR "cell: $trimcell\n"; ## Just for the +tests so I don't have to open the file to see if it's ok $trimcell =~ s/^\s+|\s+$//g; ## Just to make sure + I don't have extra spaces $line .= "\"" . $trimcell . "\","; } } chomp($line); if ($line =~ /Grand Total/){} ##customized for the files else { print format_file "$line\n"; $line = ''; } } } close format_file; }
|
---|
Replies are listed 'Best First'. | |
---|---|
Re: Converting XLSX to CSV with Perl while maintaining the encoding
by haukex (Archbishop) on Feb 25, 2020 at 08:05 UTC | |
Re: Converting XLSX to CSV with Perl while maintaining the encoding
by Tux (Canon) on Feb 25, 2020 at 13:39 UTC | |
Re: Converting XLSX to CSV with Perl while maintaining the encoding
by IB2017 (Pilgrim) on Feb 25, 2020 at 09:52 UTC |