socrtwo has asked for the wisdom of the Perl Monks concerning the following question:
Anyway I'm using CakeCMD as the unzipper because it tolerates partly corrupt xml files. I'm hoping that none of the modules in this attempt at a script below or any of the code, does not tolerate well formed XML. My real aim is to write a corrupt xlsx2txt converter, but if I can make one work for corrupt files it should work fine for noncorrupt ones right? And that might be useful for others...
Anyway I have simplified the script, commenting out modules stuff and am just trying to make it convert one xlsx to a csv at this point. However, I keep getting this message: "Tk::Error: Can't call method "contents" on an undefined value at line 226". This is the line where $member_workbook is called and it's contents are looked at to see if it follows the 1904 date calendar of Excel.
I got an answer from a Perl instructor who said I was defining the scalar within a local scope above and therefore when the bracket got closed the variable returned to nothing. Here's the code in question:
I really don't how to read a file into a variable, but in my previous script this code snippet that I use worked fine. Also above line 226 in the previous section, I do the same thing without an error, although I was told that what exists in the previous section is an if statement that might not reach my attempted to be defined variable{ local $/=undef; open FILE, "xl/workbook.xml" or die "Couldn't open file: $!"; binmode FILE; my $member_workbook = <FILE>; close FILE; }
Can someone tell me what's going wrong and if there is a better way to read the physical xl/document.xml file into the $member_workbook scalar?
I also contacted the author of Dmitry Ovsyanko, the writer of Spreadsheet::XLSX, and he has been some help, but he may be too advanced for me, and English is his 3rd language so we are having some difficulty communicating. I may hire him to write the script, acting as a patron I guess.
Thanks in advance. Also if you can suggest a shorter way to code the script that would be great too. Unfortunately CakeCMD cannot write to STDIN so I have to create real temporary files consisting of the necessary extracted XML ones.
#!/usr/local/bin/perl # package Spreadsheet::XLSX; # use 5.006000; # use strict; # use warnings; # our @ISA = qw(); # our $VERSION = '0.1'; # use Archive::Zip; use Spreadsheet::XLSX::Fmt2007; use Data::Dumper; use Spreadsheet::ParseExcel; ###################################################################### +########## use Tk; # # Create the Main Window # my $mw = new MainWindow; # # Hides TK logo with my own logo # # my $icon = $mw->Photo(-file => 'xlsx.gif'); # $mw->iconimage($icon); # # Declare that there is a menu, create text # editor and create a vertical scroll bar # my $mbar = $mw -> Menu(); $mw -> configure(-menu => $mbar); # # Main Menu Choices Setup section # my $file = $mbar -> cascade(-label=>"File", -underline=>0, -tearoff => + 0); # my $help = $mbar -> cascade(-label =>"Help", -underline=>0, -tearoff + => 0); # # File Menu Choices section # $file -> checkbutton(-label =>"Open", -underline => 0, -command => [\&menuopenClicked, "Open"]); $file -> command(-label =>"Save", -underline => 0, -command => [\&menusavedClicked, "Save"]); $file -> separator(); $file -> command(-label =>"Exit", -underline => 1, -command => sub { exit } ); # # Help Menu Choices section # # $help -> command(-label =>"About", -command => sub { # $txt->delete('1.0','end'); # $txt->insert('end', # "About # ---------- # How to use this program: # 1. Click on the File Menu and choose Opem. # 2. Choose your xlsx Word 2007 file to which you # wish to convert to csv files. # # # This program is made by Paul D Pruitt (socrtwo) # It also uses Binny V A's Perl/Tk code for the GUI elements # from http://www.geocities.com/binnyva/code. # CakeCMD is by Leung Yat Chun Joseph. # http://www.quickzip.org/softwares-cakecmd # It requires Microsoft .NET Framework Version 2.0 # http://tinyurl.com/ms2-0-netframework # # My software website is # http://www.godskingsandheroes.info/software/. # Also visit my data recovery software list # http://www.s2services.com. # My E-Mail : socrtwo\@s2services"); }); # # Open Dialog Box File Extension Declaration section # # # # Main loop currently activated by selecting the file # MainLoop; sub menuopenClicked { my $typesopen = [ ['Excel 2007 files', '.xlsx'], ['All files', '*'],]; my $mainfilepath = $mw->getOpenFile(-filetypes => $typesop +en, -defaultextension => '.xlsx'); # return if undefined $mainfilepath; # # Old code from package Spreadsheet::XLSX; # sub new { # my ($class, $filename, $converter) = @_; # # my $self = {}; # # Delete old XML data found in the word folder section # use Win32::OLE; $dir = '<xl>'; if(-e $dir){ $Win32::OLE::Warn = 3; # ------ SCRIPT CONFIGURATION ------ $strFolderPath = '<xl>'; # e.g. "d:\temp" # ------ END CONFIGURATION --------- $objFSO = Win32::OLE->new('Scripting.FileSystemObject'); $objFSO->DeleteFolder($strFolderPath); } else { print "\n"; } # # xlsx file rename to zip section necessary for CakeCMD to unzip. # my $zipexcelfilepath = $mainfilepath . '.zip'; rename($mainfilepath,$zipexcelfilepath); # # unzipping here uses cakecmd: http://quickzipdev.googlegroups.com/web +/Cake_Command_Line_0_3.zip # Unzip docx/zip file section # Unzip xl/styles.xml # Unzip xl/workbook.xml # my $unzip = "cakecmd.exe"; open my $wfh, "| $unzip extract \"$zipexcelfilepath\" xl/sharedStrings +.xml \"\" " or die "Could not start $unzip: $!"; open $wfh, "| $unzip extract \"$zipexcelfilepath\" xl/styles.xml \"\" +" or die "Could not start $unzip: $!"; open $wfh, "| $unzip extract \"$zipexcelfilepath\" xl/workbook.xml \"\ +" " or die ("xl/workbook.xml not found in this zip\n"); # # Script sleep section to allow unzipping action to catch up with scri +pt # $num = 2; while($num--){ sleep(1); } close $zipexcelfilepath; # # Read in xl/sharedStrings.xml into $member_shared_strings # { local $/=undef; open FILE, "xl/sharedStrings.xml" or die "Couldn't open file: $!"; binmode FILE; my $member_shared_strings = <FILE>; close FILE; } # # Old code from package Spreadsheet::XLSX; # my $member_shared_strings = $self -> {zip} -> memberNamed ('xl/sh +aredStrings.xml'); # my @shared_strings = (); if ($member_shared_strings) { my $mstr = $member_shared_strings->contents; $mstr =~ s/<t\/>/<t><\/t>/gsm; # this handles an empty t tag +in the xml <t/> foreach my $si ($mstr =~ /<si.*?>(.*?)<\/si/gsm) { my $str; foreach my $t ($si =~ /<t.*?>(.*?)<\/t/gsm) { $t = $converter -> convert ($t) if $converter; $str .= $t; } push @shared_strings, $str; } } # # Read in xl/styles.xml into $member_styles # { local $/=undef; open FILE, "xl/styles.xml" or die "Couldn't open file: $!"; binmode FILE; my $member_styles = <FILE>; close FILE; } # my $member_styles = $self -> {zip} -> memberNamed ('xl/styles. +xml'); my @styles = (); my %style_info = (); if ($member_styles) { foreach my $t ($member_styles -> contents =~ /xf\ numF +mtId="([^"]*)"(?!.*\/cellStyleXfs)/gsm) { #" # $t = $converter -> convert ($t) if $converter +; push @styles, $t; } my $default = $1 || ''; foreach my $t1 (@styles){ $member_styles -> contents =~ /numFmtId="$t1" formatCode=" +([^"]*)/; my $formatCode = $1 || ''; if ($formatCode eq $default || not($formatCode)){ if ($t1 == 9 || $t1==10){ $formatCode="0.00000%";} elsif ($t1 == 14){ $formatCode="m-d-yy";} else { $formatCode=""; } } $style_info{$t1} = $formatCode; $default = $1 || ''; } } # # Old code from package Spreadsheet::XLSX; # my $member_workbook = $self -> {zip} -> memberNamed ('xl/workbook +.xml') or die ("xl/workbook.xml not found in this zip\n"); # # # Read in xl/styles.xml into $member_workbook # { local $/=undef; open FILE, "xl/workbook.xml" or die "Couldn't open file: $!"; binmode FILE; my $member_workbook = <FILE>; close FILE; } # my $oBook = Spreadsheet::ParseExcel::Workbook->new; $oBook->{SheetCount} = 0; $oBook->{FmtClass} = Spreadsheet::XLSX::Fmt2007->new; $oBook->{Flg1904}=0; if ($member_workbook -> contents =~ /date1904="1"/){ $oBook->{Flg1904}=1; } my @Worksheet = (); foreach ($member_workbook -> contents =~ /\<(.*?)\/?\>/g) { /^(\w+)\s+/; my ($tag, $other) = ($1, $'); my @pairs = split /\" /, $other; $tag eq 'sheet' or next; my $sheet = { MaxRow => 0, MaxCol => 0, MinRow => 1000000, MinCol => 1000000, }; foreach ($other =~ /(\S+=".*?")/gsm) { my ($k, $v) = split /=?"/; #" if ($k eq 'name') { $sheet -> {Name} = $v; $sheet -> {Name} = $converter -> convert ($sheet -> {N +ame}) if $converter; } elsif ($k eq 'r:id') { ($sheet -> {Id}) = $v =~ m{rId(\d+)}; }; } my $wsheet = Spreadsheet::ParseExcel::Worksheet->new(%$sheet); push @Worksheet, $wsheet; $oBook->{Worksheet}[$oBook->{SheetCount}] = $wsheet; $oBook->{SheetCount}+=1; } $self -> {Worksheet} = \@Worksheet; foreach my $sheet (@Worksheet) { my $member_name = "xl/worksheets/sheet$sheet->{Id}.xml"; # # Unzip each worksheets # open $wfh, "| $unzip extract \"$zipexcelfilepath\" $member_name \"\" " + or next; $num = 1; while($num--){ sleep(1); } close $zipexcelfilepath; # # Old code from package Spreadsheet::XLSX; # my $member_sheet = $self -> {zip} -> memberNamed ($member_nam +e) or next; # my ($row, $col); my $flag = 0; my $s = 0; my $s2 = 0; my $sty = 0; foreach ($member_sheet -> contents =~ /(\<.*?\/?\>|.*?(?=\<))/ +g) { if (/^\<c r=\"([A-Z])([A-Z]?)(\d+)\"/) { $col = ord ($1) - 65; if ($2) { $col++; $col *= 26; $col += (ord ($2) - 65); } $row = $3 - 1; $s = m/t=\"s\"/ ? 1 : 0; $s2 = m/t=\"str\"/ ? 1 : 0; $sty = m/s="([0-9]+)"/ ? $1 : 0; } elsif (/^<v/) { $flag = 1; } elsif (/^<\/v/) { $flag = 0; } elsif (length ($_) && $flag) { my $v = $s ? $shared_strings [$_] : $_; if ($v eq "</c>"){$v="";} my $type = "Text"; my $thisstyle = ""; if (not($s) && not($s2)){ $type="Numeric"; $thisstyle = $style_info{$styles[$sty]}; if ($thisstyle =~ /(?<!Re)d|m|y/){ $type="Date"; } } $sheet -> {MaxRow} = $row if $sheet -> {MaxRow} < $row +; $sheet -> {MaxCol} = $col if $sheet -> {MaxCol} < $col +; $sheet -> {MinRow} = $row if $sheet -> {MinRow} > $row +; $sheet -> {MinCol} = $col if $sheet -> {MinCol} > $col +; if ($v =~ /(.*)E\-(.*)/gsm && $type eq "Numeric"){ $v=$1/(10**$2); # this handles scientific notatio +n for very small numbers } my $cell =Spreadsheet::ParseExcel::Cell->new( Val => $v, Format => $thisstyle, Type => $type ); $cell->{_Value} = $oBook->{FmtClass}->ValFmt($cell, $o +Book); if ($type eq "Date" && $v<1){ #then this is Excel tim +e field $cell->{Type}="Text"; $cell->{Val}=$cell->{_Value}; } $sheet -> {Cells} [$row] [$col] = $cell; } } $sheet -> {MinRow} = 0 if $sheet -> {MinRow} > $sheet -> {MaxR +ow}; $sheet -> {MinCol} = 0 if $sheet -> {MinCol} > $sheet -> {MaxC +ol}; } foreach my $stys (keys %style_info){ } bless ($self, $class); return $oBook; } # # I don't know what this line does # 1; # # Start of Ken Prows' xls2csv script # # use strict; # use 5.006; # use Getopt::Std; use Locale::Recode; use Spreadsheet::ParseExcel; use Spreadsheet::ParseExcel::FmtUnicode; use Text::CSV_XS; # our $VERSION = '1.06'; # =head1 NAME # xls2csv - A script that recodes a spreadsheet's charset and saves as + CSV. # =head1 DESCRIPTION # This script will recode a spreadsheet into a different character set # and output the recoded data as a csv file. # The script came about after many headaches from dealing with Excel s +preadsheets # from clients that were being received in various character sets. # =head1 OPTIONS # -x : filename of the source spreadsheet # -b : the character set the source spreadsheet is in (before) # -c : the filename to save the generated csv file as # -a : the character set the csv file should be converted to ( +after) # -q : quiet mode # -s : print a list of supported character sets # -h : print help message # -v : get version information # -W : list worksheets in the spreadsheet specified by -x # -w : specify the worksheet name to convert (defaults to the +first worksheet) # =head1 EXAMPLE USAGE # The following example will convert a spreadsheet that is in the WIND +OWS-1252 character set (WinLatin1) # and save it as a csv file in the UTF-8 character set. # xls2csv -x "1252spreadsheet.xls" -b WINDOWS-1252 -c "ut8csvfile. +csv" -a UTF-8 # This example with convert the worksheet named "Users" in the given s +preadsheet. # xls2csv -x "multi_worksheet_spreadsheet.xls" -w "Users" -c "user +s.csv" # =head1 NOTES # The spreadsheet's charset (-b) will default to UTF-8 if not set. # If the csv's charset (-a) is not set, the CSV file will be created u +sing the same charset as the spreadsheet. # =head1 REQUIRED MODULES # This script requires the following modules: # Locale::Recode # Unicode::Map # Spreadsheet::ParseExcel # Spreadsheet::ParseExcel::FmtUnicode (should be included with Sp +readsheet::ParseExcel) # Text::CSV_XS # =head1 CAVEATS # It probably will not work work with spreadsheets that use formulas. # A line in the spreadsheet is assumed to be blank if there is nothing + in the first column. # Some users have reported problems trying to convert a spreadsheet wh +ile it was opened in a different application. # You should probably make sure that no other programs are working wit +h the spreadsheet while you are converting it. # =cut # $Getopt::Std::STANDARD_HELP_VERSION = 1; # my %O; # getopts('x:b:c:a:qshvWw:', \%O); # HELP_MESSAGE() if !%O or $O{'h'}; # VERSION_MESSAGE() if $O{'v'}; # if ($O{'s'}) # { # print "\nThe following character sets are supported:\n\n"; # my $Supported = Locale::Recode->getSupported; # foreach my $CharSet (sort @$Supported) # { # print "$CharSet\n"; # } # print "\n"; # exit; # @ } # my $SourceFilename = $oBook; # $O{'x'} || die "The filename of the spr +eadsheet (-x) is required."; # my $SourceCharset = $O{'b'}; $SourceCharset = 'UTF-8'; # unless $SourceCharset; # #unless ($O{'q'}) #{ print "Now reading \"$SourceFilename\" as $SourceCharset.\n"; #} my $XLS = new IO::File; $XLS->open("< $SourceFilename") || die "Cannot open spreadsheet: $!"; my $Formatter = Spreadsheet::ParseExcel::FmtUnicode->new(Unicode_Map = +> $SourceCharset); my $Book = Spreadsheet::ParseExcel::Workbook->Parse($XLS, $Formatter) +|| die "Can't read spreadsheet!"; # if ($O{'W'}) #{ # print "\nThe following " . ($Book->{SheetCount}>1 ? "$Book->{Shee +tCount} worksheets are" : "worksheet is") . " defined in the spreads +heet:\n\n"; #foreach my $Sheet (@{$Book->{Worksheet}}) # { # print "$Sheet->{Name}\n"; # } # print "\n"; # exit; # } my $DestFilename = $SourceFilename . 'csv'; # $O{'c'} || die "The file +name to save the csv file as (-c) is required."; my $DestCharset = 'UTF-8'; # $O{'a'}; # $DestCharset = $SourceCharset unless $DestCharset; # my $Sheet; # if ($O{'w'}) # { # $Sheet = $Book->Worksheet($O{'w'}); # die "Invalid worksheet" if !defined $Sheet; # unless ($O{'q'}) # { # print qq|Converting the "$Sheet->{Name}" worksheet.\n|; # } #} #else # my $Sheet = @{$Book->{Worksheet}}; # if #(!$O{'q'} && # ($Book->{SheetCount}>1) # { # print qq|Multiple worksheets found. Will convert the "$Sheet- +>{Name}" worksheet.\n|; # } open (CSV, "> $DestFilename") || die "Cannot create csv file: $!" ; binmode CSV; my $Csv = Text::CSV_XS->new({ 'quote_char' => '"', 'escape_char' => '"', 'sep_char' => ',', 'binary' => 1, }); my $Recoder; # if ($O{'a'}) { $Recoder = Locale::Recode->new(from=>$SourceCharset, to=>$DestChar +set); } my $Sheet; for ( my $Row = $Sheet->{MinRow} ; defined $Sheet->{MaxRow} && $Row <= + $Sheet->{MaxRow} ; $Row++ ) { my @Row; for ( my $Col = $Sheet->{MinCol} ; defined $Sheet->{MaxCol} && $Co +l <= $Sheet->{MaxCol} ; $Col++ ) { my $Cell = $Sheet->{Cells}[$Row][$Col]; my $Value = ""; if ($Cell) { $Value = $Cell->Value; if ($Value eq 'GENERAL') { # Sometimes numbers are read incorrectly as "GENERAL". # In this case, the correct value should be in ->{Val} +. $Value = $Cell->{Val}; } # if ($O{'a'}) #{ # $Recoder->recode($Value); #} } # We assume the line is blank if there is nothing in the first + column. last if $Col == $Sheet->{MinCol} and !$Value; push(@Row, $Value); } next unless @Row; my $Status = $Csv->combine(@Row); if (#!$O{'q'} and !defined $Status) { my $Error = $Csv->error_input(); warn "ERROR FOUND!: $Error"; } if (defined $Status) { my $Line = $Csv->string(); print CSV "$Line\n"; } } close CSV; $XLS->close; # unless ($O{'q'}) print "The spreadsheet has been converted to $DestCharset and saved as + \"$DestFilename\".\n"; # sub VERSION_MESSAGE # { # print << "EOF"; # This is xls2csv version $VERSION # Copyright (C) 2005 Ken Prows. All rights reserved. # This script is free software; you can redistribute it and\\or modify + it under the same terms as Perl itself. # For help, use "xls2csv -h" # EOF # exit; # } # #sub HELP_MESSAGE #{ # print << "EOF"; # xls2csv - Recode a spreadsheet's charset and save as CSV. # usage: xls2csv -x spreadsheet.xls [-w worksheet] [-b charset] [-c cs +vfile.csv] [-a charset] [-qshvW] # -x : filename of the source spreadsheet # -b : the character set the source spreadsheet is in (before) # -c : the filename to save the generated csv file as # -a : the character set the csv file should be converted to (after) # -q : quiet mode # -s : print a list of supported character sets # -h : this help message # -v : get version information # -W : list worksheets in the spreadsheet specified by -x # -w : specify the worksheet name to convert (defaults to the first w +orksheet) # example: xls2csv -x "spreadsheet.xls" -b WINDOWS-1252 -c "csvfile.cs +v" -a UTF-8 # More detailed help is in "perldoc xls2csv" # EOF # exit; #} # # =head1 AUTHOR # Ken Prows (perl@xev.net) # =head1 COPYRIGHT # Copyright (C) 2005 Ken Prows. All rights reserved. # This script is free software; you can redistribute it and/or modify +it under the same terms as Perl itself.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Tk::Error: Can't Call Method "contents" on an Undefined Value at...
by GrandFather (Saint) on Apr 09, 2009 at 04:50 UTC |