stonecolddevin has asked for the wisdom of the Perl Monks concerning the following question:

UPDATE 2: can't use DBD::AnyData as I have to use the current modules installed at this time. This project has to be done tonight, so the company I'm doing it for can go live with it tomorrow. Updated code. Suggestions?

Update: per bart's advice, I think I'll try DBD::AnyData, unless someone can suggest a more spreadsheet oriented module, or, of course, it doesn't work.

Hey gang,

I'm attempting to refactor some old code to convert the Excel sheet into an XML file based on a schema/template given to me. I'm getting some weird error messages from Spreadsheet::ParseExcel saying "Character in 'C' format wrapped in pack at C:/.../lib/Spreadsheet/ParseExcel/FmtDefault.pm line 68". Template::Toolkit seems to be doing just fine, as it's not throwing errors, and until Spreadsheet::ParseExcel comes into the picture, writes out the XML file with whatever variables I throw interpolated.

The code is as follows:

(Perl source)

#!perl -w use warnings; use strict; use Template; use Spreadsheet::ParseExcel; use Data::Dumper; ## define the xml template here my $xmltemplate = "learning.xtt"; ## directory to write xml files to my $write_directory = "xml/"; my $rand = rand($$); my $outputname = "$rand.xml"; ## set up the output settings and such my $xmlfile = Template->new({ OUTPUT_PATH => "xml" }) || die $Template +::ERROR; my $oExcel = new Spreadsheet::ParseExcel; my $filename = $ARGV[0]; chomp $filename; unless (-e $filename) { print "Can't find file $filename"; exit ; } ## read in the spread sheet my $oBook = Spreadsheet::ParseExcel::Workbook->Parse($filename); ## begin conversion... my @raw_data = (); # excel data... foreach my $oWkS (@{$oBook->{Worksheet}}) { next unless defined $oWkS->{MinRow} and defined $oWkS->{MaxRow}; for my $iR ($oWkS->{MinRow} .. $oWkS->{MaxRow}) { for my $iC ($oWkS->{MinCol} .. $oWkS->{MaxCol}) { my %raw_data; my $oWkC = $oWkS->{Cells}[$iR][$iC]; next if ! defined $oWkC; my @columns = qw( id category code title group sub_group sequence role_mandato +ry role_recommended role_optional url modality length ); @raw_data{ @columns } = map { $oWkC->Value } @columns; } } } ## process the template print "Converting...\n\n"; print "Writing to file...\n\n"; open FH, ">", "debug.txt"; print FH Dumper (\@raw_data); close FH; #$xmlfile->process( $xmltemplate, # { # xml => \@excel_data # }, # $outputname ## for test purposes #) || die $Template::ERROR; print "Done!\n\n";

(The XML file {learning.xtt})

<?xml version="1.0" encoding="ISO-8859-1" ?> <star:learning-paths xmlns:star="http://schemas.sun.com/star-ns/" xml: +lang="en"> [% FOREACH xml %] <star:course order=""> <star:id>[% id %]</star:id> <star:category>[% category %]</star:category> <star:code>[% code %]</star:code> <star:title>[% title %]</star:title> <star:group>[% group %]</star:group> <star:sub-group>[% sub_group %]</star:sub-group> <star:sequence>[% sequence %]</star:sequence> <star:role-mandatory>[% role_mandatory %]</star:role-mandatory> <star:role-recommended>[% role_recommended %]</star:role-recommended +> <star:role-optional>[% role_optional %]</star:role-optional> <star:url>[% url %]</star:url> <star:modality>[% modality %]</star:modality> <star:length>[% length %]</star:length> </star:course> [% END %] </star:learning-paths>

Thanks in advance all. If the Excel file is needed, please /msg me.

P.S. bart said something aboutXML::Excel, but I'm not sure of the flexibility of this module, as I have to follow this template quite explicitly, and it doesn't seem to have many features at this version.

meh.

Replies are listed 'Best First'.
Re: Excel to XML issue with Spreadsheet::ParseExcel
by bart (Canon) on Jan 17, 2007 at 22:32 UTC
    dhoss you don't appear to have made any progress since last we talk about this, ooh, 12 hours ago? (You had put roughly the same code on your scratchpad.)

    Like I told you then: "You can't just make shit up and expect Perl (or in this case, Template::Toolkit) to know what you mean." You're opening an Excel file, taking a reference to a worksheet, and feed it directly to Template::Toolkit, and expect it to extract what it needs, which I assume to be an AoH. (Actually it's worse still, you're passing a dereferenced (as an array) reference to the worksheet, hence, a list, not a single item.)

    So I did the experiment, I replaced the reference to the worksheet with an array of hashes, and ran the code. And I do end up with an XML file I expected to end up with.

    In order not to duplicate the effort, here's my sample code:

    So that's apparently the format TT2 expects. "All" you still have to do, is actually extract the data out of the Excel file, and turn it into a proper AoH. The "old code" you linked to should work as a good source, you most likely have to process the file row by row, producing a hash per row, and the hash keys should be derived from the column names, either out of the headers (likely row 0, if present), or be hardcoded based on the column number. The cell contents will become the value in the hash.

    Maybe there are easier ways, it could be that the AnyData/DBD::AnyData family of file readers could let you treat the spreadsheet as a database. Does it not? Or else, ODBC surely must be capable of it.

    p.s. Did you think of XML-escaping the strings? I haven't looked, but somehow, I doubt it...
    update I see that you haven't. Well, it may be a finishing touch, but still, it needs to be handled eventually, in order to get a solid product.

      I appreciate the advice bart. I figured it would come down to something like you said, creating the AoH row by row.

      As far as XML-escaping, no, I haven't. The person I'm writing this for didn't mention anything like that, so I'm just going along with said directions.

      meh.
Re: Excel to XML issue with Spreadsheet::ParseExcel
by jmcnamara (Monsignor) on Jan 17, 2007 at 23:59 UTC

    The "Character in 'C' format wrapped in pack" warning is a known Spreadsheet::ParseExcel issue when dealing with Unicode strings.

    It has been fixed in recent versions.

    --
    John.

Re: Excel to XML issue with Spreadsheet::ParseExcel
by eff_i_g (Curate) on Jan 17, 2007 at 22:32 UTC
    The error warning is mentioned in perldiag. Does the Excel file contain Unicode characters?

      I'd assume so eff_i_g. Without delving into the docs, is there a way to get rid of it if it's not fatal?

      meh.
        A Google search brings up a few results dealing with this very module. One from October suggests:
        # If the next tests give warnings like # Character in 'C' format wrapped in pack at # /usr/local/perl/5.8.8/lib/site_perl/5.8.8/Spreadsheet/ParseExcel/ +FmtDefault.pm line 68 # Change C* to U* in line 68
Re: Excel to XML issue with Spreadsheet::ParseExcel
by Tux (Canon) on Jan 18, 2007 at 11:52 UTC
    Did you already try the "C" to "U" fix?
    Did you encode () your data before passing it to XML?
    Did you use Encode::Entities or something, so you're sure, no illegal characters ended up in the XML?

    Frankly, your second update doesn't tell me how far you are, and where the current trouble lies
    Enjoy, Have FUN! H.Merijn

      I changed "C" to "U" and that worked fine. My issues now is that I can't get the row value into it's respective template variable in the template. That is, I can't just SELECT it like I would with SQL.

      meh.