PerlMonks  

Re: Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &

by afoken (Chancellor)
on Mar 07, 2013 at 19:10 UTC


in reply to Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &

Spreadsheet::XLSX (as found in http://search.cpan.org/~dmow/Spreadsheet-XLSX-0.13-withoutworldwriteables/lib/Spreadsheet/XLSX.pm) does not use an XML parser, but instead it messes with regular expressions. The code looks quite scary and seems to be very optimistic about the file format.

Perhaps you should look for a different module, or rewrite Spreadsheet::XLSX using a sane XML parser, like XML::LibXML.
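As a rough illustration of why a real parser helps here: XML::LibXML decodes entities itself when you ask for a node's text. The sketch below is only that, a sketch. The inline XML string is a made-up fragment shaped like an entry in an XLSX shared-strings part, and the prefix `m` is an arbitrary choice for this example; it is not code from Spreadsheet::XLSX.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical fragment, shaped like a shared-strings entry in an XLSX file.
my $ns  = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main';
my $xml = qq{<sst xmlns="$ns"><si><t>a &lt; b &amp;&amp; c &gt; d</t></si></sst>};

my $doc = XML::LibXML->load_xml(string => $xml);

# The elements live in a default namespace, so XPath needs a bound prefix.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(m => $ns);

# textContent returns the text with entities already decoded by the parser.
my @strings = map { $_->textContent } $xpc->findnodes('//m:si/m:t');

print "$_\n" for @strings;
```

No hand-written substitutions are needed in this approach; the parser also handles numeric character references.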

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

Replies are listed 'Best First'.
Re^2: Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &
by Tux (Canon) on Mar 09, 2013 at 10:50 UTC

    As part of some $work->project, I wrote PROCURA::XML::Entities. I'm willing to remove the PROCURA:: part and put it on CPAN. (Feel free to nick the code if you prefer, but in that case retain the credits.)

    SYNOPSIS

      use PROCURA::XML::Entities;

      my $a = "Read &quot;perlre&quot; for explanation of &apos;&amp;&apos;";
      my $b = decode_entities ($a);
      # $b will now be q{Read "perlre" for explanation of '&'}
      $c = encode_entities ($b);
      # $c should be the same as $a

      use PROCURA::XML::Entities ();

      $decoded = PROCURA::XML::Entities::decode ($a);
      $encoded = PROCURA::XML::Entities::encode ($a);

    Enjoy, Have FUN! H.Merijn
Re^2: Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &
by runrig (Abbot) on Mar 07, 2013 at 21:47 UTC
    ...rewrite Spreadsheet::XLSX using a sane XML parser, like XML::LibXML.

    Work in progress.

      Likely beyond my Perl skill level at the moment, but I'll keep it in mind. Meanwhile, I wrote a small function to fix the data. I also found code strings in the original data that we decided needed fixin', so this function might be all I need at the moment.

      sub FixXML {
          $parm = $_[0];
          $parm =~ s/&amp;/&/g;
          $parm =~ s/&gt;/>/g;
          $parm =~ s/&lt;/</g;
          $parm =~ s/&quot;/"/g;
          $parm =~ s/&apos;/'/g;
          $parm =~ s/&#xA;/\n/g;
          $parm =~ s/&#xa;/\n/g;
          $parm =~ s/&#xD;/\r/g;
          $parm =~ s/&#xd;/\r/g;
          $parm =~ s/&#x9;/\t/g;
          return($parm);
      }
        Note that this set of regexes will convert '&amp;lt;' to '<' due to the repeated substitutions. But you could do the &amp; substitution last to avoid that. Percents are also documented as something that gets escaped, so you'll want a substitution for that too.
        Thank you, it worked for me
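The ordering problem with FixXML noted above can also be avoided by decoding everything in a single pass, so that no replacement is ever scanned a second time. The following is a minimal sketch using core Perl only; `decode_xml_text` is a hypothetical name, and it covers just the five predefined XML entities plus numeric character references, leaving anything else untouched.

```perl
use strict;
use warnings;

# The five entities predefined by XML.
my %entity = (amp => '&', lt => '<', gt => '>', quot => '"', apos => "'");

# Hypothetical helper: decode each entity exactly once, left to right.
sub decode_xml_text {
    my ($text) = @_;
    return $text unless defined $text;
    $text =~ s{&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|(amp|lt|gt|quot|apos));}{
        defined $1 ? chr(hex $1)     # hexadecimal reference, e.g. &#xA;
      : defined $2 ? chr($2)         # decimal reference, e.g. &#10;
      :              $entity{$3}     # named entity, e.g. &lt;
    }ge;
    return $text;
}

print decode_xml_text('1 &lt; 2 &amp;&amp; &quot;x&quot;'), "\n";
```

Because the regex engine continues after each replacement rather than rescanning it, an input of `&amp;lt;` becomes `&lt;` and is not decoded further. The hexadecimal branch also accepts upper and lower case digits, so separate rules for `&#xA;` and `&#xa;` are not required.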
