in reply to Re: Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &

...rewrite Spreadsheet::XLSX using a sane XML parser, like XML::LibXML.

Work in progress.

Re^3: Spreadsheet::XLSX returning &lt; &gt; and &amp; instead of < > &
by psynk (Initiate) on Mar 08, 2013 at 13:10 UTC
    Likely beyond my Perl skill level at the moment, but I'll keep it in mind. Meanwhile, I wrote a small function to fix the data. I also found code strings in the original data that we decided needed fixin', so this function might be all I need at the moment.

    sub FixXML {
        my $parm = $_[0];            # lexical copy, so no global $parm is clobbered
        $parm =~ s/&amp;/&/g;        # predefined XML entities
        $parm =~ s/&gt;/>/g;
        $parm =~ s/&lt;/</g;
        $parm =~ s/&quot;/"/g;
        $parm =~ s/&apos;/'/g;
        $parm =~ s/&#xA;/\n/g;       # hex character references, both cases
        $parm =~ s/&#xa;/\n/g;
        $parm =~ s/&#xD;/\r/g;
        $parm =~ s/&#xd;/\r/g;
        $parm =~ s/&#x9;/\t/g;
        return($parm);
    }
      Note that this set of regexes will convert '&amp;lt;' to '<': the &amp; substitution runs first and leaves '&lt;', which the later &lt; substitution then decodes a second time. Doing the &amp; substitution last avoids that. Percents are also documented as something that gets escaped, so you'll want a substitution for that too.
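One way to sidestep the ordering problem altogether is to decode everything in a single pass, so that text produced by one replacement is never scanned again. The following is only a sketch: decode_xml_text and %XML_ENTITY are made-up names, not part of Spreadsheet::XLSX, and it assumes the cell text contains only the five predefined XML entities plus numeric character references. It does not attempt the percent escaping mentioned above.

```perl
use strict;
use warnings;

# The five predefined XML entities and the characters they stand for.
my %XML_ENTITY = (
    amp  => '&',
    lt   => '<',
    gt   => '>',
    quot => '"',
    apos => "'",
);

# Hypothetical single-pass alternative to FixXML (not a Spreadsheet::XLSX API).
# Each entity or character reference is matched once, left to right, so the
# output of one replacement can never be decoded a second time.
sub decode_xml_text {
    my ($text) = @_;
    return $text unless defined $text;
    $text =~ s{&(?:(amp|lt|gt|quot|apos)|#x([0-9A-Fa-f]+)|#([0-9]+));}{
        defined $1 ? $XML_ENTITY{$1}     # named entity
      : defined $2 ? chr(hex($2))        # hex reference such as &#xA;
      :              chr($3)             # decimal reference such as &#10;
    }ge;
    return $text;
}

print decode_xml_text('a &amp;lt; b &lt; c'), "\n";   # prints: a &lt; b < c
```

Because the hex alternative accepts both upper and lower case digits, separate substitutions for &#xA; and &#xa; are not needed here.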
      Thank you, it worked for me