bcurrens has asked for the wisdom of the Perl Monks concerning the following question:

(<, >, &) are encoded by XML::Writer, but (', ") are not. I need to encode both (', "), and I suspect my current solution is not very good. I have essentially made a placeholder for each character and, after XML::Writer has rendered the XML, I run a global search and replace so that I end up with the desired output.

I should add that if I encode these characters before outputting with XML::Writer I get:

    <root>
      <str1>one&lt;two</str1>
      <str2>one&amp;two</str2>
      <str3>two&gt;one</str3>
      <str4>Caisse D&amp;quot;Eparge</str4>
      <str5>Caisse D&amp;apos;Eparge</str5>
    </root>

The desired output is:

    <root>
      <str1>one&lt;two</str1>
      <str2>one&amp;two</str2>
      <str3>two&gt;one</str3>
      <str4>Caisse D&quot;Eparge</str4>
      <str5>Caisse D&apos;Eparge</str5>
    </root>
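The double escaping in the first output can be reproduced without XML::Writer at all. The sketch below uses a hypothetical stand-in, `writer_like_escape`, that rewrites only `&`, `<` and `>`. That behaviour is an assumption inferred from the output shown in this thread, not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the writer's text escaping, based only on the
# output shown in this thread: it rewrites &, < and > and nothing else.
sub writer_like_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $raw = q(Caisse D"Eparge);
(my $preencoded = $raw) =~ s/"/&quot;/g;    # encode the quote before writing

print writer_like_escape($raw), "\n";         # the quote passes through untouched
print writer_like_escape($preencoded), "\n";  # the & of &quot; is escaped a second time
```

Any entity inserted before the writer sees the string contains an `&`, so it will always be escaped again on the way out.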

Program

    use strict;
    use warnings;
    use Cwd;
    use XML::Writer;
    use XML::Writer::String;
    use HTML::Entities;
    use utf8;
    use Modern::Perl;

    my $str1 = qq(one<two);          # &lt; (<)
    my $str2 = qq(one&two);          # &amp; (&)
    my $str3 = qq(two>one);          # &gt; (>)
    my $str4 = qq(Caisse D"Eparge);  # &quot; (");
    my $str5 = qq(Caisse D'Eparge);  # &apos; (');

    $str4 =~ s/\x{22}/\#\#\#doublequote\#\#\#/g;
    $str5 =~ s/\x{27}/\#\#\#apostrophe\#\#\#/g;

    say "str4: $str4";
    say "str5: $str5";

    my $file = cwd() . '/test.xml';

    my $BOD = XML::Writer::String->new();
    my $Writer = XML::Writer->new(
        OUTPUT      => $BOD,
        DATA_MODE   => 1,
        DATA_INDENT => 2
    );

    $Writer->xmlDecl("UTF-8");
    $Writer->startTag('root');
    $Writer->dataElement('str1', $str1);
    $Writer->dataElement('str2', $str2);
    $Writer->dataElement('str3', $str3);
    $Writer->dataElement('str4', $str4);
    $Writer->dataElement('str5', $str5);
    $Writer->endTag('root');
    $Writer->end();

    my $xml = $BOD->value();
    $xml =~ s/\#\#\#doublequote\#\#\#/\&quot\;/g;
    $xml =~ s/\#\#\#apostrophe\#\#\#/\&apos\;/g;

    my $fh = new IO::File "> $file";
    if (defined $fh) {
        print $fh "$xml";
        $fh->close;
    }
    exit;
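If the placeholder approach is kept, the two separate substitutions at the end could be folded into one pass driven by a lookup table. This is only a sketch: `restore_placeholders` and `%entity_for` are hypothetical names, and `$rendered` is a hand-written stand-in for the string that `$BOD->value()` returns in the program.

```perl
use strict;
use warnings;

# Placeholder tokens taken from the program in the question, mapped to the
# entity each one stands for.
my %entity_for = (
    '###doublequote###' => '&quot;',
    '###apostrophe###'  => '&apos;',
);

# Hypothetical helper: swap every placeholder for its entity in a single pass.
sub restore_placeholders {
    my ($xml) = @_;
    my $tokens = join '|', map { quotemeta } sort keys %entity_for;
    $xml =~ s/($tokens)/$entity_for{$1}/g;
    return $xml;
}

# Stand-in for the rendered XML string; not real XML::Writer output.
my $rendered = "<str4>Caisse D###doublequote###Eparge</str4>\n"
             . "<str5>Caisse D###apostrophe###Eparge</str5>\n";

print restore_placeholders($rendered);
```

The weakness of the original remains: if the input data ever contains one of the token strings literally, it will be rewritten too.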

Replies are listed 'Best First'.
Re: How to encode apostrophe and quote using XML::Writer?
by LonelyPilgrim (Beadle) on Feb 09, 2016 at 23:28 UTC
    It seems to work for me -- as far as I understand what you're asking. I took your code, commented out your placeholder substitutions, and ran it, and it outputs an XML file with the apostrophes and quotation marks intact. Any proper handling of UTF-8, which XML::Writer supports, ought not to mess with your character encoding. What do you mean that these characters "are not encoded" by XML::Writer? What OS are you running Perl under, and what is the output you are getting?

      "... and it outputs an XML file with the apostrophes and quotation marks intact."

      But the OP wanted the apostrophes and quotation marks encoded as &quot; and &apos;.

      The source for XML::Writer has this routine defined:

          sub _escapeLiteral {
            my $data = $_[0];
            if ($data =~ /[\&\<\>\"]/) {
              $data =~ s/\&/\&amp\;/g;
              $data =~ s/\</\&lt\;/g;
              $data =~ s/\>/\&gt\;/g;
              $data =~ s/\"/\&quot\;/g;
            }
            return $data;
          }

      But I am very unclear on how it is being called, if at all.

      Here is the code that I used:

          use strict;
          use warnings;
          use XML::Writer;

          my %hash = (
              str1 => 'one<two',
              str2 => 'one&two',
              str3 => 'two>one',
              str4 => 'Caisse D"Eparge',
              str5 => q(Caisse D'Eparge),
          );

          my $Writer = XML::Writer->new(
              OUTPUT      => 'self',
              DATA_MODE   => 1,
              DATA_INDENT => 2,
          );

          $Writer->startTag('root');
          $Writer->dataElement( $_, $hash{$_} ) for sort keys %hash;
          $Writer->endTag('root');
          $Writer->end();

          print $Writer->to_string;

      And the results:

          <root>
            <str1>one&lt;two</str1>
            <str2>one&amp;two</str2>
            <str3>two&gt;one</str3>
            <str4>Caisse D"Eparge</str4>
            <str5>Caisse D'Eparge</str5>
          </root>

      As you can see, neither quote character was encoded. Going by the routine above, the double quote probably should have been encoded, but not the single quote. XML::Writer seems very limited in this regard. At the very least, the documentation is unclear on how to customize your usage of the interface, and the code itself is hard to follow.
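For comparison, here is a minimal sketch that escapes all five of XML's predefined entities by hand and prints the elements directly. `escape_all_five` and `%entity_for` are hypothetical names and are not part of XML::Writer. The single substitution visits each character once, so an `&` that belongs to a freshly inserted entity is never escaped again.

```perl
use strict;
use warnings;

# The five entities predefined by XML, keyed by the character they replace.
my %entity_for = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&apos;',
);

# Hypothetical helper, not an XML::Writer method: one pass over the text.
sub escape_all_five {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$entity_for{$1}/g;
    return $text;
}

my %hash = (
    str1 => 'one<two',
    str2 => 'one&two',
    str3 => 'two>one',
    str4 => 'Caisse D"Eparge',
    str5 => q(Caisse D'Eparge),
);

# Elements are written by hand here, so none of the writer's checks apply.
print "<root>\n";
printf "  <%s>%s</%s>\n", $_, escape_all_five( $hash{$_} ), $_ for sort keys %hash;
print "</root>\n";
```

Writing the tags by hand gives up the well-formedness checks a writer module performs, so this is better suited to illustrating the escaping than to replacing the module.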

      I would recommend using another XML module, perhaps even recommend using JSON instead if possible.

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)