camelreader has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I've recently been moved to a new hosting environment that I don't have much control over. Previously my RSS feed was generated by XML::RSS 1.05; the new environment has XML::RSS 1.48, which has caused problems with the feed.

A minimal test case follows:

use XML::RSS;

my $rss = new XML::RSS (version => '1.0', encoding => "UTF-8");

$rss->channel(
    title       => "Test",
    link        => "http://example.com",
    description => "Test script",
);

$rss->add_item(
    title       => "Test",
    link        => "http://example.com/2",
    description => "<i>Some HTML</i> Some text",
);

$rss->{output} = '1.0';
print $rss->as_string;

XML::RSS 1.05 produces the desired output:
<?xml version="1.0" encoding="UTF-8"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" > <channel rdf:about="http://example.com"> <title>Test</title> <link>http://example.com</link> <description>Test script</description> <items> <rdf:Seq> <rdf:li rdf:resource="http://example.com/2" /> </rdf:Seq> </items> </channel> <item rdf:about="http://example.com/2"> <title>Test</title> <link>http://example.com/2</link> <description>&lt;i>Some HTML&lt;/i> Some text</description> </item> </rdf:RDF>
XML::RSS 1.48 produces output with the undesired encoding:
<?xml version="1.0" encoding="UTF-8"?> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:syn="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" > <channel rdf:about="http://example.com"> <title>Test</title> <link>http://example.com</link> <description>Test script</description> <items> <rdf:Seq> <rdf:li rdf:resource="http://example.com/2" /> </rdf:Seq> </items> </channel> <item rdf:about="http://example.com/2"> <title>Test</title> <link>http://example.com/2</link> <description>&#x3C;i&#x3E;Some HTML&#x3C;/i&#x3E; Some text</description> </item> </rdf:RDF>

The key difference in the feed is the HTML portion:

<description>&lt;i>Some HTML&lt;/i> Some text</description>
vs
<description>&#x3C;i&#x3E;Some HTML&#x3C;/i&#x3E; Some text</description>

Any help getting the 1.48 version to work as intended would be much appreciated. I'm having a difficult time figuring out which option I may need to change to get the old output back.
Thanks much,
camelreader

Re: Problems with HTML Encoding in RSS after moving to XML::RSS 1.48
by ikegami (Patriarch) on Oct 22, 2010 at 18:11 UTC

    Those two lines are 100% equivalent. There seems to be a bug in your expectations.

      Okay, if I save the second one to a static RSS file, the rendered result in Firefox is equivalent. However, when I deliver them via a script with Content-Type application/rss+xml, the rendered results in Firefox are not equivalent. The 1.48 version displays the HTML tags as text rather than rendering them. Is the Content-Type a problem when combined with this encoding?

        I wonder which renderer you are really talking about. I don't think Firefox itself renders RSS. Whatever renderer it is, it appears to be buggy. &lt; and &#x3C; are equivalent in XML, just as 'abc' and "abc" are equivalent in Perl.

        A browser can't tell the difference between a static file and a generated file, so something else must differ between the two cases. That difference is probably the real problem.
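
To make the equivalence concrete, here is a minimal sketch that decodes both forms of the description and compares the results. The `decode_xml_text` helper is hypothetical and hand-rolled for illustration: it only handles the five predefined XML entities and numeric character references, and is not a substitute for a real XML parser.

```perl
use strict;
use warnings;

# The five entities predefined by XML.
my %predefined = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");

# Hypothetical helper (not part of XML::RSS): decode predefined entities
# and numeric character references in a single pass.
sub decode_xml_text {
    my ($text) = @_;
    $text =~ s{&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|(\w+));}{
        defined $1                 ? chr(hex $1)        # &#x3C; style
      : defined $2                 ? chr($2)            # &#60; style
      : exists $predefined{$3}     ? $predefined{$3}    # &lt; style
      :                              "&$3;"             # unknown: leave as-is
    }ge;
    return $text;
}

# Description content as emitted by the two module versions in the question.
my $from_1_05 = '&lt;i>Some HTML&lt;/i> Some text';
my $from_1_48 = '&#x3C;i&#x3E;Some HTML&#x3C;/i&#x3E; Some text';

print decode_xml_text($from_1_05), "\n";
print decode_xml_text($from_1_48), "\n";
print decode_xml_text($from_1_05) eq decode_xml_text($from_1_48)
    ? "same\n"
    : "different\n";
```

Both strings decode to the same text, so a conforming XML consumer should see identical content in either feed.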

Re: Problems with HTML Encoding in RSS after moving to XML::RSS 1.48
by kcott (Archbishop) on Oct 22, 2010 at 20:03 UTC

    This is an untested suggestion based on the XML predefined entities, which include &lt; and &gt;.

    Change your Perl line

    description => "<i>Some HTML</i> Some text",

    to

    description => '&lt;i&gt;Some HTML&lt;/i&gt; Some text',

    Note: I've also changed the double quotes to single quotes.

    As a further suggestion, not related to your issue here, I'd consider using the em element in favour of the i element. That will make the markup standards-compliant regardless of the version of HTML or XHTML this is ultimately rendered in.

    Additional information update:

    Just an afterthought: W3C provides an RDF Validator which you may find helpful.

    -- Ken

      You'll end up with

      &amp;lt;i&amp;gt;Some HTML&amp;lt;/i&amp;gt; Some text

      I think you're going on the assumption that the module can tell if something's already been encoded or not, but that's impossible to determine.
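
The double-encoding effect can be reproduced without XML::RSS. The sketch below uses a hypothetical `escape_xml` helper that stands in for whatever escaping a serializer applies; it is an assumption for illustration, not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a serializer's escaping step. It has no way to
# know whether its input was already escaped, so it escapes every '&'.
sub escape_xml {
    my ($text) = @_;
    my %map = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    $text =~ s/([&<>])/$map{$1}/g;
    return $text;
}

my $raw         = '<i>Some HTML</i> Some text';
my $pre_escaped = '&lt;i&gt;Some HTML&lt;/i&gt; Some text';

print escape_xml($raw), "\n";          # escaped once
print escape_xml($pre_escaped), "\n";  # escaped twice: '&' becomes '&amp;'
```

The second line of output contains &amp;lt; and &amp;gt;, so a feed reader would show the literal text &lt;i&gt; rather than markup.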