Corion is correct on all points above. Part of the XML spec is that any text between tags, e.g.
<description><b>Best post ever: </b>This is a super hoopy post froods</description>
must be rendered XML-safe, i.e.
<description>&lt;b&gt;Best post ever: &lt;/b&gt;This is a super hoopy post froods</description>
This prevents confusion when using XPath tools.
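A minimal sketch of that escaping step, assuming Python's standard library purely for illustration (`xml.sax.saxutils.escape` and `xml.etree.ElementTree` are not the tools discussed in this thread; the sample text is the one from the example above):

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

raw = "<b>Best post ever: </b>This is a super hoopy post froods"

# Escape &, < and > so the HTML travels through the feed as plain text.
safe = escape(raw)
feed_item = "<description>" + safe + "</description>"

# An XML parser now sees a single text node and no child <b> element,
# so an XPath query for the description has nothing to trip over.
node = ET.fromstring(feed_item)
print(len(node))         # number of child elements under <description>
print(node.text == raw)  # the original HTML comes back as text
```

Without the `escape` call the parser would treat `<b>` as a child element and the description text would be split around it.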
On security: if your users are loading remote data from within a session on your service, be very, very sure that
- No JavaScript injection is possible
- You are not revealing session info (HTTP_REFERER)
Do not blindly decode the entities back to HTML (e.g. with HTML::Entities), as this may result in malicious code executing within your users' browsers while they are logged into your service.
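To make the risk concrete, here is a small sketch, again assuming Python's standard `html` module rather than HTML::Entities; the `stored` value is a made-up feed item:

```python
import html

# Hypothetical description text as it arrives in a feed: inert, escaped markup.
stored = "&lt;script&gt;doEvil();&lt;/script&gt;Nice post"

# Blind decoding turns that inert text back into live markup.
decoded = html.unescape(stored)
print(decoded)  # → <script>doEvil();</script>Nice post
```

Writing `decoded` straight into a page would hand the script to the browser, which is why the decoded text needs filtering before it is displayed.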
The best way of preventing XSS is to whitelist HTML tags and the allowed attributes for each tag (consider <b onmouseover="doEvil();">Some text</b> when allowing specific tags). Have a look at HTML::Scrubber.
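The whitelisting idea can be sketched as follows. This is not HTML::Scrubber's API; it is a rough Python illustration built on the standard `html.parser`. The `ALLOWED` policy, the `WhitelistScrubber` class and the `scrub` helper are all hypothetical names chosen for the example:

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: tag name -> attributes permitted on that tag.
ALLOWED = {"b": set(), "i": set(), "a": {"href"}}


class WhitelistScrubber(HTMLParser):
    """Re-emit only whitelisted tags and attributes; everything else is dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # e.g. onmouseover is not on the list
            if name == "href" and not value.lower().startswith(("http://", "https://")):
                continue  # refuse javascript: and other schemes
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is always re-escaped, so it can never become markup.
        self.out.append(escape(data))


def scrub(html_text):
    parser = WhitelistScrubber()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(scrub('<b onmouseover="doEvil();">Some text</b>'))  # → <b>Some text</b>
```

The tag survives but the event handler does not, because attributes are checked per tag rather than the tag being allowed wholesale. A maintained module such as HTML::Scrubber is still the better choice for real use; this sketch ignores many edge cases.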
The best way of retrieving remote images without revealing session info is to ensure all such info is carried in the request headers or body (POST) rather than in the URL, since the URL is what ends up in HTTP_REFERER.
Edit: And another thing about remote images I'd forgotten to mention: some browsers do content sniffing and ignore the alleged nature of the content. Interesting article on the dangers of content sniffing and how to handle it.