in reply to Switching out characters inside links

This is overkill for what you asked, but it's a nice approach for converting HTML to XHTML, which seems to be the subtext of your original question.

    use strict;
    use warnings;
    use XML::LibXML;

    my $raw = do { local $/; <DATA> };

    my $parser = XML::LibXML->new;
    $parser->recover_silently(1);

    my $doc = $parser->parse_html_string("<div>$raw</div>");
    my $wrapper = [ $doc->findnodes("//body/div") ]->[0];

    print $_->serialize(1) for $wrapper->childNodes;
    exit 0;

    __DATA__
    All content is in a variable like this...<br>
    <br>
    <a href="http://www.somedomain.com/index.cgi?page=home&var=1&no=2&so=forth&so=on">

You end up with:

All content is in a variable like this...<br/><br/><a href="http://www.somedomain.com/index.cgi?page=home&amp;var=1&amp;no=2&amp;so=forth&amp;so=on">
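
If parsing the whole fragment is more than you need, a much cruder sketch is a single substitution that escapes every & which does not already begin a semicolon-terminated entity or character reference. The helper name escape_bare_ampersands is made up for illustration, and this assumes you are happy to touch every ampersand in the string, not only those inside links. It also leaves alone a query parameter that happens to look like a complete entity (a literal &lt; in a URL, for instance), so treat it as a rough starting point rather than a robust fix.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): escape each "&" that is
# not already the start of a semicolon-terminated entity or character
# reference such as &amp; or &#38; or &#x26;.
sub escape_bare_ampersands {
    my ($html) = @_;
    $html =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;
    return $html;
}

my $link = '<a href="http://www.somedomain.com/index.cgi?page=home&var=1&no=2&so=forth&so=on">';
print escape_bare_ampersands($link), "\n";
```

Unlike the XML::LibXML version, this does not turn <br> into <br/> or repair anything else, so it only helps if the ampersands are the whole problem.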

Re^2: Switching out characters inside links
by ikegami (Patriarch) on Nov 27, 2009 at 02:52 UTC
    What does this have to do with XML? He specifically mentioned HTML, where ampersands need to be escaped as well, especially since the semicolon that ends an entity is optional in HTML. For example:
    http://www.somedomain.com/index.cgi?page=home&quot=1
    is the same as
    http://www.somedomain.com/index.cgi?page=home"=1

      The OP asked how to turn &s into &amp;s in an existing snippet of HTML. My solution does that, and it does so without telling the OP to start over from scratch, which often isn't an option.

      It also sounded like it might be an XY question, where the real need was an HTML->XHTML conversion, since the vanilla HTML world isn't usually concerned with whether documents validate.