Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way, before Perl outputs a variable, to switch any "&" inside an a href tag to "&amp;"? Something like this:
$_all_page_content = qq~All content is in a variable like this...<br> <br> <a href="http://www.somedomain.com/index.cgi?page=home&var=1&no=2&so=forth&so=on">~;
$_all_page_content =~ s/\&/\&amp;/g;
That way the HTML does not have any errors?

Thank you.

Replies are listed 'Best First'.
Re: Switching out characters inside links
by ikegami (Patriarch) on Nov 26, 2009 at 15:48 UTC
    You need to convert the non-HTML (the url) into HTML *before* inserting it into the document.
    use HTML::Entities qw( encode_entities );

    my $url = 'http://www.somedomain.com/index.cgi?page=home&var=1&no=2&so=forth&so=on';
    # Encode the URL as HTML before interpolating it into the markup.
    my $h_url = encode_entities($url);
    print(qq{...<a href="$h_url">...</a>...});

    Using Template-Toolkit, you'd use something like

    ...<a href="[% url | html %]">...</a>...
Re: Switching out characters inside links
by keszler (Priest) on Nov 26, 2009 at 14:57 UTC
Re: Switching out characters inside links
by ambrus (Abbot) on Nov 26, 2009 at 21:36 UTC

    gmargo pointed in the right direction in his reply:

    $ perl -we 'use CGI; $url = "http://www.somedomain.com/index.cgi?page=home&var=1&no=2&so=forth&so=on"; print qq[<a href="], CGI::escapeHTML($url), qq[">description</a>\n];'
    <a href="http://www.somedomain.com/index.cgi?page=home&amp;var=1&amp;no=2&amp;so=forth&amp;so=on">description</a>

    It is, of course, easier to escape the ampersands before you compose the whole text of the page in a variable. Once the page is assembled, it is hard to tell what exactly needs escaping, and if there are unescaped quotes or angle brackets, or ampersands that look like they start valid XML escapes, you might not even be able to parse the result.
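A minimal sketch of the "escape first, compose afterwards" order, for illustration only. The `escape_html` helper here is hypothetical: a small stand-in for `HTML::Entities::encode_entities` or `CGI::escapeHTML` so the example runs without extra modules. The `$page` variable is likewise just an assumed name.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): escapes the four characters
# that matter inside HTML text and double-quoted attribute values.
sub escape_html {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
    );
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# The raw URL is not HTML yet, so it is escaped on its own ...
my $url  = 'http://www.somedomain.com/index.cgi?page=home&var=1&no=2&so=forth&so=on';
my $link = sprintf '<a href="%s">description</a>', escape_html($url);

# ... and only then placed into the page, whose markup is left untouched.
my $page = "All content is in a variable like this...<br> <br> $link";
print "$page\n";
```

Because only the URL passes through the escaper, the surrounding `<br>` and `<a>` tags are never at risk of being mangled, which is the problem with running a blanket `s/&/&amp;/g` over the finished page.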

Re: Switching out characters inside links
by Your Mother (Archbishop) on Nov 27, 2009 at 02:07 UTC

    This is totally overkill for what you want but it's a nice approach for converting HTML to XHTML and that's the subtext of your original question.

    use strict;
    use warnings;
    use XML::LibXML;

    my $raw = do { local $/; <DATA> };

    my $parser = XML::LibXML->new;
    $parser->recover_silently(1);
    my $doc = $parser->parse_html_string("<div>$raw</div>");

    my $wrapper = [ $doc->findnodes("//body/div") ]->[0];
    print $_->serialize(1) for $wrapper->childNodes;

    exit 0;

    __DATA__
    All content is in a variable like this...<br> <br> <a href="http://www.somedomain.com/index.cgi?page=home&var=1&no=2&so=forth&so=on">

    You end up with:

    All content is in a variable like this...<br/><br/><a href="http://www.somedomain.com/index.cgi?page=home&amp;var=1&amp;no=2&amp;so=forth&amp;so=on">
      What does this have to do with XML? He specifically mentioned HTML, where ampersands need to be escaped as well, especially when you consider that the semicolon is optional in HTML.
      http://www.somedomain.com/index.cgi?page=home&quot=1
      is the same as
      http://www.somedomain.com/index.cgi?page=home"=1

        The OP asked how to get &s to &amp;s in an existing snippet of HTML. My solution does that and it does it without saying start over and do it from scratch which often isn't an option.

        It also sounded like it might be an XY question and the real issue was a need to do HTML->XHTML since the vanilla HTML world isn't usually concerned whether docs validate or not.
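For an existing snippet where a full parser feels too heavy, a regex pass limited to href values is another option. The sketch below is an assumption-laden illustration, not anyone's recommended method from this thread: `escape_href_ampersands` is a hypothetical helper, it only handles double-quoted `href="..."` attributes, and it leaves alone any ampersand that already starts a semicolon-terminated entity.

```perl
use strict;
use warnings;

# Hypothetical helper: escapes bare ampersands inside double-quoted href
# values only. An ampersand that already begins a semicolon-terminated
# entity (named, decimal, or hex) is left as it is.
sub escape_href_ampersands {
    my ($html) = @_;
    $html =~ s{(href=")([^"]*)(")}{
        my ($open, $value, $close) = ($1, $2, $3);
        $value =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/&amp;/g;
        "$open$value$close";
    }ge;
    return $html;
}

my $snippet = '<a href="http://x/?a=1&b=2&amp;c=3">A & B</a>';
print escape_href_ampersands($snippet), "\n";
```

This is a heuristic: a query parameter that happens to look like a complete entity (for example `&copy;`) is indistinguishable from an intentional escape and will be left untouched, and single-quoted or unquoted attributes are not handled at all.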

Re: Switching out characters inside links
by gmargo (Hermit) on Nov 26, 2009 at 16:59 UTC

    See also the escapeHTML() routine provided by the CGI module.