TrixieTang has asked for the wisdom of the Perl Monks concerning the following question:

How would one go about encoding all HTML entities in a string except for < and >? I've tried using HTML::Entities, but it ends up encoding < and > and breaking the HTML. I've also tried tinkering with the unsafe_characters parameter in HTML::Entities but I still can't seem to get it to allow < and >.

Re: Encoding all HTML entities except < and >
by haukex (Archbishop) on Oct 15, 2019 at 19:18 UTC

    That seems like a bit of a strange request to me. Could you explain some more what you need this for?

    Anyway, this works for me:

    use HTML::Entities;
    my $str = "'Hello\" & <World>";
    encode_entities($str, q{&"'});
    print $str, "\n";
    # prints &#39;Hello&quot; &amp; <World>

    But what do you mean by "all HTML entities"? Do you include e.g. non-ASCII characters in that definition? (And what encoding are you using for your HTML files?) Could you show some representative example input and the expected output for that?
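If "all HTML entities" is meant to include non-ASCII and control characters, one possible approach is to pass encode_entities a negated character class that lists the characters to leave alone. The following is only a sketch under that assumption: the sample string is made up, and the class is adapted from the negated-class example in the HTML::Entities documentation, widened so that `&`, `"` and `'` are still encoded while `<` and `>` pass through. It also assumes the input is a decoded character string, not raw UTF-8 bytes.

```perl
use strict;
use warnings;
use HTML::Entities;

# Made-up sample input: a character string containing one non-ASCII character (U+00E9)
my $str = "caf\x{e9} & <b>\"hi\"</b>";

# The second argument is placed inside a regex character class by HTML::Entities.
# The leading ^ negates it, so everything NOT listed here gets encoded.
# Left alone: newline, space, "!", \x23-\x25 (# $ %), and \x28-\x7e,
# a range that contains "<" and ">" but not the double quote, "&" or "'".
encode_entities($str, q{^\n\x20\x21\x23-\x25\x28-\x7e});

print $str, "\n";
# prints caf&eacute; &amp; <b>&quot;hi&quot;</b>
```

If the string still holds undecoded UTF-8 bytes, each byte of a multi-byte character would be encoded separately, which is one way to end up with garbled output.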

      Yeah, I'm not even sure what I was thinking. After thinking about this some more, the question now seems incredibly stupid and weird even to me. I was trying to get a module called HTML::WikiConverter to work, but I keep getting either "Wide character" errors or garbled text when trying to use it. I'm not sure what the hell I was even thinking trying to run the string through HTML::Entities first. Looking at the CPAN page for HTML::WikiConverter now, I realize that the module has a bunch of reports of UTF-8 issues, which are obviously causing the problems I've been having.
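For "Wide character" warnings and garbled text in general, the usual pattern is to decode bytes into characters on input and encode them back into bytes on output. The sketch below uses only the core Encode module. The sample bytes are made up, and it has not been checked against HTML::WikiConverter, so whether it helps with that module's reported UTF-8 issues is an assumption.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Made-up sample: raw UTF-8 bytes, as read from a file or socket
# that has no encoding layer ("café" with a two-byte e-acute)
my $bytes = "caf\xc3\xa9";

# Decode once at the input boundary to get a Perl character string
my $chars = decode('UTF-8', $bytes);

# 5 bytes before decoding, 4 characters after
printf "%d bytes, %d characters\n", length($bytes), length($chars);

# ... process $chars here (regexes, entity encoding, conversion) ...

# Encode once at the output boundary. Printing $chars directly to a handle
# without an encoding layer is what triggers "Wide character" warnings
# for characters above U+00FF.
my $out = encode('UTF-8', $chars);
print $out, "\n";
```

An alternative to the explicit encode call is to set an output layer once with binmode(STDOUT, ':encoding(UTF-8)') and print the character string directly.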