Have you considered HTML::Entities? Something similar to the following will work if I understand what you want:
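use strict;
use warnings;
use HTML::Entities;
my $html = "bad stuff here";
# The second argument lists the characters to encode (here only \x80-\xff); omit it to use the default set, which also covers <, >, & and quote characters.
$html = encode_entities($html, "\x80-\xff");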
use strict;
use warnings;
use HTML::FromText;
my $text = '< René & François >';
my $t2h = HTML::FromText->new( );
my $html = $t2h->parse( $text );
print $html;
Output
&lt; Ren&eacute; &amp; Fran&ccedil;ois &gt;
... and if, by chance, you're seeking to clean up user input from the web, please don't stop with character entities. Read up on untainting for a start, if for no other purpose than to get some flavor of the scope of security issues with unmoderated user input.
Cleaning up HTML entities, even ones as benign as &nbsp; or &sect; (though in some instances those could be abused), is often a good thing... but depending on the source of the "data" you're seeking "to sanitize," it may be inadequate protection on its own.
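For anyone who hasn't met untainting yet, here is a minimal sketch in core Perl. The helper name untaint_username and its allowed pattern are hypothetical; the point is that under taint mode (run as perl -T script.pl) the way to launder outside data is to capture it through a regex that describes exactly what you accept, and to reject everything else.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Run as: perl -T untaint.pl [values...]
# Under -T, command-line arguments, %ENV and other outside data
# are marked tainted.

# Hypothetical helper: accept only 1 to 32 ASCII letters, digits
# or underscores. A regex capture ($1) is untainted; anything that
# fails the pattern is rejected rather than "repaired".
sub untaint_username {
    my ($raw) = @_;
    return unless defined $raw;
    return $1 if $raw =~ /\A([A-Za-z0-9_]{1,32})\z/;
    return;
}

# Made-up sample inputs, used when no arguments are given.
my @inputs = @ARGV ? @ARGV : ('rene_42', 'bob; rm -rf /', '<script>');

# ${^TAINT} is true when taint checking is enabled.
printf "taint mode is %s\n", ${^TAINT} ? 'on' : 'off';

for my $input (@inputs) {
    my $clean = untaint_username($input);
    if (defined $clean) {
        printf "accepted %s (tainted: %s)\n", $clean, tainted($clean) ? 'yes' : 'no';
    }
    else {
        printf "rejected %s (tainted: %s)\n", $input, tainted($input) ? 'yes' : 'no';
    }
}
```

Rejecting outright keeps the rule simple: the pattern documents what is allowed, instead of a growing list of things to strip.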
As I interpret the PHP docs, the HTML::Entities function
encode_entities( $string_to_encode, $unsafe_chars ) does just about the same thing as PHP's htmlentities function (note: I don't really "know" PHP).
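To make the comparison concrete, here is a small sketch contrasting the one- and two-argument forms. The sample string is made up, and '<>&"' is just one reasonable choice for the second argument; restricting encoding that way is closer in spirit to PHP's htmlspecialchars than to htmlentities.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Made-up sample input.
my $input = q{<a href="x">Tom & Jerry's</a>};

# One-argument form: encodes the default unsafe set
# (control chars, high-bit chars, and < > & " ').
my $all = encode_entities($input);

# Two-argument form: the second argument is a character class
# naming the only characters to encode.
my $special = encode_entities($input, '<>&"');

print "$all\n";
print "$special\n";

# decode_entities() reverses the conversion.
print decode_entities($special), "\n";
```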
Someone else mentioned this, but I want to emphasize its importance. If you are handling untrustworthy user input from arbitrary people in the wild outside world, always turn on taint checking (the -T switch). This will not catch every potential security problem, but it will catch more than a few of the subtle ones you're likely to miss otherwise. Use it. It will save your bacon sometimes.
Sanity? Oh, yeah, I've got all kinds of sanity. In fact, I've developed whole new kinds of sanity. Why, I've got so much sanity it's driving me crazy.
This is a common problem for people with a PHP background attempting to code in Perl.
The philosophy in PHP was to pack as much functionality into the core as possible. Given that it was designed pretty much exclusively as a web application language, there is plenty of support for HTML- and HTTP-related functionality, such as the htmlentities() function. This worked pretty well for PHP, since it meant that once it was installed on a web server, you had most of the tools you needed to write basic web apps.
As Perl is a much more general-purpose language, it doesn't make sense to include such functionality in the core. But that doesn't mean it doesn't exist. There is a huge amount of this functionality (and much, much more) on CPAN.
However, many PHP programmers might search Perl's core docs, find that an equivalent function isn't there, and assume it must not exist. Once you get used to looking on CPAN for a solution, you start to realise the full potential of Perl.
And for yet another solution: if you're using CGI.pm to get the input from the user, it has an escapeHTML function. See also the autoEscape method, which controls whether CGI.pm's form-generating functions escape values automatically.
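A minimal sketch of the function-style interface, assuming CGI.pm is installed (it was removed from the Perl core in 5.22, so it may need to come from CPAN). The $comment value is a made-up stand-in for something you would normally read with param().

```perl
use strict;
use warnings;
use CGI qw(escapeHTML);

# Made-up value standing in for user input read via param().
my $comment = q{<b>Hi</b> & "bye"};

# escapeHTML converts &, <, > and " into entities.
my $safe = escapeHTML($comment);

print "<p>$safe</p>\n";
```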