HTML Entities RegEx

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need a regex that will replace ampersands that mark the beginning of an HTML entity with `&amp;`. The catch is that I only want to do so if the HTML entities fall within a TEXTAREA of a FORM. Here is my regex for the basic replacement:

s/&([a-zA-Z]{2,6};|#[0-9]{3};)/&amp;$1/gi;

How can I expand this so that the replacement only happens inside TEXTAREAs?
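For simple, predictable markup, one regex-only way to do this is to match each TEXTAREA element and run the entity substitution on its contents alone, using the `/e` modifier so the replacement is evaluated as Perl code. This is a sketch that assumes TEXTAREA tags are never nested, commented out, or written in unusual ways; the helper names `escape_entity_amps` and `escape_in_textareas` are hypothetical. For arbitrary real-world HTML, the parser-based approaches in the replies are more robust.

```perl
use strict;
use warnings;

# Hypothetical helper: the question's substitution, with the replacement
# written as "&amp;$1" so the leading "&" of each entity is escaped.
sub escape_entity_amps {
    my ($text) = @_;
    $text =~ s/&([a-zA-Z]{2,6};|#[0-9]{3};)/&amp;$1/g;
    return $text;
}

# Hypothetical helper: run escape_entity_amps() only on the contents of
# each <textarea ...>...</textarea> pair.
#   /i  tag names may be upper- or lower-case
#   /s  "." also matches newlines, so contents may span lines
#   /e  the replacement part is Perl code
#   /g  process every textarea in the document
sub escape_in_textareas {
    my ($html) = @_;
    $html =~ s{(<textarea\b[^>]*>)(.*?)(</textarea\s*>)}{
        my ($open, $body, $close) = ($1, $2, $3);   # copy captures first
        $open . escape_entity_amps($body) . $close;
    }gise;
    return $html;
}

my $html = qq{<p>&copy; 2004</p>\n}
         . qq{<FORM><TEXTAREA name="t">a &lt; b &#160; c</TEXTAREA></FORM>\n};
print escape_in_textareas($html);
```

The non-greedy `.*?` stops at the first closing tag, so each TEXTAREA is handled separately and the `&copy;` outside the form is left alone.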


Re: HTML Entities RegEx by dragonchild (Archbishop) on May 26, 2004 at 18:56 UTC
Use HTML::Parser to find the text that falls inside a textarea, then apply your regex to it. (Better still, use the various encoding functions that come with CGI and friends.)
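A minimal sketch of that approach with HTML::Parser's version 3 handler API: a flag tracks whether the parser is inside a TEXTAREA, text events are rewritten with the question's regex only while the flag is set, and every other event is copied through using its original source text. The function name `fix_textareas` is hypothetical, and the sketch assumes TEXTAREA elements are not nested.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: return $html with the question's substitution applied
# only to text inside <textarea> elements; everything else is copied as-is.
sub fix_textareas {
    my ($html) = @_;
    my $out         = '';
    my $in_textarea = 0;

    my $p = HTML::Parser->new(
        api_version   => 3,
        unbroken_text => 1,    # deliver each run of text as one event
        # tagname is reported in lower case; "text" is the original source.
        start_h => [ sub {
            my ($tag, $text) = @_;
            $in_textarea = 1 if $tag eq 'textarea';
            $out .= $text;
        }, 'tagname, text' ],
        end_h => [ sub {
            my ($tag, $text) = @_;
            $in_textarea = 0 if $tag eq 'textarea';
            $out .= $text;
        }, 'tagname, text' ],
        # Raw (undecoded) text: rewrite it only while inside a textarea.
        text_h => [ sub {
            my ($text) = @_;
            $text =~ s/&([a-zA-Z]{2,6};|#[0-9]{3};)/&amp;$1/g if $in_textarea;
            $out .= $text;
        }, 'text' ],
        # Comments, declarations, processing instructions: copy unchanged.
        default_h => [ sub { $out .= shift }, 'text' ],
    );
    $p->parse($html);
    $p->eof;
    return $out;
}

print fix_textareas(qq{<p>&amp;</p><TEXTAREA>x &gt; y</TEXTAREA>\n});
```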
Re: HTML Entities RegEx by Zaxo (Archbishop) on May 26, 2004 at 18:56 UTC
`use HTML::Parser; use HTML::Entities;` for the `encode_entities` function.
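`encode_entities` accepts an optional second argument listing the characters to treat as unsafe (in regex character-class syntax), so passing `'&'` escapes ampersands and nothing else. A small sketch on a sample string; note that, unlike the question's regex, this also escapes a bare `&` that does not start an entity, which is still valid inside a TEXTAREA.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

my $text = 'Tom &amp; Jerry &lt;3 &#169; & more';

# Second argument: the set of "unsafe" characters to encode (here only "&").
# Called in non-void context, it returns a new string and leaves $text alone.
my $escaped = encode_entities($text, '&');
print "$escaped\n";
```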
Re: HTML Entities RegEx by Roy Johnson (Monsignor) on May 26, 2004 at 18:57 UTC
HTML::Parser
Re: HTML Entities RegEx by jryan (Vicar) on May 26, 2004 at 19:46 UTC
Off the top of my head; untested:

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);
use HTML::TokeParser::Simple;

my $p = HTML::TokeParser::Simple->new('somefile');

while ( my $token = $p->get_token ) {
    print $token->as_is;
    next unless $token->is_start_tag('textarea');

    # Inside the textarea: escape every "&" in the raw source
    # until the closing tag (or the end of the document).
    while ( $token = $p->get_token ) {
        last if $token->is_end_tag('textarea');
        print encode_entities( $token->as_is, '&' );
    }
    print $token->as_is if $token;    # the </textarea> tag, if present
}
```