Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need a regex that replaces each ampersand that begins an HTML entity with &amp;. The catch is that I only want to do this when the entity falls inside a TEXTAREA of a FORM. Here is my regex for the basic replacement:

s/&([a-zA-Z]{2,6};|#[0-9]{3};)/&amp;$1/gi;

How can I expand this so that the replacement only happens inside TEXTAREAs?
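
For reference, one way to limit a plain regex to TEXTAREA contents is to match each TEXTAREA span first and run the substitution only on its body, using the /e modifier. This is a minimal sketch, not a robust solution: it assumes TEXTAREA tags are not nested, that no attribute value contains a ">", and that the markup is well-formed. The helper names escape_entity_amps and escape_in_textareas and the sample HTML are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the substitution from the question, applied to one string.
sub escape_entity_amps {
    my ($text) = @_;
    $text =~ s/&([a-zA-Z]{2,6};|#[0-9]{3};)/&amp;$1/g;
    return $text;
}

# Hypothetical helper: apply escape_entity_amps only between <textarea ...> and </textarea>.
sub escape_in_textareas {
    my ($html) = @_;
    # /s lets .*? span newlines, /i matches TEXTAREA in any case,
    # /e evaluates the replacement as Perl code.
    $html =~ s{(<textarea\b[^>]*>)(.*?)(</textarea\s*>)}{
        my ($open, $body, $close) = ($1, $2, $3);
        $open . escape_entity_amps($body) . $close;
    }gsie;
    return $html;
}

my $html = qq{<p>&copy; 2004</p>\n<form><textarea name="msg">Tom &amp; Jerry &#169;</textarea></form>};
my $out  = escape_in_textareas($html);
print "$out\n";
```

Here "&copy;" outside the form is left alone, while the "&amp;" and "&#169;" inside the TEXTAREA become "&amp;amp;" and "&amp;#169;". For arbitrary HTML, a real parser is safer, since a regex cannot reliably account for comments or attributes that contain ">".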

Re: HTML Entities RegEx
by dragonchild (Archbishop) on May 26, 2004 at 18:56 UTC
    Use HTML::Parser to find the text that falls inside each textarea, then apply your regex to just that text. (Better still, use one of the existing encoding functions that come with CGI and related modules instead of a hand-rolled regex.)
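
A minimal sketch of that approach, assuming HTML::Parser's version 3 handler API: the handlers copy every tag through unchanged and apply the regex only to text events seen while inside a TEXTAREA. The sample HTML and variable names are illustrative, and the sketch assumes TEXTAREAs are not nested.

```perl
use strict;
use warnings;
use HTML::Parser;

my $html = '<form><textarea name="t">Tom &amp; Jerry &lt;3</textarea></form><p>&copy; 2004</p>';
my $out  = '';
my $in_textarea = 0;

my $p = HTML::Parser->new(
    api_version   => 3,
    unbroken_text => 1,    # deliver each run of text as a single event
    start_h => [ sub {
        my ($tag, $text) = @_;
        $in_textarea = 1 if $tag eq 'textarea';
        $out .= $text;     # copy the original start tag through unchanged
    }, 'tagname, text' ],
    end_h => [ sub {
        my ($tag, $text) = @_;
        $in_textarea = 0 if $tag eq 'textarea';
        $out .= $text;
    }, 'tagname, text' ],
    text_h => [ sub {
        my ($text) = @_;   # raw source text, entities not decoded
        $text =~ s/&([a-zA-Z]{2,6};|#[0-9]{3};)/&amp;$1/g if $in_textarea;
        $out .= $text;
    }, 'text' ],
    default_h => [ sub { $out .= shift }, 'text' ],  # comments, declarations, etc.
);
$p->parse($html);
$p->eof;
print "$out\n";
```

Using the 'text' argspec (rather than 'dtext') keeps the entities exactly as written, so the regex sees the same characters as the source.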


Re: HTML Entities RegEx
by Zaxo (Archbishop) on May 26, 2004 at 18:56 UTC

    use HTML::Parser;
    use HTML::Entities;    # for the encode_entities function
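
A short sketch of encode_entities from HTML::Entities: with one argument it escapes its default set of unsafe characters, and an optional second argument is treated as a character class of the characters to escape. The sample string is hypothetical.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

my $text = 'Tom &amp; Jerry <3';

# One argument: escape the default unsafe set (control and high-bit
# characters plus < & > ' ").
my $all = encode_entities($text);

# Two arguments: the second is used as a character class,
# so '&' escapes only ampersands and leaves < alone.
my $amps = encode_entities($text, '&');

print "$all\n$amps\n";
```

Called in scalar or list context it returns an encoded copy; called in void context it modifies its argument in place. Escaping every ampersand, not only those that begin an entity, is usually what TEXTAREA content needs, since a bare & should also be written as &amp;.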


Re: HTML Entities RegEx
by Roy Johnson (Monsignor) on May 26, 2004 at 18:57 UTC
Re: HTML Entities RegEx
by jryan (Vicar) on May 26, 2004 at 19:46 UTC

    Off of the top of my head; untested:

    use strict;
    use warnings;
    use HTML::Entities qw(encode_entities);
    use HTML::TokeParser::Simple;

    my $p = HTML::TokeParser::Simple->new('somefile');
    while ( my $token = $p->get_token ) {
        print $token->as_is;
        next unless $token->is_start_tag('textarea');
        # Inside a TEXTAREA: escape ampersands until the closing tag.
        while ( $token = $p->get_token ) {
            last if $token->is_end_tag('textarea');
            print encode_entities( $token->as_is, '&' );
        }
        print $token->as_is if $token;    # the closing </textarea>
    }
