Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How can I find these tags
<small>text any text</small>
in the source of an HTML page and replace everything inside them with nothing?

•Re: Regular Expression
by merlyn (Sage) on Dec 31, 2002 at 17:13 UTC
    use HTML::TokeParser::Simple;

    my $p = HTML::TokeParser::Simple->new(\*DATA) or die;
    my $in_small = 0;
    while (my $token = $p->get_token) {
        if ($token->is_start_tag('small')) {
            $in_small++;
        } elsif ($token->is_end_tag('small')) {
            $in_small--;
        } elsif (not $in_small) {
            print $token->as_is;
        }
    }
    __END__
    <html><head><title>hello world</title></head><body>
    <h1>hello world</h1>
    Did you know that <small>this is small</small>?
    </body></html>

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

Re: Regular Expression
by Wonko the sane (Curate) on Dec 31, 2002 at 17:12 UTC
    $html_text =~ s@<small>.*?</small>@<small></small>@g;

    Wonko
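
A note on the substitution above: as written, `.` does not match a newline, and the pattern only matches a lowercase `<small>` with no attributes. The following is a minimal sketch of a more tolerant variant, not a definitive fix. The sample input is hypothetical, and the choice to keep the original opening and closing tags via `$1` and `$2` is an assumption about what the asker wants. Nested `<small>` elements or tags inside comments would still trip it up, which is where the parser approach is more robust.

```perl
use strict;
use warnings;

# Hypothetical sample input: one <small> element spans two lines and uses
# upper case plus an attribute; the other is a plain lowercase element.
my $html_text = <<'HTML';
<p>Did you know that <SMALL class="note">this is
small</SMALL>? And <small>this too</small>.</p>
HTML

# /s lets . match newlines, /i ignores tag case, and [^>]* allows attributes.
# \b stops the pattern from matching other tag names that start with "small".
# The captured opening and closing tags are kept; only the content is dropped.
$html_text =~ s@(<small\b[^>]*>).*?(</small\s*>)@$1$2@gis;

print $html_text;
```

With this input, both elements are emptied while the surrounding text and the tags themselves are left in place.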