vroom has asked for the wisdom of the Perl Monks concerning the following question:

Given a string that contains an HTML document, I want to do substitutions only on the text that is outside of HTML tags. Any suggestions on how best to do this, or any modules to look at?

Re: Substitution outside of HTML TAGS
by takshaka (Friar) on May 18, 2000 at 21:00 UTC
    The docs do mention that HTML::Filter is deprecated in favor of HTML::Parser.

    The new v3 API is nifty.

    #!/usr/bin/perl -w
    use strict;
    use HTML::Parser;

    # Slurp the sample document from the DATA section.
    my $html;
    {
        local $/;
        $html = <DATA>;
    }

    # Collect every parser event as an [event, text] pair.
    my @parsed;
    my $p = HTML::Parser->new(
        api_version => 3,
        handlers    => { default => [\@parsed, "event,text"] },
    );
    $p->parse($html);
    $p->eof;    # signal end of input so any pending text is flushed

    # Substitute only in text events; print everything else unchanged.
    for (@parsed) {
        $_->[1] =~ s/poor/Perl/g if $_->[0] eq 'text';
        print $_->[1];
    }

    __DATA__
    <HTML><HEAD><TITLE>poor hacker's almanac</TITLE></HEAD>
    <BODY BACKGROUND="poor.jpg">
    <!-- poor comment -->
    <I>Just</I> Another <B>poor</B> Hacker.
    <poor tag>
    </BODY>
    </HTML>
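The same idea can also be written with callbacks instead of collecting events in an array. What follows is only a sketch: `substitute_outside_tags` is a made-up helper name, not something HTML::Parser provides, and it assumes the whole document is already in a string. Note that HTML::Parser also reports the contents of `<script>` and `<style>` elements as text events, so those would be edited too unless you filter them out.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: run $edit on every text event and return the
# rewritten document. Tags, comments and declarations pass through as-is.
sub substitute_outside_tags {
    my ($html, $edit) = @_;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Text outside tags: hand the raw (undecoded) text to the caller.
        text_h    => [ sub { $out .= $edit->(shift) }, 'text' ],
        # Everything without its own handler is copied verbatim.
        default_h => [ sub { $out .= shift }, 'text' ],
    );
    $p->unbroken_text(1);   # deliver each run of text in one piece
    $p->parse($html);
    $p->eof;                # flush anything still buffered

    return $out;
}
```

A call might look like `substitute_outside_tags($html, sub { my $t = shift; $t =~ s/poor/Perl/g; $t })`, which leaves `BACKGROUND="poor.jpg"` and the comment alone.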
RE: Substitution outside of HTML TAGS
by merlyn (Sage) on May 18, 2000 at 20:18 UTC
Re: Substitution outside of HTML TAGS
by zaphod.nu (Scribe) on May 18, 2000 at 22:43 UTC
    How about a regexp?
    I usually do this with something like this:
    $line =~ s/<<VARIABLE>>/$variable/;

    with $line being a line from the HTML file I'm currently parsing...

    In your case it should be something like this:
    if ($line !~ /<.+>/) { $line =~ s/<<VARIABLE>>/$variable/; }
    .sig
    I'm not defect!
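A line-by-line test like the one above skips any line that contains a tag, so text sharing a line with markup is never touched. If a regexp is still preferred over a parser, one rough alternative is to split the string on tags and edit only the pieces in between. This is a sketch using core Perl only; `naive_substitute` is a hypothetical name, and the pattern assumes no `>` ever appears inside an attribute value, comment, or script, which real-world HTML does not guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper: naive tag splitter, not a real HTML parser.
sub naive_substitute {
    my ($html, $edit) = @_;

    # The capturing group makes split return the tags too, so the
    # pieces alternate: text, tag, text, tag, ...
    my @pieces = split /(<[^>]*>)/, $html;

    for my $i (0 .. $#pieces) {
        next if $i % 2;                     # odd indexes hold the tags
        $pieces[$i] = $edit->($pieces[$i]); # even indexes hold the text
    }
    return join '', @pieces;
}
```

For anything beyond simple, well-behaved markup, HTML::Parser is the safer route.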
Re: Substitution outside of HTML TAGS
by BigJoe (Curate) on May 18, 2000 at 20:14 UTC
    Do you mean between:
    <HTML> inside </HTML> outside or outside <inside>
    ?
      Some clarification:
      outside<INSIDE>outside</INSIDE>outside