Re: Help using HTML::Parser

You're going to have to be a bit clearer in your specs. Suppose we find the letters "fo" at the top of the document and, about 200K later, the letter "o". Do you want the substitution then? It's tough to tell. What if you run into a word that contains the target letters, such as "fools"? Just wrapping "foo" in tags within each text token is easy (untested code follows). It uses HTML::TokeParser::Simple instead of HTML::Parser.

use HTML::TokeParser::Simple;
my $p = HTML::TokeParser::Simple->new( \$original_html );

my $new_html = '';
while ( my $token = $p->get_token ) {
    unless ($token->is_text) {
        $new_html .= $token->return_text;
    }
    else {
        my $text = $token->return_text;
        # Braces as delimiters: with s/.../.../ the slash in </bar> would end the replacement early.
        $text =~ s{foo}{<bar>foo</bar>}g;
        $new_html .= $text;
    }
}

Cheers,
Ovid



Replies are listed 'Best First'.
Re: Re: Help using HTML::Parser by Anonymous Monk on Nov 06, 2002 at 14:49 UTC
Ovid,

Thanks for your response. To answer your questions:

1) I would want the substitution if I had "fo<some 200K length Tag>o", but NOT if I had fo<some tag>q<some tag>o.
2) I would also want "fools" to become "<bar>foo</bar>"

The only reason the code you supplied would not work for me is that it would not do the substitution on "fo<some 200K length Tag>o". I would prefer to be able to do:

while (my $token = $p->get_token ) {
  .....
  if ($token->is_text) {
        $text .= $token->return_text;
  }
}
$text =~ s{foo}{<bar>foo</bar>}g;

That's where I get stuck, because I see no way to "merge" my "new" document, which contains only the <bar> tags AND the original text, with the original document, which contained the original text and all the other tags. Hope that clarifies.
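
One possible way around the "merge" problem described above, offered as a rough sketch rather than a tested solution: do the search on the concatenated text, but only use it to build a per-character mask, then walk the original pieces again and wrap whatever the mask marks. Nothing ever has to be merged back, because the tags are never removed from the original sequence. Assumptions to note: the sub name wrap_across_tags is made up for illustration, and a naive regex split on <...> stands in for a real parser here (it will misfire on comments, scripts, or a ">" inside an attribute value); the same idea should carry over to tokens from HTML::TokeParser::Simple. Each text fragment of a split match gets its own <bar> pair so the result stays properly nested.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every occurrence of $word in <bar>...</bar>,
# even when tags interrupt the word.
sub wrap_across_tags {
    my ( $html, $word ) = @_;

    # Naive tokenizer (assumption): because of the capture group, split
    # returns text at even indices and tags at odd indices.
    my @parts = split /(<[^>]*>)/, $html;

    # The document's text with all tags dropped.
    my $text = join '', @parts[ grep { $_ % 2 == 0 } 0 .. $#parts ];

    # One mask character per text character: '0' means untouched, 'a' or
    # 'b' means part of a match. Alternating the two letters keeps
    # back-to-back matches apart.
    my $mask  = '0' x length $text;
    my @marks = ( 'a', 'b' );
    my $n     = 0;
    while ( $text =~ /\Q$word\E/g ) {
        my $len = $+[0] - $-[0];
        substr( $mask, $-[0], $len, $marks[ $n++ % 2 ] x $len );
    }

    # Rebuild: tags are copied through, text is wrapped run by run
    # according to the slice of the mask that belongs to it.
    my ( $out, $pos ) = ( '', 0 );
    for my $i ( 0 .. $#parts ) {
        if ( $i % 2 ) { $out .= $parts[$i]; next; }
        my $len   = length $parts[$i];
        my $piece = substr $mask, $pos, $len;
        while ( $piece =~ /\G(0+|a+|b+)/g ) {
            my ( $start, $run ) = ( $-[0], $1 );
            my $chunk = substr $parts[$i], $start, length $run;
            $out .= substr( $run, 0, 1 ) eq '0' ? $chunk : "<bar>$chunk</bar>";
        }
        $pos += $len;
    }
    return $out;
}

print wrap_across_tags( 'fo<b>o</b>ls and fo<b>q</b>o', 'foo' ), "\n";
# prints: <bar>fo</bar><b><bar>o</bar></b>ls and fo<b>q</b>o
```

The "fo<tag>q<tag>o" case is left alone because the concatenated text there reads "foqo", which does not contain "foo". Whether a split match should produce several small <bar> elements, as here, or one element that swallows the intervening tags is a design choice the original question leaves open.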
