in reply to Help using HTML::Parser

You're going to have to be a bit clearer in your specs. Say we find the letters "fo" at the top of the document and, about 200K later, the letter "o". Do you want the substitution then? It's tough to tell. And what if you run into a word that merely contains the target letters, such as "fools"? Simply wrapping "foo" in tags is easy (untested code follows). It uses HTML::TokeParser::Simple instead of HTML::Parser.

use HTML::TokeParser::Simple;

my $p        = HTML::TokeParser::Simple->new( \$original_html );
my $new_html = '';

while ( my $token = $p->get_token ) {
    if ( $token->is_text ) {
        my $text = $token->return_text;
        # s{}{} delimiters: the "/" in </bar> would end an s/// early.
        $text =~ s{foo}{<bar>foo</bar>}g;
        $new_html .= $text;
    }
    else {
        # Tags, comments and everything else pass through untouched.
        $new_html .= $token->return_text;
    }
}

Cheers,
Ovid


Re: Re: Help using HTML::Parser
by Anonymous Monk on Nov 06, 2002 at 14:49 UTC
    Ovid, thanks for your response. To answer your questions:
    1) I would want the substitution if I had "fo<some 200K length Tag>o", but NOT if I had "fo<some tag>q<some tag>o".
    2) I would also want "fools" to become "<bar>foo</bar>ls".
    The only reason the code you supplied would not work for me is that it would not do the substitution on "fo<some 200K length Tag>o". I would prefer to be able to do something like:
    while ( my $token = $p->get_token ) {
        # .....
        if ( $token->is_text ) {
            $text .= $token->return_text;
        }
    }
    # s{}{} delimiters, since the "/" in </bar> would end an s/// early.
    $text =~ s{foo}{<bar>foo</bar>}g;
    That's where I get stuck because I see no way to "merge" my "new" document that contains only <bar> tags AND the original text with the original document, which contained the original text and all the other tags. Hope that clarifies.
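One possible way to do the "merge" described above, offered as a rough sketch rather than a tested solution: join only the text pieces, remember where each piece starts in the joined string, find the matches there, and then rebuild the document token by token, wrapping whatever part of each text piece falls inside a match. The names `tokenize` and `wrap_across_tags` are made up for this example, and `tokenize` is a naive regex stand-in (an assumption, not real HTML parsing) so the sketch runs with core Perl only. With HTML::TokeParser::Simple, the same [is_text, string] pairs could be collected from `$token->is_text` and `$token->return_text` inside the `get_token` loop. Note the assumption that a match split by a tag gets one `<bar>` pair per fragment, so the added tags never straddle existing markup.

```perl
use strict;
use warnings;

# Naive stand-in tokenizer (assumption): anything shaped like <...> is a tag.
# Returns a list of [ is_text, string ] pairs.
sub tokenize {
    my ($html) = @_;
    return map { [ ( /\A</ ? 0 : 1 ), $_ ] }
           grep { length $_ } split /(<[^>]*>)/, $html;
}

sub wrap_across_tags {
    my ( $target, @tokens ) = @_;

    # Pass 1: concatenate the text tokens, recording each one's start offset.
    my $text = '';
    my @start;
    for my $i ( 0 .. $#tokens ) {
        next unless $tokens[$i][0];
        $start[$i] = length $text;
        $text .= $tokens[$i][1];
    }

    # Pass 2: collect [ from, to ) offsets of every match in the joined text.
    my @spans;
    while ( $text =~ /\Q$target\E/g ) {
        push @spans, [ $-[0], $+[0] ];
    }

    # Pass 3: rebuild the document; tags are copied, text is wrapped
    # wherever it overlaps a match.
    my $out = '';
    for my $i ( 0 .. $#tokens ) {
        my ( $is_text, $str ) = @{ $tokens[$i] };
        unless ($is_text) { $out .= $str; next; }

        my ( $tok_start, $tok_end ) = ( $start[$i], $start[$i] + length $str );
        my $pos = $tok_start;
        for my $span (@spans) {
            my $from = $span->[0] > $tok_start ? $span->[0] : $tok_start;
            my $to   = $span->[1] < $tok_end   ? $span->[1] : $tok_end;
            next if $from >= $to;    # this match does not touch this token
            $out .= substr( $text, $pos, $from - $pos );
            $out .= '<bar>' . substr( $text, $from, $to - $from ) . '</bar>';
            $pos = $to;
        }
        $out .= substr( $text, $pos, $tok_end - $pos );
    }
    return $out;
}

my $html = 'fo<i>o</i>ls and fo<b>q</b>o';
print wrap_across_tags( 'foo', tokenize($html) ), "\n";
# prints: <bar>fo</bar><i><bar>o</bar></i>ls and fo<b>q</b>o
```

Here "fo<i>o" is matched because the joined text reads "foo", while "fo<b>q</b>o" is left alone because its joined text is "foqo". The inner loop rescans every match for every text token, which is fine for a sketch but would want a running index on large documents.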