looks like I'm on the right path, they don't have a way to manipulate plain text in an HTML file -- while still preserving the HTML structure...
I don't know what you've been doing, but you most certainly can. There is an example at (crazyinsomniac) Re: Is this the best way to use HTML::TreeBuilder to bold text in an HTML document?

Also, a regex is not completely out of the question. Something like this:

use strict;
use warnings;

my $name = 'PodMaster';
my $url  = 'http://perlmonks.org/?node=PodMaster';
my $html = q~
<html>
<title> PodMaster </title>
<style> PodMaster { } </style>
<body>
<h1>PodMaster </h1>
Hi there PodMaster blah blah blah
<b>Pod</b><i>Master</i>
</body>
</html>
~;

print $/, untag_MOD( $html, $name, $url ), $/;

#http://perlmonks.org/?node_id=161281 modified for our purposes
sub untag_MOD {
    local $_ = $_[0] || $_;
# ALGORITHM:
#   find < ,
#       comment <!-- ... -->,
#       or comment <? ... ?> ,
#       or one of the start tags which require correspond
#           end tag plus all to end tag
#       or if \s or ="
#           then skip to next "
#           else [^>]
#   >
# 1 is the entire "tag", add +1 to all numbers in comments
    s{
      (                 # podmaster
        <               # open tag
        (?:             # open group (A)
          (!--) |       # comment (1) or
          (\?) |        # another comment (2) or
          (?i:          # open group (B) for /i
            ( TITLE  |  # one of start tags
              SCRIPT |  # for which
              APPLET |  # must be skipped
              OBJECT |  # all content
              STYLE     # to correspond
            )           # end tag (3)
          ) |           # close group (B), or
          ([!/A-Za-z])  # one of these chars, remember in (4)
        )               # close group (A)
        (?(5)           # if previous case is (4)
          (?:           # open group (C)
            (?!         # and next is not : (D)
              [\s=]     # \s or "="
              ["`']     # with open quotes
            )           # close (D)
            [^>] |      # and not close tag or
            [\s=]       # \s or "=" with
            `[^`]*` |   # something in quotes ` or
            [\s=]       # \s or "=" with
            '[^']*' |   # something in quotes ' or
            [\s=]       # \s or "=" with
            "[^"]*"     # something in quotes "
          )*            # repeat (C) 0 or more times
        |               # else (if previous case is not (4))
          .*?           # minimum of any chars
        )               # end if previous char is (4)
        (?(2)           # if comment (1)
          (?<=--)       # wait for "--"
        )               # end if comment (1)
        (?(3)           # if another comment (2)
          (?<=\?)       # wait for "?"
        )               # end if another comment (2)
        (?(4)           # if one of tags-containers (3)
          </            # wait for end
          (?i:\4)       # of this tag
          (?:\s[^>]*)?  # skip junk to ">"
        )               # end if (3)
        >               # tag closed
      )
      ([^<]*)           # 6, text
    }
    '
        my $ret = $1;
        if( $6 ){
            my $text = $6;
            $text =~ s~\b(\Q$_[1]\E)\b~<a href="$_[2]">$1</a>~g; # add + link
            $ret .= $text;
        }
        $ret;
    'gsxe;
    return $_ ? $_ : "";
}
__END__
Note the caveats in strip HTML tags. Another potential caveat (I wouldn't consider it one) is that neither of these turns <b>Pod</b><i>Master</i> into a link. If you want that, you should use HTML::TreeBuilder.
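
For what it's worth, that last caveat is easy to see with a much cruder stand-in for the regex above. This is only a sketch under stated assumptions: link_text_only is a made-up helper name, and splitting on <[^>]*> ignores comments, quoted > characters and script/style/title bodies, all of which untag_MOD takes care to handle. It only shows the idea of substituting in the text between tags and nowhere else.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post above): link $name only in the
# text between tags. The split pattern is deliberately naive.
sub link_text_only {
    my ( $html, $name, $url ) = @_;

    # The capturing group makes split return the tags too, so the list
    # alternates: text, tag, text, tag, ...
    my @parts = split /(<[^>]*>)/, $html;

    for my $i ( 0 .. $#parts ) {
        next if $i % 2;    # odd indexes are the captured tags, leave them alone
        $parts[$i] =~ s{\b(\Q$name\E)\b}{<a href="$url">$1</a>}g;
    }
    return join '', @parts;
}

my $url = 'http://perlmonks.org/?node=PodMaster';

# The name sits in one text segment, so it gets wrapped in a link.
print link_text_only( '<h1>PodMaster </h1>', 'PodMaster', $url ), "\n";

# The name is split across two elements, so no single text segment
# contains it and the markup is printed unchanged.
print link_text_only( '<b>Pod</b><i>Master</i>', 'PodMaster', $url ), "\n";
```

Each text segment is searched on its own, which is why a name broken up by markup never matches. Handling that case means looking at the document as a tree rather than as a stream of tags and text.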



In reply to Re: Search and replacing across 500,000 HTML documents by PodMaster
in thread Search and replacing across 500,000 HTML documents by Anonymous Monk
