I'd be tempted to solve this using a browser (createTextRange, findText, pasteHTML), but for Perl,
HTML::TreeBuilder (a DOM approach) would be a good choice.
The basic approach is to build a tree out of the HTML, then scan it for text nodes and try to match against their text ... basically you'd implement
TextRanges in Perl (without all the rendering-related stuff, of course).
update: I should note that HTML::Tree doesn't preserve the formatting of its input exactly, but that's not necessarily a bad thing.
Getting started is as simple as:
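use strict;
use warnings;
use HTML::TreeBuilder;

# Parse the fragment; TreeBuilder supplies the implicit html/head/body elements.
my $body = HTML::TreeBuilder->new_from_content(
    'h<b>e</b>l<i>lo</i>!!!'
)->find_by_tag_name('body');

# as_text concatenates the text of all descendants, so the match
# succeeds even though the markup splits "hello!!!" across elements.
if ( $body->as_text =~ /hello!!!/ ) {
    # content_list returns plain strings for text nodes and
    # HTML::Element objects for child elements.
    print $_, $/ for $body->content_list;
}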
__END__
h
HTML::Element=HASH(0x1a540e0)
l
HTML::Element=HASH(0x1a54140)
!!!
Hopefully that'll help you see the forest :)
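To go one step further towards the "TextRanges in Perl" idea, here is a rough sketch of mapping a match back onto the tree. It records every text node together with its offset in the concatenated text, runs the regex once against that string, and then keeps the nodes that overlap the match. The helper names collect_text and find_in_text are made up for this post and are not part of HTML::Tree; only content_list, tag, find_by_tag_name, new_from_content and delete come from the module. Unlike as_text, this sketch does not skip script or style content, so treat it as a starting point rather than a finished solution.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: walk the tree depth-first and record every text
# node with the half-open offset range [start, end) it occupies in the
# concatenated text.
sub collect_text {
    my ( $el, $segs ) = @_;
    for my $child ( $el->content_list ) {
        if ( ref $child ) {    # an HTML::Element: recurse into it
            collect_text( $child, $segs );
        }
        else {                 # a plain string: a text node
            my $start = @$segs ? $segs->[-1]{end} : 0;
            push @$segs, {
                parent => $el,
                text   => $child,
                start  => $start,
                end    => $start + length $child,
            };
        }
    }
    return $segs;
}

# Hypothetical helper: return the text segments touched by the first
# match of $re, or an empty list if there is no match.
sub find_in_text {
    my ( $root, $re ) = @_;
    my $segs = collect_text( $root, [] );
    my $all  = join '', map { $_->{text} } @$segs;
    return unless $all =~ $re;
    my ( $from, $to ) = ( $-[0], $+[0] );    # match start and end offsets
    return grep { $_->{end} > $from && $_->{start} < $to } @$segs;
}

my $tree = HTML::TreeBuilder->new_from_content('h<b>e</b>l<i>lo</i>!!!');
my $body = $tree->find_by_tag_name('body');

for my $seg ( find_in_text( $body, qr/ello/ ) ) {
    printf "'%s' inside <%s>, offset %d\n",
        $seg->{text}, $seg->{parent}->tag, $seg->{start};
}

$tree->delete;    # free the tree explicitly
```

From there, each segment's parent and offsets tell you where to splice in new elements, which is roughly what pasteHTML would do in the browser.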