Re: Munging Rendered HTML While Preserving Formatting

I'd be tempted to solve this using a browser (createTextRange, findText, pasteHTML), but for perl, HTML::TreeBuilder (a DOM approach) would be a good choice. The basic approach is that you create a tree out of the HTML, then scan it for text which you try to match ... basically you'd implement TextRanges in perl (without all the rendering-related stuff, of course).

update: I should note that HTML::Tree doesn't preserve the formatting of its input exactly, but that's not necessarily a bad thing. Getting started is as simple as:

use strict;
use warnings;
use HTML::TreeBuilder;

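# Parse the fragment; TreeBuilder supplies the implicit <html>/<body> elements.
my $body = HTML::TreeBuilder->new_from_content(
    'h<b>e</b>l<i>lo</i>!!!'
)->find_by_tag_name('body');

# as_text flattens the markup, so the match succeeds across the <b> and <i> tags.
if ( $body->as_text =~ /hello!!!/ ) {
    # Each child is either a plain string (text) or an HTML::Element object.
    print $_, $/ for $body->content_list;
}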

__END__
h
HTML::Element=HASH(0x1a540e0)
l
HTML::Element=HASH(0x1a54140)
!!!

Hopefully that'll help you see the forest :)
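To go one step further than the snippet above, here is a rough sketch of the "TextRange" idea: record where each text node starts in the flattened text, match against the flattened text, then map the match offsets back to the nodes they touch. The helper names text_segments and find_range are made up for this illustration (they are not part of HTML::Tree), and only the first match is handled.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: collect every text node under $elem as
# [ start_offset, text, parent_element ], where start_offset is the
# node's position in the concatenated text. Returns the next offset.
sub text_segments {
    my ($elem, $segments, $offset) = @_;
    for my $child ($elem->content_list) {
        if (ref $child) {
            # An HTML::Element: descend into it.
            $offset = text_segments($child, $segments, $offset);
        }
        else {
            # A plain string: remember where it starts and who owns it.
            push @$segments, [ $offset, $child, $elem ];
            $offset += length $child;
        }
    }
    return $offset;
}

# Hypothetical helper: return the segments that overlap the first
# match of $re in the flattened text (empty list if there is no match).
sub find_range {
    my ($root, $re) = @_;
    my @segments;
    text_segments($root, \@segments, 0);
    my $flat = join '', map { $_->[1] } @segments;
    return unless $flat =~ $re;
    my ($from, $to) = ($-[0], $+[0]);    # match start / end offsets
    return grep {
        $_->[0] < $to && $_->[0] + length($_->[1]) > $from
    } @segments;
}

my $tree = HTML::TreeBuilder->new_from_content('h<b>e</b>l<i>lo</i>!!!');
my $body = $tree->find_by_tag_name('body');

for my $seg ( find_range($body, qr/ello/) ) {
    my ($start, $text, $parent) = @$seg;
    print "$start\t$text\t<", $parent->tag, ">\n";
}

$tree->delete;    # free the tree explicitly (needed on older HTML::Tree)
```

For this input it should report the three text nodes "e", "l" and "lo" along with their parent tags. Once you know which nodes a match touches, you can decide how to rewrite each one (wrap it, split it, replace it) without disturbing the surrounding markup.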

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.
