in reply to Advanced regular expression help
Here's my go with a parser.

If you have more exacting requirements, or if (when) the spec changes, this approach is, IMO, easier to adapt than a regex.

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use HTML::TokeParser::Simple;

    my $text1 = qq{ <div id="aaaa"> text tex text <ul id="ccc">bla bla bla</ul> more text </div> };
    my $text2 = qq{ <div id="aaaa"> text text text more text </div> };

    my $txt;
    $txt = retrieve($text1);
    print $txt;
    print q{-} x 20;
    $txt = retrieve($text2);
    print $txt;

    # Return the concatenated text content of an HTML fragment, tags dropped.
    sub retrieve {
        my $html = shift;
        my $p = HTML::TokeParser::Simple->new(\$html)
            or die qq{can't parse text: $!\n};
        my $txt;
        while (my $t = $p->get_token) {
            # keep only text tokens; start/end tags are skipped
            $txt .= $t->as_is if $t->is_text;
        }
        return $txt;
    }
Update: added output:
text tex text bla bla bla more text -------------------- text text text more text
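To give an idea of what "easier to adapt" could look like, here is a rough sketch of one possible spec change: suppose the text inside the `<ul>` should now be left out. The sub name `retrieve_without_lists` and the `$depth` counter are made up for illustration; the parser calls (`get_token`, `is_start_tag`, `is_end_tag`, `is_text`, `as_is`) are the ordinary HTML::TokeParser::Simple token methods. Treat it as a starting point rather than a finished solution.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical variant of retrieve(): collect text as before,
# but ignore anything that sits inside <ul>...</ul>.
sub retrieve_without_lists {
    my $html = shift;
    my $p = HTML::TokeParser::Simple->new(\$html)
        or die qq{can't parse text: $!\n};
    my $txt   = q{};
    my $depth = 0;    # how many <ul> elements are currently open
    while (my $t = $p->get_token) {
        if    ($t->is_start_tag('ul'))  { $depth++ }
        elsif ($t->is_end_tag('ul'))    { $depth-- if $depth > 0 }
        elsif ($t->is_text && !$depth)  { $txt .= $t->as_is }
    }
    return $txt;
}
```

Calling `retrieve_without_lists($text1)` in place of `retrieve($text1)` should then drop the "bla bla bla" part and keep the rest. The change is a counter and two extra branches, whereas the equivalent regex would have to cope with nested or unclosed tags itself.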