in reply to Remove html tags to obtain plain text
I hope you realise that your HTML is pretty seriously broken. If that's carelessness on your part, it's not a good sign. If it's typical of what you are likely to get in real life, I understand.
I found it easier to write my own parser than to work through the HTML::Parser docs, and I don't remember them saying how the module behaves with broken HTML like yours. My parser, XML::Lenient, is specifically intended to cope with such markup. Two ways of extracting your text are shown in the code below:
    use Modern::Perl;
    use XML::Lenient;

    my $p = XML::Lenient->new();
    my $string = '<style>table{border-collapse: collapse;margin-left: 1cm;font-Family: courier;width: 60%}.hoverTable tr{background: #D8D8D8;}.hoverTable tr:hover{background-color: #ffff99; }</style><table border=2 class="hoverTable">[20160628_151916] <tr><td bgcolor="#366092"><font color="White"> PLAIN TEXT TO BE EXTRACTED</td>';

    # First way: take the content of the td element, then strip the remaining tags
    say $p->innertext($p->within($string, 'td'));
    # Second way: address the font element inside the td directly
    say $p->wpath($string, 'td/font');
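For comparison, here is a minimal sketch of the HTML::Parser route, in case that module is a hard requirement. It uses HTML::Parser purely as a tokenizer and keeps only the text seen between a `<td>` and its `</td>`, so the missing closing tags should not matter. The helper name `td_text` is my own invention for illustration, and I am assuming the text you want always sits inside a `td`.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper (not part of HTML::Parser): return the text found
# between <td> and </td>, ignoring everything outside table cells.
sub td_text {
    my ($html) = @_;
    my $depth = 0;     # greater than 0 while we are inside a <td>
    my $text  = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'tagname' passes the lower-cased element name to the handler
        start_h => [ sub { $depth++ if $_[0] eq 'td' }, 'tagname' ],
        end_h   => [ sub { $depth-- if $_[0] eq 'td' && $depth > 0 }, 'tagname' ],
        # 'dtext' passes the text with HTML entities decoded
        text_h  => [ sub { $text .= $_[0] if $depth > 0 }, 'dtext' ],
    );
    $parser->parse($html);
    $parser->eof;

    $text =~ s/^\s+|\s+$//g;   # trim leading and trailing whitespace
    return $text;
}

my $sample = '<table border=2>[20160628_151916] <tr><td bgcolor="#366092">'
           . '<font color="White"> PLAIN TEXT TO BE EXTRACTED</td>';
print td_text($sample), "\n";
```

Because the handlers only count `td` tags, the stylesheet and the timestamp in front of the row are skipped without any special handling. If a cell is never closed, everything after its opening tag is collected, which may or may not be what you want.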
As you don't tell us why you want to use HTML::Parser, I have no idea whether my module would be better for you, though. And remember that I'm biased, like any parent.
Regards,
John Davies