Re: Remove html tags to obtain plain text

I hope you realise that your HTML is pretty seriously broken. If that's carelessness on your part, it's not a good sign. If it's typical of what you are likely to get in real life, I understand.

I found it easier to write my own parser than to work through the HTML::Parser docs, and I don't remember them defining how it behaves with broken HTML like yours. My parser, XML::Lenient, is specifically intended to cope with such input. Two ways of extracting your text are shown in the code below:

use Modern::Perl;
use XML::Lenient;

my $p = XML::Lenient->new();
my $string = '<style>table{border-collapse: collapse;margin-left: 1cm;'
    . 'font-Family: courier;width: 60%}.hoverTable tr{background: #D8D8D8;} '
    . '.hoverTable tr:hover{background-color: #ffff99; }</style>'
    . '<table border=2 class="hoverTable">[20160628_151916] <tr>'
    . '<td bgcolor="#366092"><font color="White"> PLAIN TEXT TO BE EXTRACTED</td>';
say $p->innertext($p->within($string, 'td'));
say $p->wpath($string, 'td/font');

As you don't tell us why you want to use HTML::Parser, I have no idea whether my module would be better for you, though. And remember that I'm biased, like any parent.

Regards,

John Davies

Re^2: Remove html tags to obtain plain text by Mj1234 (Sexton) on Jun 30, 2016 at 05:19 UTC
This is just part of the complete HTML text. I am trying to use HTML::Parser as I am unable to use HTML::Strip.
Re^3: Remove html tags to obtain plain text by Anonymous Monk on Jun 30, 2016 at 06:41 UTC
"I am trying to use HTML::Parser as I am unable to use HTML::Strip."
Then use one of the other "html strip" modules.
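
For comparison, here is a minimal sketch of the HTML::Parser route mentioned in this thread. It is not taken from any of the posts above: the helper name td_text and the shortened sample string are hypothetical, and it assumes HTML::Parser (version 3 API) is installed. It collects only the text seen while a <td> is open and skips <style> and <script> content. Since HTML::Parser is a tokenizer rather than a tree builder, the unclosed <font> tag should not get in the way.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the text found inside <td> cells,
# ignoring the contents of <style> and <script> elements.
sub td_text {
    my ($html) = @_;
    my $in_td = 0;     # nesting depth of currently open <td> elements
    my $text  = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Tag names arrive lowercased via the 'tagname' argspec.
        start_h => [ sub { $in_td++ if $_[0] eq 'td' }, 'tagname' ],
        end_h   => [ sub { $in_td-- if $_[0] eq 'td' && $in_td > 0 }, 'tagname' ],
        # 'dtext' delivers text with entities already decoded.
        text_h  => [ sub { $text .= $_[0] if $in_td }, 'dtext' ],
    );
    $parser->ignore_elements(qw(style script));
    $parser->parse($html);
    $parser->eof;      # flush anything still buffered

    $text =~ s/^\s+|\s+$//g;   # trim leading and trailing whitespace
    return $text;
}

# Shortened stand-in for the HTML in the question (assumption, not the full input).
my $html = '<style>table{margin-left: 1cm}</style>'
         . '<table border=2 class="hoverTable">[20160628_151916] <tr>'
         . '<td bgcolor="#366092"><font color="White"> PLAIN TEXT TO BE EXTRACTED</td>';

print td_text($html), "\n";
```

On the sample string this should print only the cell text, leaving out both the stylesheet and the timestamp that sits outside the <td>. If the timestamp is wanted as well, dropping the $in_td check in the text handler would keep all non-style text.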