Specific Text Extracting

dcb0127 has asked for the wisdom of the Perl Monks concerning the following question:

I'd like to extract text from tables that I believe are at a depth of 4. However, I only want the text from cells that contain the HTML tags <font color="#ff0000"><strong>. Example:

<td><font size="2" face="Arial, Helvetica, sans-serif">Run John, Run <font color="#ff0000"><strong>He Ran</strong></font></font></td>

I'd like that entire line, but just the text. Can someone point me in the right direction? I'm familiar with HTML::TableExtract, but I'm still a beginner.

Re: Specific Text Extracting by spartacus9 (Beadle) on Sep 26, 2002 at 21:14 UTC
What you want to do is probably possible using the HTML::TreeBuilder module together with the HTML::TableExtract module you mentioned.
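
A minimal sketch of the HTML::TreeBuilder side of that idea, not a tested solution for the original page. The helper name red_strong_cell_text is hypothetical, and the sketch assumes that "has the tag in that cell" means a <strong> somewhere inside a <font color="#ff0000">. It does not check table depth, so it would need to be combined with HTML::TableExtract or an ancestor count for that.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Scalar::Util qw(refaddr);

# Hypothetical helper: return the plain text of every <td> that holds
# a <font color="#ff0000"> with a <strong> somewhere inside it.
sub red_strong_cell_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    my (%seen, @texts);
    for my $font ($tree->look_down(_tag => 'font', color => qr/^#ff0000$/i)) {
        # Assumption: any <strong> descendant of the red <font> counts.
        my $strong = $font->look_down(_tag => 'strong');
        next unless $strong;

        # Walk up to the nearest enclosing cell, so that outer cells of
        # nested tables are not reported as well.
        my $td = $font->look_up(_tag => 'td');
        next unless $td;
        next if $seen{ refaddr $td }++;

        push @texts, $td->as_text;    # text of the whole cell, tags dropped
    }

    $tree->delete;                    # release the parse tree
    return @texts;
}
```

Calling red_strong_cell_text($html) on the page source should return one string per matching cell, including the text outside the red <font>, which seems to be what the question asks for.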
Re: Specific Text Extracting by mojotoad (Monsignor) on Nov 07, 2002 at 17:19 UTC
Since version 1.06 (current is 1.08), HTML::TableExtract has had a 'keep_html' flag that retains the raw HTML within each cell. So you'll want to target tables at your depth while retaining HTML, then run the chunks of HTML through another parser or a regexp to pull out the text with your desired attributes.

Matt
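
A rough sketch of that approach, under a few assumptions: the helper name red_cells_via_tableextract is hypothetical, the regexps are deliberately simple and only meant for markup shaped like the example in the question, and the tables() accessor is the name used by the 2.x releases of HTML::TableExtract (the 1.x releases discussed above called it table_states()). Depth is counted from 0 for top-level tables.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper: collect the text of cells that contain
# <font color="#ff0000"><strong> in tables at the given depth.
sub red_cells_via_tableextract {
    my ($html, $depth) = @_;

    # keep_html => 1 keeps the raw HTML of each cell instead of only its text.
    my $te = HTML::TableExtract->new(depth => $depth, keep_html => 1);
    $te->parse($html);
    $te->eof;

    my @texts;
    for my $table ($te->tables) {
        for my $row ($table->rows) {
            for my $cell (@$row) {
                next unless defined $cell;    # missing cells come back undef

                # Assumption: the two tags appear back to back, as in the example.
                next unless $cell =~ m{<font\s+color=["']?#ff0000["']?\s*>\s*<strong>}i;

                (my $text = $cell) =~ s/<[^>]*>//g;    # crude tag stripping
                $text =~ s/\s+/ /g;                    # collapse whitespace
                $text =~ s/^\s+|\s+$//g;               # trim both ends
                push @texts, $text;
            }
        }
    }
    return @texts;
}
```

For the page in the question the call would be something like red_cells_via_tableextract($html, 4), if the depth guess is right. The regexp-based tag stripping is the quick route; passing each matching cell to a real parser is sturdier if the markup varies.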