Re: HTML parsing using RegEx, HTML::Parser and or HTML::TokeParser?

On the basis that it is the first table on the page, isn't nested, and doesn't contain nested tables, a one-liner will do it.

get the.url.com/path | perl -0777 -ne" print m[(<table.*?</table>)]si" > file
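For anyone who would rather test the regex in a script than on the command line, here is a minimal sketch of the same match. The helper name first_table and the sample page are made up for illustration; the pattern itself is the one from the one-liner.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first <table>...</table> block in $html,
# or undef when the page has no table. Same regex as the one-liner.
sub first_table {
    my ($html) = @_;
    # /s lets . match newlines, /i ignores tag case,
    # and .*? stops at the first closing </table>.
    return $html =~ m[(<table.*?</table>)]si ? $1 : undef;
}

# Made-up sample page, for illustration only.
my $page = <<'HTML';
<html><body>
<p>intro</p>
<TABLE border=1>
<tr><td>first</td></tr>
</TABLE>
<table><tr><td>second</td></tr></table>
</body></html>
HTML

print first_table($page), "\n";
```

Because .*? stops at the first </table> it meets, a table that contained a nested table would be cut off at the inner closing tag, which is why the no-nesting condition matters.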

Examine what is said, not who speaks.

"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Re: HTML parsing using RegEx, HTML::Parser and or HTML::TokeParser? by Starman (Initiate) on Aug 23, 2003 at 00:41 UTC
Unfortunately, it is not the first table; however, it is the only one that uses the attributes "border=0 align=center". There are no tables nested within it, although it is itself nested within another table.
Re: Re: Re: HTML parsing using RegEx, HTML::Parser and or HTML::TokeParser? by BrowserUk (Patriarch) on Aug 23, 2003 at 01:19 UTC
In that case, adding those attributes to the regex should do the trick. This is obviously untested.

get the.url.com/path |
perl -0777 -ne" print m[(<table border=0 align=center>.*?</table>)]si" > file

That is still a one-liner, but I split it across a few lines as the auto codewrap did horrible things to it.
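The literal pattern above only matches when the opening tag is written exactly as <table border=0 align=center>. If the page quotes the values or lists the attributes in another order, something like the following sketch may be more forgiving. It is an assumption about what the markup might look like, not something tested against the real page; the helper name centered_table and the sample HTML are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first table whose opening tag carries
# both border=0 and align=center (quotes optional, any order, any case).
sub centered_table {
    my ($html) = @_;
    my $open = qr{
        <table\b
        (?=[^>]*\bborder\s*=\s*["']?0["']?(?=[\s>]))       # border=0 inside this tag
        (?=[^>]*\balign\s*=\s*["']?center["']?(?=[\s>]))   # align=center inside this tag
        [^>]*>                                             # rest of the opening tag
    }xi;
    # As before, .*? stops at the first </table>, so the wanted table
    # must not contain a nested table.
    return $html =~ m{($open.*?</table>)}si ? $1 : undef;
}

# Made-up sample: the wanted table sits inside an outer layout table.
my $page = <<'HTML';
<table width="100%"><tr><td>
<table border=1><tr><td>menu</td></tr></table>
<TABLE align="center" border="0">
<tr><td>wanted</td></tr>
</TABLE>
</td></tr></table>
HTML

print centered_table($page), "\n";
```

The two lookaheads check each attribute independently within the opening tag, so the order in which they appear does not matter.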
Re: Re: Re: Re: HTML parsing using RegEx, HTML::Parser and or HTML::TokeParser? by Starman (Initiate) on Aug 24, 2003 at 18:58 UTC
That did the trick. Now, if you'll indulge a follow-up question: using the same example, how would I change all the href and img src attributes to prepend http://www.thesite.com? For example, href="/my.gif" would become href="http://www.thesite.com/my.gif".
Re: Re: Re: Re: Re: HTML parsing using RegEx, HTML::Parser and or HTML::TokeParser? by BrowserUk (Patriarch) on Aug 24, 2003 at 19:51 UTC