in reply to HTML parsing using RegEx, HTML::Parser and or HTML::TokeParser?

On the basis that it is the first table on the page, isn't nested, and doesn't contain nested tables, a one-liner will do it.

get the.url.com/path | perl -0777 -ne" print m[(<table.*?</table>)]si" > file

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Re: HTML parsing using RegEx, HTML::Parser and or HTML::TokeParser?
by Starman (Initiate) on Aug 23, 2003 at 00:41 UTC
    Unfortunately it is not the first table; however, it is the only one that uses the attributes "border=0 align=center". There are no tables nested within it, although it is itself nested within another table.

      In that case, adding those attributes to the regex should do the trick. This is obviously untested.

      get the.url.com/path |
          perl -0777 -ne" print m[(<table border=0 align=center>.*?</table>)]si" > file

      That is still a one-liner, but I split it across a few lines as the auto codewrap did horrible things to it.



        That did the trick. Now, if you'll indulge a follow-up question: using the same example, how would I change all the href and img src attributes to add http://www.thesite.com? For example, href="/my.gif" would become href="http://www.thesite.com/my.gif".