2ge has asked for the wisdom of the Perl Monks concerning the following question:

Hello all,

I use this superb module for parsing HTML files; it is really great. But I have run out of ideas for how to parse a fairly simple part of a complex HTML file. It looks like this:
#...here are some divs, ids and so on...
<h4>START</h4>
#...here is some information
<hr />
#...here is some additional data
Now I don't know how to tell HTML::TreeBuilder to get everything from <h4>START</h4> up to the closing <hr />.
I could use a simple regexp for that, but I need an HTML::TreeBuilder object, so how can I do it?
I thought about capturing that part into a variable with a regexp and then calling $tree2->parse($temp);

Is it OK to do it this way?
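
For illustration, here is a minimal sketch of the approach described above: capture the section with a regexp, then feed it to a second HTML::TreeBuilder object. The sample markup and the variable names other than $tree2 and $temp are assumptions made up for this example, and it assumes the real page has no <hr /> inside the wanted section. Note that parse() should be followed by eof() so the parser knows the input is complete.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample standing in for the real page.
my $html = <<'HTML';
<div id="top">header stuff</div>
<h4>START</h4>
<p>some information</p>
<hr />
<p>additional data</p>
HTML

# Capture from the <h4> up to (not including) the first <hr>.
# /s lets "." match newlines; .*? stops at the nearest "<hr".
my ($temp) = $html =~ m{(<h4>START</h4>.*?)<hr\b}s
    or die "START section not found\n";

# Build a second tree from the captured fragment only.
my $tree2 = HTML::TreeBuilder->new;
$tree2->parse($temp);
$tree2->eof;    # signal that the input is complete

# The fragment is now an ordinary tree that can be queried.
my $p    = $tree2->look_down(_tag => 'p');
my $text = $p->as_text;
print "$text\n";

$tree2->delete;    # free the tree when done
```

The parser wraps the fragment in its own html and body elements, so look_down() is used here instead of relying on the fragment's position in the tree.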

Re: HTML::TreeBuilder question
by ww (Archbishop) on Apr 27, 2005 at 16:22 UTC
    If I understand your question and data correctly, you may make that work well
    ... so long as you're very sure there won't be <hr /> inside something you wish to capture
    ... and, given that precondition, that you're very careful about the greediness (or non-greediness, in this case) of your regex.
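
    To make the greediness point concrete, here is a small sketch using only core Perl. The input string is a made-up example with two <hr /> tags; it is not taken from the real page.

```perl
use strict;
use warnings;

# Hypothetical input with two <hr /> tags after the heading.
my $html = '<h4>START</h4><p>wanted</p><hr /><p>extra</p><hr />';

# Greedy: .* runs on to the LAST "<hr" it can reach.
my ($greedy) = $html =~ m{<h4>(.*)<hr}s;

# Non-greedy: .*? stops at the FIRST "<hr" after the heading.
my ($lazy) = $html =~ m{<h4>(.*?)<hr}s;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

    The non-greedy form is the one that stops at the first <hr />, which is also why an <hr /> inside the wanted section would cut the capture short.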
      WW,

      thanks for the reply, I will try that tomorrow. But I think (and hope!) there will be no problem at all. My regexp is simple: /<h4>(.*?)<hr/gm;
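
      One thing worth checking with that regexp: the /m modifier only changes where ^ and $ match, while it is /s that lets "." match a newline. The sketch below uses a made-up multi-line input (an assumption, not the real page) to compare the two modifiers in core Perl.

```perl
use strict;
use warnings;

# Hypothetical multi-line input.
my $html = "<h4>START</h4>\n<p>some information</p>\n<hr />\n";

# With /m only, "." still will not match "\n", so the
# pattern cannot get from the <h4> line to the <hr line.
my ($with_m) = $html =~ /<h4>(.*?)<hr/m;

# With /s, "." matches newlines as well.
my ($with_s) = $html =~ /<h4>(.*?)<hr/s;

print defined $with_m ? "m: matched\n" : "m: no match\n";
print defined $with_s ? "s: matched\n" : "s: no match\n";
```

      If the section between <h4> and <hr /> spans several lines in the real file, /s (possibly alongside /g) is likely the modifier that is needed.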