in reply to question about lookaheads and threatexpert/html parsing

See the "Markup in the Monastery" link at the bottom of the node editing page you used to enter your question, and you will see that <code>...</code> tags can be used to wrap code and anything else you don't want interpreted as HTML.

How about showing us 10 or so "lines" of real data and the code you have tried so far to solve the problem? See "I know what I mean. Why don't you?" for tips on how to present that.

My first take is that you should be using something like HTML::TreeBuilder to wrangle the input data.
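HTML::TreeBuilder is a Perl module. The following is only a rough Python stand-in for the same idea, not that module's API: Python's standard html.parser has no tree builder, so this sketch builds a minimal tree, finds the <li> that mentions "Host Name", and reads the <li> items of the <ul> that follows it. The names Node, TreeBuilder and host_names are hypothetical, and the structure assumed is the one in the sample line posted in the follow-up.

```python
from html.parser import HTMLParser


class Node:
    """A minimal element node; children are Node objects or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []

    def text(self):
        # Concatenate all text inside this element, including nested elements.
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Hypothetical tiny tree builder on top of html.parser."""

    VOID = {"br", "img", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in self.VOID:  # void elements never get an end tag
            self.cur = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        n = self.cur
        while n is not None and n.tag != tag:
            n = n.parent
        if n is not None and n.parent is not None:
            self.cur = n.parent

    def handle_data(self, data):
        self.cur.children.append(data)


def walk(node):
    """Yield every element below node, depth first."""
    for c in node.children:
        if isinstance(c, Node):
            yield c
            yield from walk(c)


def host_names(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    results = []
    for li in walk(builder.root):
        if li.tag == "li" and "Host Name" in li.text():
            siblings = [c for c in li.parent.children if isinstance(c, Node)]
            # Take the first <ul> that follows the "Host Name" <li>.
            for sib in siblings[siblings.index(li) + 1:]:
                if sib.tag == "ul":
                    results.extend(x.text().strip() for x in sib.children
                                   if isinstance(x, Node) and x.tag == "li")
                    break
    return results
```

Working on a tree means an unknown number of <li> items is no harder than one: you just iterate the children of the right <ul>.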

Premature optimization is the root of all job security

Re^2: question about lookaheads and threatexpert/html parsing
by Anonymous Monk on Mar 23, 2016 at 22:05 UTC
    Actually, that _was_ the real data:
    <ul><li>The following Host Name was requested from a host database:</li> <ul> <li>192.5.5.241</li> </ul></ul>
    I want everything from the first <ul> after the "Host Name" line down to the closing </ul></ul> pair. There is an unknown number of <li> elements between those two points. The marker that precedes the chunk is the line mentioning "Host Name". The line above is a sample from the raw HTML to be parsed: view-source:http://www.threatexpert.com/report.aspx?md5=ab41b1e2db77cebd9e2779110ee3915d
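Since the thread title asks about lookaheads, here is a minimal Python sketch of a regex that follows the description above. It skips to the first <ul> after "Host Name", then captures up to the first </ul> using a tempered token, (?:(?!</ul>).)*, whose negative lookahead stops the capture from stepping over a closing tag, and finally requires </ul></ul>. It assumes the exact tag spelling of the sample (no attributes on <ul> or <li>) and will break on other markup, which is why a parser remains the more robust choice. The function name host_names_regex is hypothetical.

```python
import re

# Skip to the first <ul> after the label, then capture everything up to the
# first </ul>; the negative lookahead keeps the capture from crossing a </ul>.
BLOCK_RE = re.compile(r'Host Name.*?<ul>((?:(?!</ul>).)*)</ul>\s*</ul>', re.S | re.I)
# Pull the text of each <li>, trimming surrounding whitespace.
LI_RE = re.compile(r'<li>\s*(.*?)\s*</li>', re.S | re.I)


def host_names_regex(html):
    m = BLOCK_RE.search(html)
    if m is None:
        return []
    return LI_RE.findall(m.group(1))
```

Usage: `host_names_regex(page_html)` returns the list of <li> texts, however many there are.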