The data you want is contained in a table which follows the (bold-faced boiler-plate) phrase "Allgemeine Daten der Schule / Behörde:". Here's the html:
</p><h1>Allgemeine Daten der Schule / Behörde:</h1> <table border="0" bgcolor="#EFEFEF" leftmargin="15" topmargin="5"><tr> <td><strong>Schul-/Behördenname:</strong> </td>...
The above begins at the start of line 989 according to w3c's html validator (which also lists 37 errors, 40-some warnings and an obsolete html doctype which w3c no longer validates).
The section you want apparently ends with the </table> matching the open above. That's still inside line 989 according to w3c.
In other words, the general problem is not rooted in line numbers; IMO, you can find (by regex or other tool) the opening table tag of your desired data and munge the data from there. The approach offered by BrowserUk looks to me like the way to go.
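To make the "find the opening table, ignore line numbers" idea concrete, here is a minimal sketch. It is written in Python only for illustration (the same regex idea carries over to Perl) and is not BrowserUk's code. The names TABLE_AFTER_HEADING and extract_general_data_table are made up for this example, and it assumes the page has already been read into a correctly decoded string and that the table contains no nested tables.

```python
import re

# Marker phrase copied from the page; the pattern name and the function name
# are hypothetical, chosen only for this sketch.
# Assumption: the wanted table has no nested <table> inside it, so the first
# </table> after the heading is the matching close.
TABLE_AFTER_HEADING = re.compile(
    r"Allgemeine Daten der Schule / Behörde:.*?(<table\b.*?</table>)",
    re.DOTALL | re.IGNORECASE,
)


def extract_general_data_table(html):
    """Return the first <table>...</table> after the marker heading, or None."""
    match = TABLE_AFTER_HEADING.search(html)
    return match.group(1) if match else None
```

Because the search is anchored on the heading text rather than on a position in the file, it does not matter whether the table sits on line 989, 999 or anywhere else in a given file.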
The table, at least, appears reasonably well-formed if inconsistently formatted. In any case, the last row in which you appear to be interested (i.e., just before the </table> mentioned above) is:
<tr> <td><strong>Schulträger:</strong> </td> <td> <Verband/Verein> (Verband/Verein) </td></tr>
(for emphasis) ... still inside line 989.
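Since every row of that table follows the same label-cell / value-cell pattern, the rows can be turned into pairs once the table has been isolated. The following is only a sketch using Python's standard html.parser module. RowCollector and table_to_pairs are hypothetical names, and it assumes each cell is closed with an explicit </td>, which may not hold for every one of the 5000 files.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Join the chunks and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside a <td> (e.g. whitespace between cells) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def table_to_pairs(table_html):
    """Return (label, value) tuples for rows with at least two cells."""
    parser = RowCollector()
    parser.feed(table_html)
    parser.close()
    return [(row[0].rstrip(":"), row[1]) for row in parser.rows if len(row) >= 2]
```

Fed the table's HTML, table_to_pairs returns one tuple per row, with the trailing colon stripped from the label; rows with fewer than two cells are skipped rather than guessed at.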
There, by gosh, that dead horse has been beaten enough!
One takeaway might be "Look for patterns in your data." When they exist, they may help you solve your problem.
In reply to Re: HTML-Parser: a newbie question: need to extract exactly line 999 out of 5000 files..
by ww
in thread HTML-Parser: a newbie question: need to extract exactly line 999 out of 5000 files..
by Perlbeginner1