
The data you want is contained in a table that follows the (bold-faced, boilerplate) phrase "Allgemeine Daten der Schule / Behörde:". Here's the HTML:

</p><h1>Allgemeine Daten der Schule / Behörde:</h1> <table border="0" bgcolor="#EFEFEF" leftmargin="15" topmargin="5"><tr> <td><strong>Schul-/Behördenname:</strong> </td>...

The above begins at the start of line 989 according to the W3C HTML validator (which also reports 37 errors, 40-some warnings, and an obsolete HTML doctype that W3C no longer validates).

The section you want apparently ends with the </table> matching the open above. That's still inside line 989 according to w3c.

In other words, the general problem is not rooted in line numbers; IMO, you can find (by regex or other tool) the opening table tag for your desired data and munge the data from there. The approach offered by BrowserUk looks to me like the way to go.
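To make the "find the opening table, then munge from there" idea concrete, here is a minimal sketch. It is written in Python purely for illustration (the same regexes carry over to Perl almost unchanged), and it is not BrowserUk's code. The sample string, the placeholder value "Beispielschule", and the function name `extract_general_data` are all assumptions made up for the example; only the heading text and the row layout come from the snippet quoted above.

```python
import re

# Hypothetical sample modeled on the quoted snippet; real pages will differ.
SAMPLE = (
    '<p>intro</p><h1>Allgemeine Daten der Schule / Behörde:</h1> '
    '<table border="0" bgcolor="#EFEFEF" leftmargin="15" topmargin="5">'
    '<tr> <td><strong>Schul-/Behördenname:</strong> </td> '
    '<td> Beispielschule </td></tr>'
    '<tr> <td><strong>Schulträger:</strong> </td> '
    '<td> <Verband/Verein> (Verband/Verein) </td></tr></table>'
    '<table><tr><td><strong>Other:</strong></td><td>ignored</td></tr></table>'
)

HEADING = "Allgemeine Daten der Schule / Behörde:"

# Anchor on the heading text, not on a line number, then capture
# everything up to the first closing </table> that follows it.
TABLE_RE = re.compile(
    r"<h1>\s*" + re.escape(HEADING) + r"\s*</h1>\s*<table\b[^>]*>(.*?)</table>",
    re.DOTALL | re.IGNORECASE,
)

# One row: a <strong> label cell followed by a value cell.
ROW_RE = re.compile(
    r"<tr>\s*<td>\s*<strong>(.*?)</strong>\s*</td>\s*<td>(.*?)</td>\s*</tr>",
    re.DOTALL | re.IGNORECASE,
)


def extract_general_data(html):
    """Return {label: value} for the table after HEADING, or {} if absent."""
    match = TABLE_RE.search(html)
    if match is None:
        return {}
    rows = ROW_RE.findall(match.group(1))
    return {label.strip().rstrip(":"): value.strip() for label, value in rows}


if __name__ == "__main__":
    for label, value in extract_general_data(SAMPLE).items():
        print(label, "=>", value)
```

Because the match is anchored on the heading, it does not matter whether the table sits on line 989, line 999, or anywhere else. The non-greedy capture stops at the first `</table>`, so this sketch assumes the target table contains no nested tables; for messier pages a real HTML parser is the safer choice.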

The table, at least, appears reasonably well-formed, if inconsistently formatted. In any case, the last row in which you appear to be interested (i.e., just before the </table> mentioned above) is:

<tr> <td><strong>Schulträger:</strong> </td> <td> <Verband/Verein> (Verband/Verein) </td></tr>

(for emphasis) ... still inside line 989.

There, by gosh, that dead horse has been beaten enough!

One takeaway might be "Look for patterns in your data." When they exist, they may help you solve your problem.

In reply to Re: HTML-Parser: a newbie question: need to extract exactly line 999 out of 5000 files.. by ww
in thread HTML-Parser: a newbie question: need to extract exactly line 999 out of 5000 files.. by Perlbeginner1
