Good evening, dear Monks!

I'd like to ask you a favour: help with troubleshooting a Perl script. After two hours of looking at the HTML::TableExtract examples, my head is aching! See the examples here: http://www.mojotoad.com/sisk/projects/HTML-TableExtract/tables.html
I tried lots of my own ideas, and now I am coming back to this place.


BTW: this is one of the best places for Perl questions. A great place to learn!
Over the last few days I have worked with HTML::TokeParser and HTML::TreeBuilder to identify XPath expressions. I have also read the documentation for HTML::TableExtract, and I have had some introduction to Perl DBI.

At the moment I need to write some Perl to get text that is stored in HTML tables. I guess this is a good job for HTML::TableExtract. It could save my backside, since I have to parse more than 6000 files.

HTML::TableExtract does what it says it does: it extracts specific tables from HTML source code, and it does that really well. I want (need) to do this with the site described below.

I need to get the following 9 (or 10) lines:

Schuldaten
Schulnummer
Amtliche Bezeichnung
Strasse
Plz und Ort
Telefon
Fax
E-Mail-Adresse
[Schuldaten ändern] (is this UTF-8 encoded, or what?)
Schülergesamtzahl (is this UTF-8 encoded, or what?)


Question: can HTML::TableExtract be applied here too, to the result pages of more than 6400 schools (see below)? I'd love to hear from you.

Perlbeginner1

BTW:

See this page: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=672.8924536341191 Note: tick all the checkboxes at the bottom of the page. You then get a result page with more than 6400 schools. On the right of each result there is a link "Weitere Informationen anzeigen" (show further information); click it to get the detailed information for that school.

Here is the code from which I have to extract the text mentioned above. Note: I only need the 9 (or 10) lines mentioned above out of the following lines (and out of the 6400 other result pages ;-) ):


<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<link rel="stylesheet" href="jspsrc/css/bp_style.css" type="text/css">
<title>Weitere Schulinformationen</title>
</head>
<body class="bodyclass">
<div style="text-align:center;"><center>
<!-- <fieldset><legend> Allgemeine Informationen zur Schule </legend> -->
<br/>
<table border="1" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_ergebnis_tab_info'>
<!-- <table border="0" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_SchuleSuchenInfo'> -->
  <tr>
    <td width="100%" colspan="2" class="ldstabTitel"><strong>Schuldaten</strong></td>
  </tr>
  <tr>
    <td width="27%"><strong>Schulnummer</strong></td>
    <td width="73%">&nbsp;120571 </td>
  </tr>
  <tr>
    <td width="27%"><strong>Amtliche Bezeichnung</strong></td>
    <td width="73%">&nbsp;Paul-Gerhardt-Schule Ev. Grundschule </td>
  </tr>
  <tr>
    <td width="27%"><strong>Strasse</strong></td>
    <td width="73%">&nbsp;Sonnenstr. 11 </td>
  </tr>
  <tr>
    <td width="27%"><strong>Plz und Ort</strong></td>
    <td width="73%">&nbsp;59269 Beckum </td>
  </tr>
  <tr>
    <td width="27%"><strong>Telefon</strong></td>
    <td width="73%">&nbsp;02521 950725 </td>
  </tr>
  <tr>
    <td width="27%"><strong>Fax</strong></td>
    <td width="73%">&nbsp; </td>
  </tr>
  <tr>
    <td width="27%"><strong>E-Mail-Adresse</strong></td>
    <td width="73%">&nbsp;<a href=mailto:120571@schule.nrw.de>120571@schule.nrw.de </a> </td>
  </tr>
  <tr>
    <td width="27%"><strong>Internet</strong></td>
    <td width="73%">&nbsp;<a href=http://www.paul-gerhardt-schule-beckum.de>http://www.paul-gerhardt-schule-beckum.de </td>
  </tr>
  <!-- <tr>
    <td width="27%">&nbsp;</td>
    <td width="73%" align="right"><a href="schule_aeinfo.php?SNR=<? print $SCHULNR ?>" target="_blank"> [Schuldaten &auml;ndern]&nbsp;&nbsp;</a>
  </tr>
  </td> -->
  <tr>
    <td width="27%">&nbsp;</td>
    <td width="73%">&nbsp;Schule in öffentlicher Trägerschaft</td>
  </tr>
  <tr>
    <td width="100%" colspan=2><strong>&nbsp;</strong></td>
  </tr>
  <tr>
    <td width="27%"><strong>Sch&uuml;lergesamtzahl</strong></td>
    <td width="73%">&nbsp;228 </td>
  <tr>
    <td width="100%" colspan=2><strong>&nbsp;</strong></td>
  </tr>
  <tr>
    <td width="27%"><strong>offene Ganztagsschule</strong></td>
    <td width="73%">&nbsp;Ja</td>
  </tr>
  <tr>
    <td width="27%"><strong>Schule von acht bis eins</strong></td>
    <td width="73%">&nbsp;Ja</td>
  </tr>
  <!--
  if (!fsp.isEmpty()){
    ztext = "&nbsp;";
    int i = 0;
    Iterator it = fsp.iterator();
    while (it.hasNext()){
      String[] zwert = new String[2];
      zwert = (String[])it.next();
      if (i==0){
        if (zwert[1].equals("0")){
          ztext = ztext+zwert[0];
        }else{
          ztext = ztext+zwert[0]+" mit "+zwert[1];
          if (zwert[1].equals("1")){
            ztext = ztext+" Sch&uuml;ler";
          }else{
            ztext = ztext+" Sch&uuml;lern";
          }
        }
        i++;
      }else{
        if (zwert[1].equals("0")){
          ztext = ztext+"<br>&nbsp;"+zwert[0];
        }else{
          ztext = ztext+"<br>&nbsp;"+zwert[0]+" mit "+zwert[1];
          if (zwert[1].equals("1")){
            ztext = ztext+" Sch&uuml;ler";
          }else{
            ztext = ztext+" Sch&uuml;lern";
          }
        }
      }
    }
  -->
</table>
<!-- </fieldset> -->
<br>
</body>
</html>


Can this be done with HTML::TableExtract?
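As a starting point, here is a minimal sketch of how HTML::TableExtract might be pointed at the table above. It assumes the table can be selected by its class attribute (bp_ergebnis_tab_info) through the module's attribs option, and that every wanted row has its label in the first cell and its value in the second. The names extract_school, clean, and %wanted, as well as the shortened HTML sample, are made up for illustration, and the sketch is untested against the live site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Shortened copy of the detail page shown above (only a few rows kept).
my $html = <<'END_HTML';
<table border="1" cellspacing="0" width="80%" class='bp_ergebnis_tab_info'>
<tr><td width="100%" colspan="2" class="ldstabTitel"><strong>Schuldaten</strong></td></tr>
<tr><td width="27%"><strong>Schulnummer</strong></td><td width="73%">&nbsp;120571 </td></tr>
<tr><td width="27%"><strong>Amtliche Bezeichnung</strong></td><td width="73%">&nbsp;Paul-Gerhardt-Schule Ev. Grundschule </td></tr>
<tr><td width="27%"><strong>Fax</strong></td><td width="73%">&nbsp; </td></tr>
<tr><td width="27%"><strong>Sch&uuml;lergesamtzahl</strong></td><td width="73%">&nbsp;228 </td></tr>
</table>
END_HTML

# Labels to keep (illustrative selection). "\xFC" is the character that
# the entity &uuml; turns into once the cell text has been decoded.
my %wanted = map { $_ => 1 } (
    'Schulnummer', 'Amtliche Bezeichnung', 'Strasse', 'Plz und Ort',
    'Telefon', 'Fax', 'E-Mail-Adresse', "Sch\xFClergesamtzahl",
);

# Trim ordinary whitespace plus the non-breaking space (\xA0) that
# &nbsp; decodes to; an undefined cell becomes an empty string.
sub clean {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/^[\s\xA0]+//;
    $text =~ s/[\s\xA0]+$//;
    return $text;
}

# Hypothetical helper: returns a hash ref of label => value for one page.
sub extract_school {
    my ($page) = @_;

    # Only look at tables carrying this class attribute.
    my $te = HTML::TableExtract->new(
        attribs => { class => 'bp_ergebnis_tab_info' },
    );
    $te->parse($page);

    my %school;
    for my $table ($te->tables) {
        for my $row ($table->rows) {
            my $label = clean($row->[0]);
            my $value = clean($row->[1]);
            next unless $wanted{$label};
            $school{$label} = $value;
        }
    }
    return \%school;
}

my $school = extract_school($html);
binmode STDOUT, ':encoding(UTF-8)';
print "$_: $school->{$_}\n" for sort keys %$school;
```

For the 6000-odd saved pages, the same sub could be called once per file after reading each file into a string. Since the page declares ISO-8859-1, the files would presumably need to be read with that encoding.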

Dear Monks, I'd love to hear from you! ;-)

In reply to HTML::TableExtract: Parsing 9 lines of text in a table by Perlbeginner1
