HTML::TableExtract: Parsing 9 lines of text in a table

by Perlbeginner1 (Scribe)
on Oct 17, 2010 at 23:49 UTC

Perlbeginner1 has asked for the wisdom of the Perl Monks concerning the following question:

Good evening dear Monks!

Dear Monks, I'm asking for your help in troubleshooting a Perl script. After looking at the HTML::TableExtract examples for two hours, my head is aching! See the examples here: http://www.mojotoad.com/sisk/projects/HTML-TableExtract/tables.html
I tried lots of my own ideas, and now I come back to this place.


BTW: this is one of the best places for Perl issues. A great place to learn!
Over the last few days I have worked with HTML::TokeParser and HTML::TreeBuilder to identify XPath expressions. I also read the documentation for HTML::TableExtract, and I have had some introduction to the Perl DBI module.

Now I need to do a Perl job to get some text that is stored in HTML tables. I guess that this is a great job for HTML::TableExtract. It can save my backside, since I have to parse more than 6000 files.

HTML::TableExtract does what it says it does: it extracts specific tables from HTML source code, and it does that really well. I want (need) to do this with a site: see here.

I need to get the following 9 (or 10) lines:

Schuldaten
Schulnummer:
Amtliche Bezeichnung:
Strasse:
Plz und Ort:
Telefon:
Fax:
E-Mail-Adresse:
[Schuldaten ändern] (is this UTF-8 encoded, or what?)
Schülergesamtzahl (is this UTF-8 encoded, or what?)


Question: can HTML::TableExtract be applied here too, to the result pages of more than 6400 schools (see below)? I'd love to hear from you.

Perlbeginner1

BTW:

See this page: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=672.8924536341191 Note: tick all the checkboxes at the bottom of the page. You then get a result page with more than 6400 schools. On the right of each result there is a link labelled "Weitere Informationen anzeigen" (show further information); click it to get the detailed information.

Here is the code from which I have to extract the above-mentioned text. Note: I only need to get the 9 (or 10) lines mentioned above out of the following lines (and out of the 6400 further result pages ;-) ):


<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage 3.0">
<link rel="stylesheet" href="jspsrc/css/bp_style.css" type="text/css">
<title>Weitere Schulinformationen</title>
</head>
<body class="bodyclass">
<div style="text-align:center;"><center>
<!-- <fieldset><legend> Allgemeine Informationen zur Schule </legend> -->
<br/>
<table border="1" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_ergebnis_tab_info'>
<!-- <table border="0" cellspacing="0" bordercolordark="white" bordercolorlight="black" width="80%" class='bp_SchuleSuchenInfo'> -->
  <tr>
    <td width="100%" colspan="2" class="ldstabTitel"><strong>Schuldaten</strong></td>
  </tr>
  <tr>
    <td width="27%"><strong>Schulnummer</strong></td>
    <td width="73%">&nbsp;120571 </td>
  </tr>
  <tr>
    <td width="27%"><strong>Amtliche Bezeichnung</strong></td>
    <td width="73%">&nbsp;Paul-Gerhardt-Schule Ev. Grundschule </td>
  </tr>
  <tr>
    <td width="27%"><strong>Strasse</strong></td>
    <td width="73%">&nbsp;Sonnenstr. 11 </td>
  </tr>
  <tr>
    <td width="27%"><strong>Plz und Ort</strong></td>
    <td width="73%">&nbsp;59269 Beckum </td>
  </tr>
  <tr>
    <td width="27%"><strong>Telefon</strong></td>
    <td width="73%">&nbsp;02521 950725 </td>
  </tr>
  <tr>
    <td width="27%"><strong>Fax</strong></td>
    <td width="73%">&nbsp; </td>
  </tr>
  <tr>
    <td width="27%"><strong>E-Mail-Adresse</strong></td>
    <td width="73%">&nbsp;<a href=mailto:120571@schule.nrw.de>120571@schule.nrw.de </a> </td>
  </tr>
  <tr>
    <td width="27%"><strong>Internet</strong></td>
    <td width="73%">&nbsp;<a href=http://www.paul-gerhardt-schule-beckum.de>http://www.paul-gerhardt-schule-beckum.de </td>
  </tr>
  <!--
  <tr>
    <td width="27%">&nbsp;</td>
    <td width="73%" align="right"><a href="schule_aeinfo.php?SNR=<? print $SCHULNR ?>" target="_blank"> [Schuldaten &auml;ndern]&nbsp;&nbsp;</a>
  </tr>
  </td>
  -->
  <tr>
    <td width="27%">&nbsp;</td>
    <td width="73%">&nbsp;Schule in öffentlicher Trägerschaft</td>
  </tr>
  <tr>
    <td width="100%" colspan=2><strong>&nbsp;</strong></td>
  </tr>
  <tr>
    <td width="27%"><strong>Sch&uuml;lergesamtzahl</strong></td>
    <td width="73%">&nbsp;228 </td>
  <tr>
    <td width="100%" colspan=2><strong>&nbsp;</strong></td>
  </tr>
  <tr>
    <td width="27%"><strong>offene Ganztagsschule</strong></td>
    <td width="73%">&nbsp;Ja</td>
  </tr>
  <tr>
    <td width="27%"><strong>Schule von acht bis eins</strong></td>
    <td width="73%">&nbsp;Ja</td>
  </tr>
  <!--
  if (!fsp.isEmpty()){
    ztext = "&nbsp;";
    int i = 0;
    Iterator it = fsp.iterator();
    while (it.hasNext()){
      String[] zwert = new String[2];
      zwert = (String[])it.next();
      if (i==0){
        if (zwert[1].equals("0")){
          ztext = ztext+zwert[0];
        }else{
          ztext = ztext+zwert[0]+" mit "+zwert[1];
          if (zwert[1].equals("1")){
            ztext = ztext+" Sch&uuml;ler";
          }else{
            ztext = ztext+" Sch&uuml;lern";
          }
        }
        i++;
      }else{
        if (zwert[1].equals("0")){
          ztext = ztext+"<br>&nbsp;"+zwert[0];
        }else{
          ztext = ztext+"<br>&nbsp;"+zwert[0]+" mit "+zwert[1];
          if (zwert[1].equals("1")){
            ztext = ztext+" Sch&uuml;ler";
          }else{
            ztext = ztext+" Sch&uuml;lern";
          }
        }
      }
    }
  -->
</table>
<!-- </fieldset> -->
<br>
</body>
</html>


Can this be done with HTML::TableExtract?
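
For illustration, here is a minimal sketch of how HTML::TableExtract might be pointed at a table like the one above. It assumes the data table can be selected by its class attribute (bp_ergebnis_tab_info) and that each row of interest holds a label cell followed by a value cell. The sub name extract_fields and the shortened sample HTML are made up for this example; this is not a tested solution for the real site.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical helper: returns label => value pairs from the school table.
sub extract_fields {
    my ($html) = @_;

    # Assumption: the wanted table is the one carrying this class attribute.
    my $te = HTML::TableExtract->new(
        attribs => { class => 'bp_ergebnis_tab_info' },
    );
    $te->parse($html);

    my %field;
    for my $table ($te->tables) {
        for my $row ($table->rows) {
            # A cell swallowed by a colspan may be undef, so default to ''.
            my ($label, $value) = map { $_ // '' } @{$row}[0, 1];
            for ($label, $value) {
                s/^[\s\xA0]+//;    # &nbsp; is decoded to \xA0
                s/[\s\xA0]+$//;
            }
            next unless length $label;
            $field{$label} = $value;
        }
    }
    return %field;
}

# Shortened stand-in for one result page (not the full page shown above).
my $html = <<'HTML';
<table class='bp_ergebnis_tab_info'>
<tr><td colspan="2"><strong>Schuldaten</strong></td></tr>
<tr><td><strong>Schulnummer</strong></td><td>&nbsp;120571 </td></tr>
<tr><td><strong>Plz und Ort</strong></td><td>&nbsp;59269 Beckum </td></tr>
<tr><td><strong>Sch&uuml;lergesamtzahl</strong></td><td>&nbsp;228 </td></tr>
</table>
HTML

my %school = extract_fields($html);

# Only a subset of the wanted labels, to keep the sketch short.
binmode STDOUT, ':encoding(UTF-8)';
for my $label ('Schulnummer', 'Plz und Ort', "Sch\x{FC}lergesamtzahl") {
    my $value = exists $school{$label} ? $school{$label} : '';
    print "$label: $value\n";
}
```

For the 6000-odd saved pages, the same sub could presumably be called once per file inside a loop. Since the page declares ISO-8859-1, files read from disk would likely need to be decoded accordingly before the labels are compared.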

Dear Monks - I love to hear from you! ;-)

Replies are listed 'Best First'.
Re: HTML::TableExtract: Parsing 9 lines of text in a table
by tinita (Parson) on Oct 18, 2010 at 09:15 UTC
    You posted some of the snippets you got here to http://www.perl-community.de/.
    You're still trying to say that you don't want a coding service, but if you get a snippet or two here, have them put together in the other forum, and then post a snippet from there back here, this is effectively a coding service. You're not even linking to the other threads, and now that you have posted so many times it begins to look rude.
    I have the impression that you don't understand all of the code and just hope that in every new thread somebody will add a working snippet until you have it all together. In that case it would be much more effective to just ask for a programmer and offer some money.
Re: HTML::TableExtract: Parsing 9 lines of text in a table
by Anonymous Monk on Oct 17, 2010 at 23:52 UTC
    Dear Monks - I love to hear from you! ;-)

    You've gotten quite a few fish and fishing lessons; it's time to try fishing :)

      Hi,

      many thanks for the reply. I will try to do so.

        You only ask a very general question:

        Question: can HTML::TableExtract be applied here too?

        And you already got the answer, multiple times, even with very concise code. Why do you think that you would get a different answer this time? What has changed in the world that would make yesterday's answer different from the one you get today?

        Maybe if you started to write your own code and asked about the problems you encounter with your own code, you would get more concrete answers to your more concrete problems. This is not a script-writing service. You are supposed to do your own work.
