Hello Marto, hello all,
Many thanks for the help. I am glad to hear from you. Note: I will learn to use the markup here.
The iterator variable can be done like this:
for my $i (0..10000) {
    my $url = 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=' . $i;
    print "$url\n";
}
But Marto, I am currently working on a toolkit. You are right: I want to fetch and parse some sites. It is a little project for my school.
Which parts exactly am I having problems with? Well, I am not very familiar with Perl, but I have seen that the tasks can be done with Perl.
The parts are:
1. fetching
2. parsing
3. storing in a DB
All of them can be done with Perl. Sure thing!
The aim: I want to collect some data that is published on several governmental servers, namely data about schools in Germany; see here: http://www.bildungsserver.de/zeigen.html?seite=276
Now I need a kind of toolset to do the job:
1. The fetching part with LWP::UserAgent
2. The parser part with HTML::TreeBuilder::XPath or HTML::TokeParser
3. The DB part with Perl DBI or something similar
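A minimal sketch of the fetching step with LWP::UserAgent. It assumes that each server only needs a school id dropped into a fixed URL pattern, as in the example URLs in this post. The helper names `build_url` and `fetch_page`, the user-agent string, and the one-second pause are assumptions for illustration, not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# URL patterns copied from the example URLs in this post; %s is the school id.
my %url_template = (
    hessen        => 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=%s',
    niedersachsen => 'http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=%s&lschb=',
);

# Build the page URL for one school id on one server (hypothetical helper).
sub build_url {
    my ($site, $id) = @_;
    my $tpl = $url_template{$site} or die "unknown site '$site'\n";
    return sprintf $tpl, $id;
}

# Fetch one page; returns the decoded HTML, or undef after a warning.
sub fetch_page {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    unless ($response->is_success) {
        warn "GET $url failed: ", $response->status_line, "\n";
        return;
    }
    return $response->decoded_content;
}

# Only touch the network when ids are given, e.g.: perl fetch.pl 5503 5504 5505
if (@ARGV) {
    my $ua = LWP::UserAgent->new(timeout => 15, agent => 'school-toolkit/0.1');
    for my $id (@ARGV) {
        my $html = fetch_page($ua, build_url('hessen', $id)) or next;
        printf "%s: %d characters\n", $id, length $html;
        sleep 1;    # pause between requests to be polite to the server
    }
}
```

Keeping the URL pattern in a table means the same `fetch_page` routine can serve every server; only the pattern differs.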
If we look at some examples, it is obvious that the first part, the fetching part, is almost the same in several cases.
See this one, an example from the above-mentioned collection at the bildungsserver:
Hessen:
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=5503
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=5504
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=5505
http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=5514
etc
By the way, for the parser part we can use HTML::TreeBuilder::XPath; see the source code of the page:
Marker: class "Floatbox"
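A minimal sketch of the parsing step with HTML::TreeBuilder::XPath. The class name "Floatbox" is the marker named above; the sample markup, the helper name `extract_by_class`, and the choice to match any element type are assumptions, so the real page source has to be checked before relying on it.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Return the text of every element that carries the given class
# (hypothetical helper; the class may be one of several in the attribute).
sub extract_by_class {
    my ($html, $class) = @_;
    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);
    # Match the class as one whitespace-separated token of @class.
    my $xpath = sprintf
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class;
    my @texts = map { $_->as_text } $tree->findnodes($xpath);
    $tree->delete;    # free the parse tree
    return @texts;
}

# Invented sample markup, standing in for a fetched page.
my $sample = '<html><body>'
           . '<div class="Floatbox">Musterschule, 12345 Musterstadt</div>'
           . '<div class="other">navigation</div>'
           . '</body></html>';

print "$_\n" for extract_by_class($sample, 'Floatbox');
```

Class values are case-sensitive here, so the spelling passed in must match the one used in the page source.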
Even more exciting: the next two examples are truly 100% Perl jobs.
The first part is to fetch the pages with WWW::Mechanize or LWP::UserAgent; then we can parse them with HTML::TreeBuilder::XPath.
See also this next example:
Niedersachsen:
http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=67003&lschb=
http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=67002&lschb=
http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=67001&lschb=
http://schulnetz.nibis.de/db/schulen/schule.php?schulnr=67004&lschb=
We can parse it with HTML::TreeBuilder::XPath; see the source code. The marker is class="fliess".
By the way, the background: http://nibis.ni.schule.de/nibis.phtml?menid=590 (search with wildcard: %)
The third example:
Nordrhein-Westfalen:
http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=445.69028196477257&SchulAdresseMapDO=116191
http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=445.69028196477257&SchulAdresseMapDO=116348
Marker: class='bp_ergebnis_tab_info'. Again, we can parse it with HTML::TreeBuilder::XPath.
Above all: I am not very familiar with Perl, but I have seen that the tasks can be done with Perl.
I need some help with the fetching part (LWP) and with combining this first part with the second part, the parser part.
It is quite some work to get a good toolset or toolkit, isn't it?
And I am sure it can be done very effectively with Perl!
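A rough sketch of how the three parts might be combined for the Hessen pages: fetch with LWP::UserAgent, parse with HTML::TreeBuilder::XPath, store with DBI. Several things here are assumptions: SQLite (via DBD::SQLite) as the database, the table `school` and its columns, the file name `schools.db`, the helper names, and the exact spelling of the "Floatbox" class in the live markup.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;
use DBI;

my $base = 'http://dms-schule.bildung.hessen.de/suchen/suche_schul_db.html?show_school=';

# Parse step: joined text of all elements with class "Floatbox"
# (class name from this post; exact spelling in the page is an assumption).
sub parse_school {
    my ($html) = @_;
    my $tree  = HTML::TreeBuilder::XPath->new_from_content($html);
    my @texts = map { $_->as_text } $tree->findnodes('//*[@class="Floatbox"]');
    $tree->delete;
    return join ' | ', @texts;
}

# Open (and if needed create) the database; the table layout is hypothetical.
sub open_db {
    my ($file) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$file", '', '', { RaiseError => 1 });
    $dbh->do('CREATE TABLE IF NOT EXISTS school (id INTEGER PRIMARY KEY, url TEXT, details TEXT)');
    return $dbh;
}

# Store step: one row per school id; a re-run overwrites the old row.
sub store_school {
    my ($dbh, $id, $url, $details) = @_;
    $dbh->do('INSERT OR REPLACE INTO school (id, url, details) VALUES (?, ?, ?)',
             undef, $id, $url, $details);
}

# Usage: perl schools.pl 5503 5504 5505
if (@ARGV) {
    my $ua  = LWP::UserAgent->new(timeout => 15);
    my $dbh = open_db('schools.db');
    for my $id (@ARGV) {
        my $url = $base . $id;
        my $res = $ua->get($url);
        unless ($res->is_success) {
            warn "skipping $id: ", $res->status_line, "\n";
            next;
        }
        store_school($dbh, $id, $url, parse_school($res->decoded_content));
        sleep 1;    # pause between requests
    }
    $dbh->disconnect;
}
```

Because each step is its own subroutine, another server would mainly need a different `$base` and a different XPath expression in the parse step.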