Perlbeginner1 has asked for the wisdom of the Perl Monks concerning the following question:

Hello dear Community, good evening!

My first perl-parser runs very well - (thanks to the support of some friends here : Many many thanks to those who helped alot!). I was given lots of good hints and now i am ready and running a first parser with some great output!

BTW: Well i am very very glad to be here - this is a great place and i have a nice learning curve...This forums give me a great asset of learning!!

What is aimed now: i am parsing site with a small table

see here: http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142359

Note: This page - and many others that are simmilar- is/are stored locally in the same folder as file t.html

Now i want to store the data in a MySQL-DB: i am figuring out how to do this:

Here the code:

#!/usr/bin/perl use strict; use warnings; use HTML::TableExtract; use YAML; my $te = HTML::TableExtract->new( attribs => { class => 'bp_ergebnis_tab_info' }, ); $te->parse_file('t.html'); foreach my $table ( $te->tables ) { foreach my $row ($table->rows) { my @values = grep {defined} @$row; print " ", join(',', @values), "\n"; } }


Well, now i want to store the output in a MySQL-DB. I currently work on the code that is needed to store the data in a MySQL-DB. I need a starting point for the creation of this db. Well see the labels and the values:

Schulnummer, 143960 Amtliche Bezeichnung,Franziskusschule Kath. Hauptschule Ahaus - Sekund +arstufe I Strasse, Hof zum Ahaus 6 Plz und Ort, 48683 Ahaus Telefon, 02561 4291990 Fax, 02561 42919920 E-Mail-Adresse, 3960@schule.nrw.de


Well, now i try to figure out the db-connection.
I look forward to some ideas - any and all ideas & hints for the starting point will be greatly appreciated.

Greetings pb1

Replies are listed 'Best First'.
Re: storing some values (of an array) to a DB: how does Perl talk to MySQL?
by Anonymous Monk on Dec 20, 2010 at 16:33 UTC
Re: storing some values (of an array) to a DB: how does Perl talk to MySQL?
by ww (Archbishop) on Dec 20, 2010 at 19:46 UTC

    - -

    Your "friends here" have walked you through similar questions -- in great detail -- at least a half dozen times, recently.

    They're often provided solutions and at least as often, pointed out references, tutorials, and other learning tools which you can use on your own. Anonymonk has provided links to relevant docs, above.

    Now, it's time to "figure out" the db connection (and other problems) -- on your own -- by using the tools you've been given, rather than running to the Monastery for each tiny step forward. Come back, when and if, you've done your own work and have some particular issue in your coding.

      unfortunately there is only a ++ button, but I need a ++++++++ button ;o)

      hello ww & hello Anonymous

      first of all: @anon.Monk: many thanks for the hints. You gave me great hints and interesting starting points.


      @ww: Thx for your answer. I agree with you. Agreed. I will try to figure it out.

      greetings pb1 ;->

Re: storing some values (of an array) to a DB: how does Perl talk to MySQL?
by Marshall (Canon) on Dec 20, 2010 at 23:01 UTC
    I see that your project is getting into a whole new big topic.

    Books I recommend:

    - Learning SQL by Alan Beaulieu
    This book starts from ground zero and uses MySQL for the examples using the mySQL command tool.
    Your project can be done without getting really fancy - read at least up to about page 75. And do the examples as you go! There are some minor mistakes and you will see those as you go along. But this is a great book.

    - Programming the Perl DBI by Alligator Descartes & Tim Bunce.
    This is more general book. Chapter 5, "Interactng with the Database" has some relevant stuff for you.

    Right now you should focus on the first book that I recommended above.

    Before you ask how to make Perl do some SQL thing, you need to understand clearly how you would do that SQL operation yourself from the command line tool!

    Once you understand "Learning SQL", then I think the tutorials will help a lot and you will find that translating a simple SQL statement into the right syntax for the DBI will be a lot, a whole lot easier for you.

    Since you started on the project some months ago, I have seen that you often try to take too big of a step at one time. Like going from "playing with bottle rockets in the backyard" to "let's fly to the moon"!

    Along with that goes trying to program some really, really complex thing in a completely automatic fashion that you can do easily in a few minutes with a little manual guidance. The last code that I sent you was a good example of that principle.

    You will need to draw some diagrams and think a whole lot about your tables structure. These should be created manually from the command line tool (at the MySQL> prompt).

    Your Perl code is going to come down to populating those already existing tables with data. Again before writing program code, you should be able to insert a data record manually from the command line tool.

    So your "homework" is to read at least first part of "Learning SQL". Then make a baby step, by writing some Perl code that does a record insertion like you will already know how to do from the command line tool. Then, do the same thing with some "real data". Pursue your database learning as an independent thing from the web page scraping. Get each major piece working by itself before trying to integrate the whole thing together.

    Oh, another point, inevitably you are going to have a "real DBI" question. When you do, take out all of say the HTML parsing stuff and boil it down to a direct question with as few dependencies as possible. If you have an HTML parsing question, make a direct question about just that part.

      hello Marshall,

      many thanks for the reply. As allways - your ideas were very helpful and supportive!

      Pursue your database learning as an independent thing from the web page scraping. Get each major piece working by itself before trying to integrate the whole thing together.


      right - this is a very good idea!

      i will follow your advices.

      you tipps are great!

      btw: i allready have some books here - among them the book, Programming the Perl DBI from Tim Bunce and by Alligator Descartes


      i will read this and try to digg deeper into the interaction of Perl and MySQL

      thx for advising these literature!!
      some months ago, I have seen that you often try to take too big of a step at one time. Like going from "playing with bottle rockets in the backyard" to "let's fly to the moon"!
      ;-> right said! ;-) thx again...

      have a great time!!

      many greetings
        btw: i allready have some books here - among them the book, Programming the Perl DBI from Tim Bunce and by Alligator Descartes

        That is too detailed to start with, get the "Learning SQL" book I recommended and start there. The first 4 chapters are very important and I have actually worked through all the examples just I recommended for you. I know for sure that it is good because I have done it myself.

        I added a post to this thread, please consider that idea as well. If you can get to the .csv format, importing this into the MySQL DB is like 1/2 page of code.

        Update: I decided to strike-thru the rest of this post. Not because it is "wrong", but rather because the detail doesn't matter at this point and I think it was confusing. I think the OP should start with a single table. That table should be a .CSV file. And let's leave it at that for the moment.

        In DB lingo, what is called "normalization" is a big thing. Instead of a string "Hessen" appearing as a field, this would be essentially a reference to another table that defines the spelling (user presentation of "Hessen"). This saves space and allows the name to be changed. Also allows Hesse and Hessen to be the same thing. The main idea is that everything is defined once. You don't need that - or at least not in any first step.

        I think that a single table will do every thing that you need at the first step. This will be in "un-normalized" form. And that is fine.

        In your application, the space doesn't matter. You may need a hash table conversion from say Bavaria => Bayern prior to putting that name in the DB, if so then just do it. No German is going to be confused by this mapping.

        Keep it simple and just use one term within your DB. In America: TX, Tex, Texas all mean the same thing in terms of postal codes. TX or Texas would be preferred over Tex, but believe me the post office would deliver the mail. I have a module that maps all sorts of state names into official postal codes. There are many thousands of aliases. Some knot-head will put "Texass" and that gets translated to "TX". I am confident that Germany has some standard for postal codes for the Bundeslaender. A very fast "look-up" table. Use that result in your DB.

Re: storing some values (of an array) to a DB: how does Perl talk to MySQL?
by tinita (Parson) on Dec 21, 2010 at 11:12 UTC
Re: storing some values (of an array) to a DB: how does Perl talk to MySQL?
by Marshall (Canon) on Dec 21, 2010 at 01:00 UTC
    I am adding a second reply to the OP because the main point below is completely different than my first post (normally I would just append to my original post). And I think this additional point deserves more focus that just an appended footnote. Not to mention that I am in "wordy typing" mode right now...

    What I am going to suggest here is a bit "sacrilegious" in that I am suggesting that the OP rethink: "i want to store the output in a MySQL-DB", at least as the "next step". For the other readers, I do know something of the application and the data since I wrote part of the code for the Hessen schools. So this isn't a "wild guess".

    I think that there is a step before "make an SQL DB" that makes sense.

    The data set is going to be all schools in each Bundesland within Germany. Hessen for example had 2,040 schools. A Bundesland is like an American state, but there are fewer of them (16 vs 50).

    I would make a single CSV ("Comma Separated Value") file. The first line of this file are the "column headers". I would add "Bundesland" as one of those fields.

    The format is called CSV, but the field separator can be anything that you want - doesn't have to be a comma ','. Some of the DB dumps that I work with use '|', the pipe instead of comma. This avoids the commas within quotes and can make the parsing a lot easier. Although Perl does have a very good csv parser module, if I am the one making the .csv file, I make it as easy as possible. For example, I work with one DB dump that comes a 75MB .zip file and I just de-compress and split each line on '|' - works great.

    The data set is going to be small enough for it all to fit into memory at once even on a "wimpy" machine. A single table instead of 16 tables (per Bundesland) probably will be just fine for this "next step".

    The obvious way to do something useful with this file is to import it into Excel. Excel understands this '|' separated format very well. The Excel version that I have is 10 years old, but even it has some pretty good data table query functions. There used to be an Excel limit of 65K lines per sheet, but that limit is gone now and I don't think this data set will reach that older limit in the first place.

    So, what I am suggesting is another "baby step" that can produce a usable DB tomorrow while the learning about SQL and MySQL is happening over the next weeks or months.

    I am an advocate of "incremental development" especially when there are multiple new technologies to be learned. Get to something usable as fast as possible, use it, learn and then make a new version.